CN110782622A - Safety monitoring system, safety detection method, safety detection device and electronic equipment - Google Patents


Info

Publication number
CN110782622A
Authority
CN
China
Prior art keywords
voice data
emotional
target
determining
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810826730.5A
Other languages
Chinese (zh)
Inventor
李婉瑜
陈展
周洪伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201810826730.5A
Publication of CN110782622A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G08 - SIGNALLING
    • G08B - SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00 - Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/02 - Alarms for ensuring the safety of persons
    • G - PHYSICS
    • G08 - SIGNALLING
    • G08B - SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B25/00 - Alarm systems in which the location of the alarm condition is signalled to a central station, e.g. fire or police telegraphic systems
    • G08B25/01 - Alarm systems in which the location of the alarm condition is signalled to a central station, e.g. fire or police telegraphic systems characterised by the transmission medium
    • G08B25/012 - Alarm systems in which the location of the alarm condition is signalled to a central station, e.g. fire or police telegraphic systems characterised by the transmission medium using recorded signals, e.g. speech
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Abstract

Embodiments of the present invention provide a safety monitoring system, a safety detection method, a safety detection apparatus, and an electronic device. The system comprises: an object-side device, configured to obtain voice data, determine an emotional state recognition result of the voice data, and send warning information to the monitoring-side device corresponding to the object-side device when the emotional state recognition result contains at least one preset emotional state, where a preset emotional state is an emotional state exhibited by the target object holding the object-side device when facing a security event, and the warning information is information warning that a security event is currently occurring to the target object; and a monitoring-side device, configured to receive the warning information and output it. By applying the embodiments of the present invention, a safety warning can be issued promptly and effectively when the target object faces a security event.

Description

Safety monitoring system, safety detection method, safety detection device and electronic equipment
Technical Field
The present invention relates to the field of safety monitoring, and in particular to a safety monitoring system, a safety detection method, a safety detection apparatus, and an electronic device.
Background
In view of the frequent child safety incidents of recent years, schools usually monitor the safety of the children in their care. The monitoring method typically adopted relies on multiple cameras installed around the school to record the children's behavior and activities.
However, the videos captured by such cameras are generally used to reconstruct the truth of an event after a security event has occurred, providing help for determining responsibility and resolving disputes over child safety incidents. They cannot issue a timely and effective safety warning while a security event is happening, and therefore cannot bring help to the child in time.
Therefore, how to warn promptly and effectively when an object of concern, such as a child, faces a security event is an urgent problem to be solved.
Disclosure of Invention
Embodiments of the present invention aim to provide a safety monitoring system, a safety detection method, a safety detection apparatus, and an electronic device, so as to issue a safety warning promptly and effectively when an object of concern faces a security event. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present invention provides a safety monitoring system, where the system includes:
an object-side device, configured to obtain voice data, determine an emotional state recognition result of the voice data, and send warning information to the monitoring-side device corresponding to the object-side device when the emotional state recognition result contains at least one preset emotional state; where a preset emotional state is an emotional state exhibited by the target object holding the object-side device when facing a security event, and the warning information is information warning that a security event is currently occurring to the target object;
and a monitoring-side device, configured to receive the warning information and output it.
Optionally, the warning information includes the voice data and/or the emotional state recognition result.
Optionally, the object-side device is further configured to: before determining the emotional state recognition result of the voice data, determine whether the sender of the voice data is the target object holding the object-side device, and if so, perform the step of determining the emotional state recognition result of the voice data.
Optionally, determining, by the object-side device, whether the sender of the voice data is the target object holding the object-side device includes:
extracting a target voiceprint feature of the voice data;
determining whether the target voiceprint feature matches a reference voiceprint feature; if so, determining that the sender of the voice data is the target object, and if not, determining that the sender of the voice data is not the target object;
where the reference voiceprint feature is a voiceprint feature of sample voice data of the target object that is pre-stored in the object-side device.
Optionally, determining, by the object-side device, whether the target voiceprint feature matches the reference voiceprint feature includes:
inputting the target voiceprint feature into a pre-trained voiceprint model to obtain a voiceprint confidence, where the voiceprint model is trained based on the reference voiceprint feature;
determining whether the voiceprint confidence is greater than a preset voiceprint threshold;
if so, determining that the target voiceprint feature matches the reference voiceprint feature, and if not, determining that it does not.
Optionally, determining, by the object-side device, the emotional state recognition result of the voice data includes:
extracting a target emotional feature of the voice data, where the target emotional feature is a feature of the voice data related to emotional state;
determining the matching degree between the target emotional feature and each reference emotional feature, where a reference emotional feature is a feature, pre-stored in the object-side device, of sample voice data associated with one preset emotional state; the preset emotional states are fear, crying, and screaming, and the reference emotional features are obtained based on sample voice data of the target object;
and determining the emotional state recognition result of the voice data based on the matching degrees.
Optionally, determining, by the object-side device, the matching degree between the target emotional feature and each reference emotional feature includes:
inputting the target emotional feature into a pre-trained emotion model to obtain an emotion confidence for each preset emotional state, where the emotion model is trained based on the reference emotional features;
and determining the emotional state recognition result of the voice data based on the matching degrees includes:
comparing each emotion confidence with its corresponding emotion threshold;
if at least one emotion confidence is greater than its corresponding emotion threshold, determining that the emotional state recognition result of the voice data includes each preset emotional state whose emotion confidence is greater than its corresponding emotion threshold;
and if no emotion confidence is greater than its corresponding emotion threshold, determining that the emotional state recognition result of the voice data includes a normal state.
Optionally, the safety monitoring system further includes a monitoring server;
sending, by the object-side device, the warning information to the monitoring-side device corresponding to the object-side device includes:
sending, by the object-side device, the warning information to the monitoring server;
and the monitoring server is configured to receive the warning information sent by the object-side device, determine the monitoring-person information corresponding to the target object, and send the warning information, based on the monitoring-person information, to the monitoring-side device held by the monitoring person.
In a second aspect, an embodiment of the present invention provides a safety detection method, applied to an object-side device in a safety monitoring system that further includes a monitoring-side device. The method includes:
obtaining voice data;
determining an emotional state recognition result of the voice data;
and when the emotional state recognition result contains at least one preset emotional state, sending warning information to the monitoring-side device corresponding to the object-side device, so that the monitoring-side device receives the warning information and outputs it;
where a preset emotional state is an emotional state exhibited by the target object holding the object-side device when facing a security event, and the warning information is information warning that a security event is currently occurring to the target object.
Optionally, the warning information includes the voice data and/or the emotional state recognition result.
Optionally, before determining the emotional state recognition result of the voice data, the method further includes:
determining whether the sender of the voice data is the target object holding the object-side device, and if so, performing the step of determining the emotional state recognition result of the voice data.
Optionally, determining whether the sender of the voice data is the target object holding the object-side device includes:
extracting a target voiceprint feature of the voice data;
determining whether the target voiceprint feature matches a reference voiceprint feature; if so, determining that the sender of the voice data is the target object, and if not, determining that the sender of the voice data is not the target object;
where the reference voiceprint feature is a voiceprint feature of sample voice data of the target object that is pre-stored in the object-side device.
Optionally, determining whether the target voiceprint feature matches the reference voiceprint feature includes:
inputting the target voiceprint feature into a pre-trained voiceprint model to obtain a voiceprint confidence, where the voiceprint model is trained based on the reference voiceprint feature;
determining whether the voiceprint confidence is greater than a preset voiceprint threshold;
if so, determining that the target voiceprint feature matches the reference voiceprint feature, and if not, determining that it does not.
Optionally, determining the emotional state recognition result of the voice data includes:
extracting a target emotional feature of the voice data, where the target emotional feature is a feature of the voice data related to emotional state;
determining the matching degree between the target emotional feature and each reference emotional feature, where a reference emotional feature is a feature, pre-stored in the object-side device, of sample voice data associated with one preset emotional state; the preset emotional states are fear, crying, and screaming, and the reference emotional features are obtained based on sample voice data of the target object;
and determining the emotional state recognition result of the voice data based on the matching degrees.
Optionally, determining the matching degree between the target emotional feature and each reference emotional feature includes:
inputting the target emotional feature into a pre-trained emotion model to obtain an emotion confidence for each preset emotional state, where the emotion model is trained based on the reference emotional features;
and determining the emotional state recognition result of the voice data based on the matching degrees includes:
comparing each emotion confidence with its corresponding emotion threshold;
if at least one emotion confidence is greater than its corresponding emotion threshold, determining that the emotional state recognition result of the voice data includes each preset emotional state whose emotion confidence is greater than its corresponding emotion threshold;
and if no emotion confidence is greater than its corresponding emotion threshold, determining that the emotional state recognition result of the voice data includes a normal state.
Optionally, the safety monitoring system further includes a monitoring server;
and sending the warning information to the monitoring-side device corresponding to the object-side device includes:
sending the warning information to the monitoring server, so that the monitoring server receives the warning information, determines the monitoring-person information corresponding to the target object, and sends the warning information, based on the monitoring-person information, to the monitoring-side device held by the monitoring person.
In a third aspect, an embodiment of the present invention provides a safety detection apparatus, applied to an object-side device in a safety monitoring system that further includes a monitoring-side device. The apparatus includes:
an obtaining module, configured to obtain voice data;
a determining module, configured to determine an emotional state recognition result of the voice data;
and a sending module, configured to send warning information to the monitoring-side device corresponding to the object-side device when the emotional state recognition result contains at least one preset emotional state, so that the monitoring-side device receives the warning information and outputs it;
where a preset emotional state is an emotional state exhibited by the target object holding the object-side device when facing a security event, and the warning information is information warning that a security event is currently occurring to the target object.
Optionally, the warning information includes the voice data and/or the emotional state recognition result.
Optionally, the apparatus further includes:
a judging module, configured to determine, before the emotional state recognition result of the voice data is determined, whether the sender of the voice data is the target object holding the object-side device, and if so, trigger the step of determining the emotional state recognition result of the voice data.
Optionally, the judging module includes:
a first extraction submodule, configured to extract a target voiceprint feature of the voice data;
and a judging submodule, configured to determine whether the target voiceprint feature matches a reference voiceprint feature; if so, determine that the sender of the voice data is the target object, and if not, determine that the sender of the voice data is not the target object;
where the reference voiceprint feature is a voiceprint feature of sample voice data of the target object that is pre-stored in the object-side device.
Optionally, the judging submodule is specifically configured to:
input the target voiceprint feature into a pre-trained voiceprint model to obtain a voiceprint confidence, where the voiceprint model is trained based on the reference voiceprint feature;
determine whether the voiceprint confidence is greater than a preset voiceprint threshold;
and if so, determine that the target voiceprint feature matches the reference voiceprint feature, and if not, determine that it does not.
Optionally, the determining module includes:
a second extraction submodule, configured to extract a target emotional feature of the voice data, where the target emotional feature is a feature of the voice data related to emotional state;
a first determining submodule, configured to determine the matching degree between the target emotional feature and each reference emotional feature, where a reference emotional feature is a feature, pre-stored in the object-side device, of sample voice data associated with one preset emotional state; the preset emotional states are fear, crying, and screaming, and the reference emotional features are obtained based on sample voice data of the target object;
and a second determining submodule, configured to determine the emotional state recognition result of the voice data based on the matching degrees.
Optionally, the first determining submodule is specifically configured to:
input the target emotional feature into a pre-trained emotion model to obtain an emotion confidence for each preset emotional state, where the emotion model is trained based on the reference emotional features;
and the second determining submodule is specifically configured to:
compare each emotion confidence with its corresponding emotion threshold;
if at least one emotion confidence is greater than its corresponding emotion threshold, determine that the emotional state recognition result of the voice data includes each preset emotional state whose emotion confidence is greater than its corresponding emotion threshold;
and if no emotion confidence is greater than its corresponding emotion threshold, determine that the emotional state recognition result of the voice data includes a normal state.
Optionally, the safety monitoring system further includes a monitoring server;
and the sending module is specifically configured to:
send the warning information to the monitoring server, so that the monitoring server receives the warning information, determines the monitoring-person information corresponding to the target object, and sends the warning information, based on the monitoring-person information, to the monitoring-side device held by the monitoring person.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where
the memory is configured to store a computer program;
and the processor is configured to implement the steps of the safety detection method provided by the embodiments of the present invention when executing the program stored in the memory.
In the solutions provided by the embodiments of the present invention, the object-side device obtains voice data of the target object and detects, based on the voice data, whether the emotional state of the target object is a preset emotional state occurring when facing a security event. If so, it sends warning information to the corresponding monitoring-side device, which receives and outputs the warning information, thereby warning that a security event is currently occurring to the target object. The solutions provided by the embodiments of the present invention can therefore issue a safety warning promptly and effectively when the target object faces a security event.
Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic structural diagram of a security monitoring system according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a security monitoring system according to an embodiment of the present invention;
fig. 3 is a schematic flow chart of a security detection method according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a security detection apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
In order to issue a safety warning promptly and effectively when an object of concern faces a security event, embodiments of the present invention provide a safety monitoring system, a safety detection method, a safety detection apparatus, an electronic device, and a storage medium.
First, the principle and feasibility of the speech emotion recognition technology used in the embodiments of the present invention are briefly described.
Humans can perceive changes in another person's emotional state by listening to their voice because the human brain can perceive and understand the information in voice data that reflects the speaker's emotional state (such as particular interjections, changes in tone, and so on). Automatic speech emotion recognition is the automatic recognition, by a computer or similar device, of the emotional state of input voice data. It simulates the human process of emotion perception and understanding; its task is to extract acoustic features that represent emotional states from collected voice data and to find the mapping relationship between these acoustic features and human emotional states.
Voice data expressing different tones exhibits different structural characteristics and distribution rules in its temporal structure, amplitude structure, fundamental-frequency structure, formant structure, and other aspects. Therefore, by measuring, computing, and analyzing the structural characteristics and distribution rules of voice data in these aspects for various specific patterns, and using them as a basis or as templates, the emotional state implicit in any voice data can be recognized.
When a person faces a security event, their speech may carry naturally exposed emotional states. Children in particular are less likely to disguise their emotions in the face of danger or accident. Therefore, the approach of the embodiments of the present invention, which recognizes from the object's voice data whether the object's emotional state is one that occurs when facing a security event, is feasible.
Next, a safety monitoring system provided in an embodiment of the present invention is described first.
As shown in fig. 1, a safety monitoring system provided in an embodiment of the present invention may include:
an object-side device 110, configured to obtain voice data, determine an emotional state recognition result of the voice data, and send warning information to the monitoring-side device corresponding to the object-side device when the emotional state recognition result contains at least one preset emotional state; where a preset emotional state is an emotional state exhibited by the target object holding the object-side device when facing a security event, and the warning information is information warning that a security event is currently occurring to the target object;
and a monitoring-side device 120, configured to receive the warning information and output it.
For clarity, each component of the safety monitoring system is described below:
(1) The object-side device is introduced as follows:
In the embodiment of the present invention, the object-side device may be a smart device whose functions include, but are not limited to, sound collection, data processing, data transmission, and data reception. The object-side device may be a smartphone, a smart band, or another smart wearable device.
Target objects in embodiments of the present invention include, but are not limited to, children; they may also be, for example, elderly people or patients who need attention.
The object-side device is described in detail below in terms of its different functions.
1) Obtaining voice data.
In the embodiment of the present invention, the object-side device may obtain voice data through a sound collector or the like within the device. The sound collector may be a microphone or another voice recording device.
Optionally, in the embodiment of the present invention, the object-side device is further configured to:
collect and process a sound signal before the voice data is obtained.
It can be understood that the sound signal collected by the object-side device may contain various sounds, such as animal sounds or music; the object-side device may digitize and process the sound signal to obtain voice data.
The processing of the sound signal by the object-side device may include: applying digital transforms in the time domain or frequency domain, removing background noise from the sound signal, extracting the human voice signal, determining the valid period within the sound signal, and so on.
It can be understood that, by processing the sound signal in this way, effective human-related sound data usable for emotional state recognition can be obtained as the voice data, which provides a basis for subsequent emotional state recognition and improves its efficiency.
It should be noted that the collection of the sound signal may be completed by the sound collector. The processing of the sound signal may be completed by the sound collector or by another software module, such as a signal processing module. Any of various existing audio signal processing methods may be used, which is not limited here.
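Purely as an illustration (not part of the patent), the following is a minimal sketch of this kind of preprocessing, using a simple short-time-energy voice-activity check; the function name, frame size, and thresholds are assumptions chosen for the example.

```python
import numpy as np

def extract_voice_data(signal: np.ndarray, sample_rate: int,
                       frame_ms: int = 30, energy_ratio: float = 4.0) -> np.ndarray:
    """Keep only frames whose short-time energy clearly exceeds the estimated
    background level, approximating the 'remove background noise and determine
    the valid period' processing described above."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energies = (frames.astype(np.float64) ** 2).mean(axis=1)
    noise_floor = np.percentile(energies, 10)  # assume the quietest 10% of frames is noise
    voiced = energies > energy_ratio * noise_floor
    return frames[voiced].reshape(-1)  # concatenated voiced samples as the voice data
```

A real object-side device would more likely use a trained voice-activity detector and spectral noise suppression; the energy heuristic above only shows where that step sits in the flow.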
2) Determining the emotional state recognition result of the voice data.
In the embodiment of the present invention, the process of determining the emotional state recognition result of the voice data by the object-side device may include the following steps a to c:
Step a: extract the target emotional features of the voice data.
In the embodiment of the present invention, the object-side device may extract multiple target emotional features from the voice data. It can be understood that the target emotional features are features of the voice data related to emotional state, such as features of the temporal structure, amplitude structure, fundamental-frequency structure, or formant structure that relate to emotional state.
The object-side device may extract the target emotional features of the voice data using any of various existing feature extraction methods, such as a feature extractor.
The obtained target emotional feature is a multi-dimensional feature vector.
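For illustration only, here is a sketch of how such a multi-dimensional feature vector covering the amplitude, fundamental-frequency, and spectral aspects mentioned above could be assembled; librosa is used as one possible extractor, and the particular feature set and sizes are assumptions, not the patent's specification.

```python
import numpy as np
import librosa

def extract_emotion_features(wav_path: str) -> np.ndarray:
    """Build one fixed-length emotion feature vector from a speech clip."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # spectral envelope / formant-related
    rms = librosa.feature.rms(y=y)                      # amplitude structure
    f0 = librosa.yin(y, fmin=80, fmax=400, sr=sr)       # fundamental-frequency structure
    # Summarize each time series by its mean and standard deviation.
    parts = [mfcc.mean(axis=1), mfcc.std(axis=1),
             [rms.mean(), rms.std()],
             [np.nanmean(f0), np.nanstd(f0)]]
    return np.concatenate([np.atleast_1d(p) for p in parts])  # a 30-dimensional vector
```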
Step b: determine the matching degree between the target emotional feature and each reference emotional feature.
In the embodiment of the present invention, a reference emotional feature is a feature, pre-stored in the object-side device, of sample voice data associated with one preset emotional state. The reference emotional features are obtained based on sample voice data of the target object; their determination is described later. The preset emotional states are fear, crying, and screaming. It should be emphasized that, in embodiments of the present invention, the preset emotional states include, but are not limited to, these three.
Because each reference emotional feature corresponds to a preset emotional state, determining the matching degree between the target emotional feature and each reference emotional feature makes it possible to subsequently determine the emotional state corresponding to the voice data, its matching degree with each preset emotional state, and so on, and thus the emotional state recognition result.
In the embodiment of the present invention, the matching degree between the target emotional feature and each reference emotional feature may be determined by feature comparison or similar methods. The matching degree may be, for example, a probability representing similarity.
Optionally, in the embodiment of the present invention, determining, by the object-side device, the matching degree between the target emotional feature and each reference emotional feature includes:
inputting the target emotional feature into a pre-trained emotion model to obtain the emotion confidence corresponding to each preset emotional state.
Here, an emotion confidence is the probability that the target emotional feature is the reference emotional feature corresponding to a preset emotional state. For example, an emotion confidence of 90% for fear indicates a 90% probability that the target emotional feature is the reference emotional feature corresponding to fear.
The emotion model is trained based on the reference emotional features. For example, the emotion model may be a neural network trained on the reference emotional features. The emotion model can simply and quickly output, for any input emotional feature, the emotion confidence corresponding to each preset emotional state. The determination of the emotion model is described later.
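Purely as an illustration of what such an emotion model could look like, here is a small multi-label classifier in PyTorch whose sigmoid outputs play the role of the per-state emotion confidences; the architecture, layer sizes, and state ordering are assumptions.

```python
import torch
import torch.nn as nn

class EmotionModel(nn.Module):
    """Maps an emotion feature vector to one confidence per preset emotional state."""
    def __init__(self, n_features: int = 30, n_states: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, n_states), nn.Sigmoid(),  # independent per-state confidences
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = EmotionModel()
confidences = model(torch.randn(1, 30))  # e.g. one confidence each for fear, crying, screaming
```

Sigmoid outputs (rather than a softmax) match the description above, where each confidence is later compared against its own threshold and several preset emotional states may be detected at once.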
Step c: determine the emotional state recognition result of the voice data based on the matching degrees.
In the embodiment of the present invention, the object-side device may determine the emotional state recognition result of the voice data from the matching degrees in several ways. For example, the matching degrees may be compared, and the preset emotional state corresponding to the highest matching degree taken as the emotional state recognition result for the target object.
Optionally, in the embodiment of the present invention, the process by which the object-side device determines the emotional state recognition result of the voice data based on the matching degrees may be:
comparing each emotion confidence with its corresponding emotion threshold;
if at least one emotion confidence is greater than its corresponding emotion threshold, determining that the emotional state recognition result of the voice data includes each preset emotional state whose emotion confidence is greater than its corresponding emotion threshold;
and if no emotion confidence is greater than its corresponding emotion threshold, determining that the emotional state recognition result of the voice data includes a normal state.
It can be understood that a preset emotional state may be present in the voice data of the target object without reaching a degree worth attention. Therefore, in the embodiment of the present invention, an emotion threshold can be set for the emotion confidence of each preset emotional state, which improves the effectiveness of emotional state recognition.
When an emotion confidence is greater than its corresponding emotion threshold, the corresponding preset emotional state in the voice data has reached a degree that needs attention, and the emotional state recognition result can be determined to include that preset emotional state. For example, if the emotion threshold corresponding to fear is 60%, then when the emotion confidence of the voice data for fear is greater than 60%, the emotional state recognition result of the voice data can be determined to include fear.
When no emotion confidence is greater than its corresponding emotion threshold, no preset emotional state in the voice data has reached the degree requiring attention, and the emotional state recognition result of the voice data can be determined to include a normal state.
It should be noted that each emotion threshold may be set according to the actual situation of the corresponding preset emotional state, and the thresholds for different emotion confidences may differ.
Combining the above, in the embodiment of the present invention, the obtained emotional state recognition result includes either a normal state or at least one preset emotional state.
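A minimal sketch of the threshold comparison just described; the state names and threshold values are example assumptions, not values prescribed by the patent.

```python
# Example per-state thresholds; as noted above, each state may use its own value.
EMOTION_THRESHOLDS = {"fear": 0.60, "crying": 0.55, "screaming": 0.70}

def recognize_emotional_state(confidences: dict) -> list:
    """Return every preset emotional state whose confidence exceeds its own
    threshold; if none does, the recognition result is the normal state."""
    triggered = [state for state, conf in confidences.items()
                 if conf > EMOTION_THRESHOLDS[state]]
    return triggered if triggered else ["normal"]

# With a fear confidence of 0.9 > 0.6, the result includes fear:
print(recognize_emotional_state({"fear": 0.9, "crying": 0.2, "screaming": 0.1}))  # ['fear']
```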
Optionally, in the embodiment of the present invention, the object-side device is further configured to: before determining the emotional state recognition result of the voice data,
determine whether the sender of the voice data is the target object holding the object-side device, and if so, perform the step of determining the emotional state recognition result of the voice data.
It can be understood that there is a correspondence between the object-side device and its object: the object-side device performs emotional state recognition on voice data of the target object corresponding to it. Therefore, after obtaining the voice data and before determining its emotional state recognition result, the object-side device needs to determine whether the sender of the voice data is the target object holding the device; if so, it performs the subsequent step of determining the emotional state recognition result, and if not, it does not.
It should be noted that, in general, an object-side device may correspond to only one target object; however, in some special application scenarios it may correspond to multiple target objects. For example, in one possible scenario where several patients in a ward share one object-side device, the target objects corresponding to that device are all the patients in the ward.
In the embodiment of the present invention, various methods may be used to determine whether the sender of the voice data is the target object holding the object-side device, for example, checking whether an identity code of the object in the voice data is correct, and if so, determining that the sender is the target object holding the device.
Optionally, in the embodiment of the present invention, determining, by the object-side device, whether the sender of the voice data is the target object holding the object-side device includes:
Step A: extract the target voiceprint feature of the voice data.
In the embodiment of the present invention, the object-side device may extract multiple target voiceprint features from the voice data. It can be understood that the target voiceprint features are features related to the speaker of the voice data, such as features of the temporal, amplitude, fundamental-frequency, or formant structure that relate to the speaker.
The object-side device may extract the target voiceprint feature of the voice data using any of various existing feature extraction methods, which are not enumerated here.
It should be noted that the obtained target voiceprint feature is a multi-dimensional feature vector.
Step B: determine whether the target voiceprint feature matches the reference voiceprint feature; if so, determine that the sender of the voice data is the target object, and if not, determine that the sender of the voice data is not the target object.
Here, the reference voiceprint feature is a voiceprint feature of sample voice data of the target object that is pre-stored in the object-side device. Its determination is described later.
Since the reference voiceprint feature corresponds to the target object, the object-side device may determine, by feature comparison or similar methods, whether the target voiceprint feature matches the reference voiceprint feature, and thus whether the sender of the voice data is the target object.
Optionally, in the embodiment of the present invention, determining, by the object-side device, whether the target voiceprint feature matches the reference voiceprint feature includes:
Step B1: input the target voiceprint feature into a pre-trained voiceprint model to obtain a voiceprint confidence, where the voiceprint model is trained based on the reference voiceprint feature.
Here, the voiceprint confidence is the probability that the target voiceprint feature is the reference voiceprint feature. For example, a voiceprint confidence of 70% means a 70% probability that the target voiceprint feature is the reference voiceprint feature of the target object.
The voiceprint model is trained based on the reference voiceprint feature; for example, it may be a neural network trained on the reference voiceprint feature. The voiceprint model can simply and quickly output the corresponding voiceprint confidence for any input voiceprint feature. The process of determining the voiceprint model is described later.
Step B2: determine whether the voiceprint confidence is greater than a preset voiceprint threshold;
if so, determine that the target voiceprint feature matches the reference voiceprint feature, and if not, determine that it does not.
In practice, because different people's voices can be similar, a target voiceprint feature may bear some similarity to the reference voiceprint feature of the target object without that similarity being sufficient to conclude that the target voiceprint feature belongs to the target object. Therefore, in the embodiment of the present invention, the preset voiceprint threshold may be set according to the actual characteristics of human voiceprints so as to improve the accuracy of the judgment. When the voiceprint confidence is greater than the preset voiceprint threshold, the target voiceprint feature can be determined to match the reference voiceprint feature, indicating that it belongs to the target object.
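The matching step can be illustrated with a cosine-similarity check against the stored reference voiceprint feature; treating the similarity directly as the voiceprint confidence is a simplification of the model-based approach described above, and the threshold is an example value.

```python
import numpy as np

VOICEPRINT_THRESHOLD = 0.75  # example preset voiceprint threshold

def is_target_speaker(target_vp: np.ndarray, reference_vp: np.ndarray) -> bool:
    """Use cosine similarity between voiceprint vectors as a stand-in confidence."""
    confidence = float(np.dot(target_vp, reference_vp) /
                       (np.linalg.norm(target_vp) * np.linalg.norm(reference_vp)))
    return confidence > VOICEPRINT_THRESHOLD
```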
3) Determining whether the emotional state recognition result contains at least one preset emotional state.
As mentioned above, the emotional state recognition result may include a normal state or at least one preset emotional state.
The object-side device can examine the emotional states in the emotional state recognition result and determine whether the result contains at least one of fear, crying, or screaming. When the emotional state recognition result includes at least one preset emotional state, the target object is in an emotional state associated with facing a security event, and a subsequent safety warning is needed.
4) If so, sending warning information to the monitoring-side device corresponding to the object-side device.
The object-side device may send the warning information to the corresponding monitoring-side device in various ways, for example through a sending module within the object-side device. The specific form of sending may be a short message, WeChat, e-mail, and so on.
The warning information may include voice, text, pictures, and the like. It may be preset information, such as a preset voice or text saying "a security event is occurring to the target object, please take note," or a picture containing safety-warning content.
Optionally, in the embodiment of the present invention, the warning information includes the voice data and/or the emotional state recognition result. Of course, the warning information may also include the current time, the geographic location, and so on. In this way, the corresponding monitoring-side device obtains richer and more detailed information about the target object, which helps resolve the security event the target object faces.
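As one possible realization (the patent does not fix a transport or message schema), the warning information could be packaged as a JSON payload with the triggering audio attached; the endpoint URL and field names below are hypothetical.

```python
import json
import time
import requests  # any HTTP client would do

MONITOR_ENDPOINT = "https://monitor.example.com/alerts"  # hypothetical address

def send_warning(voice_clip_path: str, states: list, location: str) -> None:
    payload = {
        "message": "A security event may be occurring to the target object.",
        "emotional_states": states,   # e.g. ["fear"], the recognition result
        "timestamp": time.time(),     # current time, as suggested above
        "location": location,         # geographic location, as suggested above
    }
    with open(voice_clip_path, "rb") as f:
        requests.post(MONITOR_ENDPOINT,
                      data={"alert": json.dumps(payload)},
                      files={"voice_data": f})  # attach the triggering voice data
```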
The process by which the object-side device determines the voiceprint model and the emotion model is briefly described below.
In the embodiment of the present invention, the voiceprint model and the emotion model may each be a neural network. Specifically:
The determining process of the voiceprint model includes the following steps:
① Determine a first neural network.
The first neural network may be any existing neural network.
② Obtain sample voice data of the target object.
Sample voice data of the target object within a preset time period may be obtained; the preset time period may be, for example, half a year.
③ Extract the reference voiceprint feature from the sample voice data.
④ Train the first neural network with the reference voiceprint feature to obtain a second neural network, and use the second neural network as the voiceprint model.
The determining process of the emotion model includes the following steps:
① Determine a third neural network.
The third neural network may be any existing neural network.
② Obtain sample voice data of the target object under each preset emotional state.
Sample voice data recorded in a preset emotional state within a preset time period may be obtained; the preset time period may be, for example, half a year or one year.
③ Extract the reference emotional features from the sample voice data.
④ Train the third neural network with the reference emotional features and the corresponding preset emotional states to obtain a fourth neural network, and use the fourth neural network as the emotion model.
In the embodiment of the present invention, the reference voiceprint features and the reference emotional features of the sample voice data may be extracted using any of various existing feature extraction methods, which are not enumerated here.
It should be emphasized that the reference voiceprint features and the reference emotional features characterize different aspects of the target object, so one cannot be substituted for the other.
It should be noted that, in the embodiment of the present invention, after determining that the voice data belongs to the target object, the object-side device may store the obtained voice data, for example in a voiceprint model library for the target object, for later use as sample voice data for the voiceprint model; similarly, after determining the emotional state recognition result of the voice data, the object-side device may store the obtained voice data, for example in an emotion model library for the target object, for later use as sample voice data for the emotion model.
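A compact sketch of steps ① to ④ for the emotion model, reusing the illustrative EmotionModel class sketched earlier and assuming a prepared set of (reference emotional feature, multi-hot state label) pairs; the optimizer and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

def train_emotion_model(features: torch.Tensor, labels: torch.Tensor,
                        n_states: int = 3, epochs: int = 50) -> nn.Module:
    """features: (N, n_features) reference emotional features; labels: (N, n_states)
    multi-hot targets built from sample voice data recorded in the preset states."""
    model = EmotionModel(n_features=features.shape[1], n_states=n_states)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCELoss()  # matches the sigmoid per-state confidences
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(features), labels)
        loss.backward()
        optimizer.step()
    return model
```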
(2) The monitoring-side device is introduced as follows:
In the embodiment of the present invention, the monitoring-side device may be a smart device whose functions include, but are not limited to, data reception. The monitoring-side device may be a smartphone, a smart band, or another smart wearable device.
The monitoring-side device in the embodiment of the present invention is a device held by a monitoring person corresponding to the target object, where monitoring persons include, but are not limited to, teachers, parents, and doctors.
The warning information may be received, for example, by a receiving module in the monitoring-side device.
After receiving the warning information, the monitoring-side device may present it in various forms. For example, it may display the warning information through a preset application and emit an alert sound to remind the monitoring person to view it.
After receiving the warning information, the monitoring-side device may also retrieve surveillance video of the target object.
Specifically, the monitoring-side device may use a preset camera application to retrieve surveillance video covering the location of the target object, so that the monitoring person can further assess the security event the target object faces. For example, the parents of a target child can combine the warning information with the surveillance video to better understand and judge the security event, and thus help the child promptly and effectively.
In the solution provided by the embodiment of the present invention, a safety monitoring system composed of an object-side device and a monitoring-side device is provided. The object-side device obtains voice data of the target object and detects, based on the voice data, whether the emotional state of the target object is a preset emotional state occurring when facing a security event. If so, it sends warning information to the corresponding monitoring-side device, which receives and outputs the warning information, thereby warning that a security event is currently occurring to the target object. The solution provided by the embodiment of the present invention can therefore issue a safety warning promptly and effectively when the target object faces a security event.
As another embodiment, referring to fig. 2, fig. 2 is another schematic structural diagram of a safety monitoring system according to an embodiment of the present invention, where the monitoring system includes: an object-side device 110, a monitoring server 130, and a monitoring-side device 120.
The descriptions of the object-side device 110 and the monitoring-side device 120 are the same as in the embodiment shown in fig. 1 and are not repeated here.
The monitoring server 130 is introduced as follows:
the monitoring server 130 is configured to receive the warning information sent by the object-side device, determine the monitoring-person information corresponding to the target object, and send the warning information, based on the monitoring-person information, to the monitoring-side device held by the monitoring person.
In the embodiment of the present invention, the monitoring server is a third-party device serving the object-side device and the monitoring-side device. The monitoring server may be a server of a government body, or of a social supervision or service organization for the object of concern, and so on. Monitoring persons likewise include personnel of governments and of social supervision or service organizations for the object of concern.
In the safety monitoring system of the embodiment of the present invention, the warning information is sent as follows: the object-side device sends the warning information to the monitoring server, and the monitoring server then sends it to the monitoring-side device held by the monitoring person.
The monitoring server can thus provide a forwarding service between the object-side device and the corresponding monitoring-side device. It should be noted that the monitoring server stores attribute information of each target object and of the corresponding monitoring person, where the attribute information may include name, age, and device address. The monitoring server can therefore quickly find the monitoring-person information corresponding to a given target object and, based on that information, such as the address of the monitoring-side device held by the monitoring person, send the warning information to that device.
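A sketch of the monitoring server's lookup-and-forward role; the registry structure, field names, and the injected transport function are assumptions about one possible implementation.

```python
# Hypothetical registry: target object id -> monitoring person attributes.
STAFF_REGISTRY = {
    "child_042": {"name": "Ms. Li", "device_address": "10.0.3.17:9000"},
}

def forward_warning(target_id: str, warning: dict, send) -> None:
    """Look up the monitoring person registered for a target object and forward
    the warning to the device address on record; `send` is any transport
    function taking (device_address, warning)."""
    staff = STAFF_REGISTRY.get(target_id)
    if staff is None:
        raise KeyError(f"no monitoring person registered for {target_id}")
    send(staff["device_address"], warning)
```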
Of course, the monitoring server may send the warning information to the monitoring-side device corresponding to the object-side device in various ways, and the warning information can take various forms and contents; these details are similar to step 4) of the embodiment shown in fig. 1 and are not repeated here.
Optionally, in the embodiment of the present invention, the monitoring server may also place a voice call to the monitoring person based on the monitoring person's phone number, reminding them in the call to view the warning information the server has sent. In this way, the monitoring person can be reminded to check the warning information promptly and deal quickly with the security event the target object faces.
It should be noted that, besides forwarding warning information, the monitoring server may also aggregate and analyze the warning information of each target object, providing data for governments and for social supervision and service organizations concerned with the object of concern. Other functions of the monitoring server are not enumerated here.
In the solution provided by the embodiment of the present invention, a safety monitoring system composed of an object-side device, a monitoring server, and a monitoring-side device is provided. First, the object-side device obtains voice data of the target object and detects, based on the voice data, whether the emotional state of the target object is a preset emotional state occurring when facing a security event; if so, it sends warning information to the monitoring server. Second, the monitoring server receives the warning information sent by the object-side device, determines the monitoring-person information corresponding to the target object, and sends the warning information, based on that information, to the monitoring-side device held by the monitoring person. Finally, the monitoring-side device receives and outputs the warning information. The object is thereby warned that a security event is currently occurring, so the solution provided by the embodiment of the present invention can issue a safety warning promptly and effectively when the object faces a security event.
The following describes a safety detection method, applied to the object-side device in a safety monitoring system that further includes a monitoring-side device. It should be noted that the execution subject of the safety detection method provided by the embodiment of the present invention may be a safety detection apparatus, and the apparatus may run in an electronic device. The electronic device may be the object-side device, or the like.
As shown in fig. 3, a safety detection method provided in an embodiment of the present invention may include the following steps:
s201, voice data is obtained.
S202, determining the emotion state recognition result of the voice data.
Optionally, in this embodiment of the present invention, the determining the emotion state recognition result of the voice data includes the following steps:
firstly, extracting target emotional characteristics of the voice data.
Wherein the target emotional characteristics are as follows: features relating to an emotional state of the speech data.
And secondly, determining the matching degree of the target emotional characteristics and each reference emotional characteristic.
Wherein the reference emotional characteristics are: characteristics of sample voice data that are related to each preset emotional state and pre-stored in the object side equipment; the preset emotional states are fear, crying, or screaming, and the reference emotional characteristics are obtained based on the sample voice data of the target object.
Specifically, the step may include:
and inputting the target emotional characteristics into a pre-trained emotional model to obtain emotional confidence degrees corresponding to all preset emotional states, wherein the emotional model is obtained by training based on all reference emotional characteristics.
And thirdly, determining the emotion state recognition result of the voice data based on each matching degree.
Specifically, the step may include:
comparing each emotion confidence with the corresponding emotion threshold;
if at least one emotion confidence is greater than its corresponding emotion threshold, determining that the emotion state recognition result of the voice data includes: the preset emotional state corresponding to each emotion confidence that is greater than its corresponding emotion threshold;
and if no emotion confidence is greater than its corresponding emotion threshold, determining that the emotion state recognition result of the voice data includes a normal state. A minimal sketch of this comparison is given after these steps.
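As a concrete illustration of the comparison above, here is a minimal Python sketch. The preset states, the threshold values, and all names are assumptions for illustration; the pre-trained emotion model that produces the per-state confidences is out of scope here.

```python
# Illustrative sketch of the threshold comparison; states, threshold values,
# and names are assumptions, and the pre-trained emotion model that produces
# the per-state confidences is not shown.
PRESET_STATES = ("fear", "crying", "screaming")
EMOTION_THRESHOLDS = {"fear": 0.80, "crying": 0.75, "screaming": 0.85}

def recognize_emotion_state(confidences: dict[str, float]) -> list[str]:
    """Turn per-state emotion confidences into an emotion state recognition result."""
    matched = [state for state in PRESET_STATES
               if confidences.get(state, 0.0) > EMOTION_THRESHOLDS[state]]
    # If no confidence exceeds its corresponding threshold, the result is "normal".
    return matched if matched else ["normal"]

# Example: only "fear" exceeds its threshold, so the result contains "fear".
print(recognize_emotion_state({"fear": 0.91, "crying": 0.40, "screaming": 0.12}))
# -> ['fear']
```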
Optionally, in this embodiment of the present invention, before S202, the method further includes:
and judging whether the sender of the voice data is a target object holding the object end equipment; if so, executing step S202 of determining the emotion state recognition result of the voice data.
Optionally, in this embodiment of the present invention, the determining whether the sender of the voice data is a target object holding the object end equipment includes the following steps:
a. and extracting the target voiceprint characteristics of the voice data.
b. And judging whether the target voiceprint features are matched with the reference voiceprint features, if so, determining that the sender of the voice data is the target object, and if not, determining that the sender of the voice data is not the target object.
Wherein the reference voiceprint feature is: the voiceprint feature of the sample voice data of the target object, pre-stored in the object side equipment.
Wherein, step b specifically includes:
b1. inputting the target voiceprint features into a pre-trained voiceprint model to obtain a voiceprint confidence coefficient, wherein the voiceprint model is obtained by training based on the reference voiceprint features;
b2. judging whether the voiceprint confidence is larger than a preset voiceprint threshold value or not;
if yes, determining that the target voiceprint feature matches the reference voiceprint feature; if not, determining that the target voiceprint feature does not match the reference voiceprint feature. A minimal sketch of this check follows.
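A minimal Python sketch of steps b1-b2 and the match decision, assuming a voiceprint model object with a score method returning a confidence in [0, 1] (an illustrative interface, not one defined by this application):

```python
# Illustrative sketch only: the model interface and the threshold value are
# assumptions. The model is taken to be pre-trained on the reference
# voiceprint features of the target object.
VOICEPRINT_THRESHOLD = 0.70  # preset voiceprint threshold (illustrative value)

def is_target_object(target_voiceprint_feature, voiceprint_model) -> bool:
    """Return True if the sender of the voice data is judged to be the target object."""
    confidence = voiceprint_model.score(target_voiceprint_feature)  # step b1
    return confidence > VOICEPRINT_THRESHOLD                        # step b2 + decision
```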
S203, when the emotion state recognition result contains at least one preset emotional state, sending warning information to the monitoring end equipment corresponding to the object end equipment, so that the monitoring end equipment receives the warning information and outputs the warning information.
Wherein the preset emotional state is: an emotional state that the target object holding the object side equipment is in when facing a security event; and the warning information is: information warning that the target object currently faces a security event.
The warning information includes: the voice data and/or the emotion state recognition result.
Optionally, in an embodiment of the present invention, the security monitoring system further includes: a monitoring server;
the sending of the warning information to the monitoring terminal device corresponding to the object terminal device includes:
and sending the warning information to the monitoring server so that the monitoring server receives the warning information, determines monitoring personnel information corresponding to the target object, and sends the warning information to monitoring end equipment held by the monitoring personnel based on the monitoring personnel information.
Details of the above steps refer to the relevant contents of the object-side device in the embodiment shown in fig. 1, and are not described herein again.
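For illustration only, the warning information (the voice data and/or the emotion state recognition result) could be serialized as in the sketch below; the JSON field names and the base64 encoding of the audio are assumptions, since the application does not fix a wire format.

```python
# One possible serialization of the warning information; field names and the
# base64 audio encoding are illustrative assumptions, not a format defined
# by this application.
import base64
import json

def build_warning(target_id: str, emotion_result: list[str], voice_pcm: bytes) -> str:
    return json.dumps({
        "target_id": target_id,            # identifies the target object
        "emotion_state": emotion_result,   # e.g. ["fear"]
        "voice_data": base64.b64encode(voice_pcm).decode("ascii"),
    })
```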
In the scheme provided by the embodiment of the invention, a security detection method applied to the object side equipment is provided. Voice data of the target object can be obtained, and based on the voice data it can be detected whether the emotional state of the target object is a preset emotional state associated with facing a security event. If so, warning information is sent, thereby achieving the purpose of warning that the target object currently faces a security event. Therefore, the scheme provided by the embodiment of the invention can give an effective and timely safety warning when the object faces a security event.
Corresponding to the above method embodiment, the embodiment of the present invention further provides a security detection apparatus, where the security detection apparatus is applied to an object side device in a security monitoring system, and the security monitoring system further includes a monitoring side device. As shown in fig. 4, the apparatus includes:
an obtaining module 401, configured to obtain voice data;
a determining module 402, configured to determine an emotion state recognition result of the voice data;
a sending module 403, configured to send warning information to a monitoring end device corresponding to the object end device when the emotion state recognition result includes at least one preset emotional state, so that the monitoring end device receives the warning information and outputs the warning information;
wherein the preset emotional state is: an emotional state that the target object holding the object side equipment is in when facing a security event; and the warning information is: information warning that the target object currently faces a security event.
Optionally, in an embodiment of the present invention, the warning information includes: the voice data and/or the emotion state recognition result.
Optionally, in an embodiment of the present invention, the apparatus further includes:
and the judging module is used for judging whether a sender of the voice data is a target object holding the object end equipment before determining the emotional state recognition result of the voice data, and if so, executing the step of determining the emotional state recognition result of the voice data.
Optionally, in an embodiment of the present invention, the determining module includes:
the first extraction submodule is used for extracting the target voiceprint characteristics of the voice data;
the judging submodule is used for judging whether the target voiceprint features are matched with the reference voiceprint features, if so, determining that the sender of the voice data is the target object, and if not, determining that the sender of the voice data is not the target object;
wherein the reference voiceprint feature is: the voiceprint feature of the sample voice data of the target object, pre-stored in the object side equipment.
Optionally, in the embodiment of the present invention, the determining sub-module is specifically configured to:
inputting the target voiceprint features into a pre-trained voiceprint model to obtain a voiceprint confidence coefficient, wherein the voiceprint model is obtained by training based on the reference voiceprint features;
judging whether the voiceprint confidence is larger than a preset voiceprint threshold value or not;
if yes, determining that the target voiceprint feature matches the reference voiceprint feature; if not, determining that the target voiceprint feature does not match the reference voiceprint feature.
Optionally, in this embodiment of the present invention, the determining module 402 includes:
the second extraction submodule is used for extracting target emotional characteristics of the voice data, wherein the target emotional characteristics are: features related to the emotional state of the voice data;
the first determining submodule is used for determining the matching degree between the target emotional features and each reference emotional feature, wherein the reference emotional features are: features of sample voice data that are related to each preset emotional state and pre-stored in the object side equipment; the preset emotional states are fear, crying, or screaming, and the reference emotional features are obtained based on the sample voice data of the target object;
and the second determining submodule is used for determining the emotion state recognition result of the voice data based on each matching degree.
Optionally, in an embodiment of the present invention, the first determining sub-module is specifically configured to:
inputting the target emotional characteristics into a pre-trained emotional model to obtain emotional confidence coefficients corresponding to all preset emotional states, wherein the emotional model is obtained by training based on all reference emotional characteristics;
the second determining submodule is specifically configured to:
compare each emotion confidence with the corresponding emotion threshold;
if at least one emotion confidence is greater than its corresponding emotion threshold, determine that the emotion state recognition result of the voice data includes: the preset emotional state corresponding to each emotion confidence that is greater than its corresponding emotion threshold;
and if no emotion confidence is greater than its corresponding emotion threshold, determine that the emotion state recognition result of the voice data includes a normal state.
Optionally, in an embodiment of the present invention, the security monitoring system further includes: a monitoring server;
the sending module 403 is specifically configured to:
and sending the warning information to the monitoring server so that the monitoring server receives the warning information, determines monitoring personnel information corresponding to the target object, and sends the warning information to monitoring end equipment held by the monitoring personnel based on the monitoring personnel information.
In the scheme provided by the embodiment of the invention, a security detection apparatus applied to the object side equipment is provided. Voice data of the target object can be obtained, and based on the voice data it can be detected whether the emotional state of the target object is a preset emotional state associated with facing a security event. If so, warning information is sent to the corresponding monitoring end equipment, which receives and outputs it, thereby achieving the purpose of warning that the target object currently faces a security event. Therefore, the scheme provided by the embodiment of the invention can give an effective and timely safety warning when the object faces a security event.
Corresponding to the above method embodiment, the embodiment of the present invention further provides an electronic device, as shown in fig. 5, which may include a processor 501 and a memory 502, wherein,
the memory 502 is used for storing computer programs;
the processor 501 is configured to implement the steps of the security detection method provided by the embodiment of the present invention when executing the program stored in the memory 502.
The Memory may include a RAM (Random Access Memory) or an NVM (Non-Volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Through the above electronic device, the following can be realized: voice data of the target object is acquired, and based on the voice data it is detected whether the emotional state of the target object is a preset emotional state associated with facing a security event. If so, warning information is sent to the corresponding monitoring end equipment, which receives and outputs it, thereby achieving the purpose of warning that the target object currently faces a security event. Therefore, the scheme provided by the embodiment of the invention can give an effective and timely safety warning when the object faces a security event.
In addition, corresponding to the security detection method provided in the foregoing embodiment, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the security detection method provided in the embodiment of the present invention are implemented.
The computer-readable storage medium stores an application program that, when run, executes the security detection method provided by the embodiment of the present invention, so that the following can be realized: voice data of the target object is acquired, and based on the voice data it is detected whether the emotional state of the target object is a preset emotional state associated with facing a security event. If so, warning information is sent to the corresponding monitoring end equipment, which receives and outputs it, thereby achieving the purpose of warning that the target object currently faces a security event. Therefore, the scheme provided by the embodiment of the invention can give an effective and timely safety warning when the object faces a security event.
For the embodiments of the electronic device and the computer-readable storage medium, since the contents of the related methods are substantially similar to those of the foregoing embodiments of the methods, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the embodiments of the methods.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (18)

1. A security monitoring system, comprising:
the object side equipment is used for acquiring voice data, determining an emotion state recognition result of the voice data, and sending warning information to monitoring side equipment corresponding to the object side equipment when the emotion state recognition result contains at least one preset emotional state; wherein the preset emotional state is: an emotional state that the target object holding the object side equipment is in when facing a security event, and the warning information is: information warning that the target object currently faces a security event;
and the monitoring terminal equipment is used for receiving the warning information and outputting the warning information.
2. The system of claim 1, wherein the alert information comprises: the voice data and/or the emotion state recognition result.
3. The system of claim 1, wherein the object-side device is further configured to: prior to determining the emotion state recognition result of the voice data,
and judging whether the sender of the voice data is a target object holding the object end equipment, and if so, executing the step of determining the emotional state recognition result of the voice data.
4. The system according to claim 3, wherein the determining, by the object side equipment, whether the sender of the voice data is a target object holding the object side equipment comprises:
extracting target voiceprint characteristics of the voice data;
judging whether the target voiceprint features are matched with the reference voiceprint features, if so, determining that the sender of the voice data is the target object, and if not, determining that the sender of the voice data is not the target object;
wherein the reference voiceprint feature is: the voiceprint feature of the sample voice data of the target object, pre-stored in the object side equipment.
5. The system according to claim 4, wherein the determining, by the object-side device, whether the target voiceprint feature matches the reference voiceprint feature comprises:
inputting the target voiceprint features into a pre-trained voiceprint model to obtain a voiceprint confidence coefficient, wherein the voiceprint model is obtained by training based on the reference voiceprint features;
judging whether the voiceprint confidence is larger than a preset voiceprint threshold value or not;
if yes, determining that the target voiceprint feature matches the reference voiceprint feature; if not, determining that the target voiceprint feature does not match the reference voiceprint feature.
6. The system according to any one of claims 1-5, wherein the determining, by the object side equipment, the emotion state recognition result of the voice data comprises:
extracting target emotional characteristics of the voice data, wherein the target emotional characteristics are: features related to the emotional state of the voice data;
determining the matching degree between the target emotional features and each reference emotional feature, wherein the reference emotional features are: features of sample voice data that are related to each preset emotional state and pre-stored in the object side equipment; the preset emotional states are fear, crying, or screaming, and the reference emotional features are obtained based on the sample voice data of the target object;
and determining the emotion state recognition result of the voice data based on the matching degrees.
7. The system of claim 6, wherein the determining, by the object side equipment, the matching degree between the target emotional features and each reference emotional feature comprises:
inputting the target emotional characteristics into a pre-trained emotional model to obtain emotional confidence coefficients corresponding to all preset emotional states, wherein the emotional model is obtained by training based on all reference emotional characteristics;
the determining the emotion state recognition result of the voice data based on the matching degrees comprises the following steps:
comparing each emotion confidence with the corresponding emotion threshold;
if at least one emotion confidence is greater than its corresponding emotion threshold, determining that the emotion state recognition result of the voice data includes: the preset emotional state corresponding to each emotion confidence that is greater than its corresponding emotion threshold;
and if no emotion confidence is greater than its corresponding emotion threshold, determining that the emotion state recognition result of the voice data includes a normal state.
8. The system of any one of claims 1-5, wherein the security monitoring system further comprises: a monitoring server;
the sending of the warning information to the monitoring terminal equipment corresponding to the object terminal equipment by the object terminal equipment comprises the following steps:
the object terminal equipment sends the warning information to the monitoring server;
and the monitoring server is used for receiving the warning information sent by the object terminal equipment, determining monitoring personnel information corresponding to the target object, and sending the warning information to the monitoring terminal equipment held by the monitoring personnel based on the monitoring personnel information.
9. A safety detection method is characterized in that the method is applied to object side equipment in a safety monitoring system, and the safety monitoring system also comprises monitoring side equipment; the method comprises the following steps:
obtaining voice data;
determining an emotional state recognition result of the voice data;
when the emotion state recognition result contains at least one preset emotional state, sending warning information to monitoring end equipment corresponding to the object end equipment, so that the monitoring end equipment receives the warning information and outputs the warning information;
wherein the preset emotional state is: an emotional state that the target object holding the object side equipment is in when facing a security event, and the warning information is: information warning that the target object currently faces a security event.
10. The method of claim 9, wherein the alert information comprises: the voice data and/or the emotion state recognition result.
11. The method of claim 9, wherein prior to determining the emotion state recognition result for the speech data, the method further comprises:
and judging whether the sender of the voice data is a target object holding the object end equipment, and if so, executing the step of determining the emotional state recognition result of the voice data.
12. The method according to claim 11, wherein the determining whether the sender of the voice data is a target object holding the object side equipment comprises:
extracting target voiceprint characteristics of the voice data;
judging whether the target voiceprint features are matched with the reference voiceprint features, if so, determining that the sender of the voice data is the target object, and if not, determining that the sender of the voice data is not the target object;
wherein the reference voiceprint feature is: the voiceprint feature of the sample voice data of the target object, pre-stored in the object side equipment.
13. The method of claim 12, wherein said determining whether the target voiceprint feature matches a reference voiceprint feature comprises:
inputting the target voiceprint features into a pre-trained voiceprint model to obtain a voiceprint confidence coefficient, wherein the voiceprint model is obtained by training based on the reference voiceprint features;
judging whether the voiceprint confidence is larger than a preset voiceprint threshold value or not;
if yes, determining that the target voiceprint feature matches the reference voiceprint feature; if not, determining that the target voiceprint feature does not match the reference voiceprint feature.
14. The method of any of claims 9-13, wherein the determining the emotion state recognition result of the voice data comprises:
extracting target emotional characteristics of the voice data, wherein the target emotional characteristics are: features related to the emotional state of the voice data;
determining the matching degree between the target emotional features and each reference emotional feature, wherein the reference emotional features are: features of sample voice data that are related to each preset emotional state and pre-stored in the object side equipment; the preset emotional states are fear, crying, or screaming, and the reference emotional features are obtained based on the sample voice data of the target object;
and determining the emotion state recognition result of the voice data based on the matching degrees.
15. The method of claim 14, wherein the determining the degree of matching between the target emotional feature and each of the reference emotional features comprises:
inputting the target emotional characteristics into a pre-trained emotional model to obtain emotional confidence coefficients corresponding to all preset emotional states, wherein the emotional model is obtained by training based on all reference emotional characteristics;
the determining the emotion state recognition result of the voice data based on the matching degrees comprises the following steps:
comparing each emotion confidence with the corresponding emotion threshold;
if at least one emotion confidence is greater than its corresponding emotion threshold, determining that the emotion state recognition result of the voice data includes: the preset emotional state corresponding to each emotion confidence that is greater than its corresponding emotion threshold;
and if no emotion confidence is greater than its corresponding emotion threshold, determining that the emotion state recognition result of the voice data includes a normal state.
16. The method of any of claims 9-13, wherein the security monitoring system further comprises: a monitoring server;
the sending of the warning information to the monitoring terminal device corresponding to the object terminal device includes:
and sending the warning information to the monitoring server so that the monitoring server receives the warning information, determines monitoring personnel information corresponding to the target object, and sends the warning information to monitoring end equipment held by the monitoring personnel based on the monitoring personnel information.
17. A safety detection device is characterized in that the safety detection device is applied to object side equipment in a safety monitoring system, and the safety monitoring system further comprises monitoring side equipment; the device comprises:
an obtaining module for obtaining voice data;
the determining module is used for determining the emotion state recognition result of the voice data;
the sending module is used for sending warning information to the monitoring end equipment corresponding to the object end equipment when the emotion state recognition result contains at least one preset emotional state, so that the monitoring end equipment receives the warning information and outputs the warning information;
wherein the preset emotional state is: an emotional state that the target object holding the object side equipment is in when facing a security event, and the warning information is: information warning that the target object currently faces a security event.
18. An electronic device comprising a processor and a memory, wherein,
the memory is used for storing a computer program;
the processor, when executing the program stored in the memory, implementing the method steps of any of claims 9-16.
CN201810826730.5A 2018-07-25 2018-07-25 Safety monitoring system, safety detection method, safety detection device and electronic equipment Pending CN110782622A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810826730.5A CN110782622A (en) 2018-07-25 2018-07-25 Safety monitoring system, safety detection method, safety detection device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810826730.5A CN110782622A (en) 2018-07-25 2018-07-25 Safety monitoring system, safety detection method, safety detection device and electronic equipment

Publications (1)

Publication Number Publication Date
CN110782622A true CN110782622A (en) 2020-02-11

Family

ID=69377244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810826730.5A Pending CN110782622A (en) 2018-07-25 2018-07-25 Safety monitoring system, safety detection method, safety detection device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110782622A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100145695A1 (en) * 2008-12-08 2010-06-10 Electronics And Telecommunications Research Institute Apparatus for context awareness and method using the same
CN104580685A (en) * 2013-10-23 2015-04-29 中兴通讯股份有限公司 Terminal state processing method and device
CN105096940A (en) * 2015-06-30 2015-11-25 百度在线网络技术(北京)有限公司 Method and device for voice recognition
CN105286240A (en) * 2015-10-21 2016-02-03 金纯� Intelligent walking stick device and monitoring system thereof
CN108305615A (en) * 2017-10-23 2018-07-20 腾讯科技(深圳)有限公司 A kind of object identifying method and its equipment, storage medium, terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张旭红 (Zhang Xuhong): "Research on Dangerous Situation Identification Based on Speech Emotion", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111653291A (en) * 2020-06-01 2020-09-11 莫毓昌 Intelligent health monitoring method for power equipment based on voiceprint
CN111653291B (en) * 2020-06-01 2023-02-14 莫毓昌 Intelligent health monitoring method for power equipment based on voiceprint
CN113241060A (en) * 2021-07-09 2021-08-10 明品云(北京)数据科技有限公司 Security early warning method and system
WO2023239562A1 (en) * 2022-06-06 2023-12-14 Cerence Operating Company Emotion-aware voice assistant

Similar Documents

Publication Publication Date Title
US10810510B2 (en) Conversation and context aware fraud and abuse prevention agent
US11941968B2 (en) Systems and methods for identifying an acoustic source based on observed sound
EP3640935B1 (en) Notification information output method, server and monitoring system
CN110782622A (en) Safety monitoring system, safety detection method, safety detection device and electronic equipment
WO2020253128A1 (en) Voice recognition-based communication service method, apparatus, computer device, and storage medium
US20180350354A1 (en) Methods and system for analyzing conversational statements and providing feedback in real-time
US20200035261A1 (en) Sound detection
CN108109445B (en) Teaching course condition monitoring method
CN110852147B (en) Security alarm method, security alarm device, server and computer readable storage medium
CN108109446B (en) Teaching class condition monitoring system
CN109993044B (en) Telecommunications fraud identification system, method, apparatus, electronic device, and storage medium
JP2019095552A (en) Voice analysis system, voice analysis device, and voice analysis program
Saifan et al. A machine learning based deaf assistance digital system
CN110795971B (en) User behavior identification method, device, equipment and computer storage medium
CN110800053A (en) Method and apparatus for obtaining event indications based on audio data
CN111739558B (en) Monitoring system, method, device, server and storage medium
CN112599130A (en) Intelligent conference system based on intelligent screen
CN111179969A (en) Alarm method, device and system based on audio information and storage medium
JP2012058944A (en) Abnormality detection device
CN115665438A (en) Audio and video processing method, device, equipment and storage medium for court remote court trial
CN108694388B (en) Campus monitoring method and device based on intelligent camera
KR102573186B1 (en) Apparatus, method, and recording medium for providing animal sound analysis information
CN115424634A (en) Audio and video stream data processing method and device, electronic equipment and storage medium
CN109634554B (en) Method and device for outputting information
CN113516997A (en) Voice event recognition device and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200211