CN115171335A - Image and voice fused indoor safety protection method and device for elderly people living alone - Google Patents


Info

Publication number
CN115171335A
CN115171335A (application CN202210687087.9A)
Authority
CN
China
Prior art keywords
people, old people, recognition, old, living alone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210687087.9A
Other languages
Chinese (zh)
Inventor
李晓飞
钱庆庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority: CN202210687087.9A
Publication: CN115171335A
Legal status: Pending

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00 Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/02 Alarms for ensuring the safety of persons
    • G08B21/04 Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons
    • G08B21/0407 Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons based on behaviour analysis
    • G08B21/043 Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons based on behaviour analysis detecting an emergency event, e.g. a fall
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/172 Classification, e.g. identification
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G10L15/26 Speech to text systems
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Social Psychology (AREA)
  • Child & Adolescent Psychology (AREA)
  • Psychology (AREA)
  • Molecular Biology (AREA)
  • Hospice & Palliative Care (AREA)
  • Signal Processing (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Gerontology & Geriatric Medicine (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Alarm Systems (AREA)

Abstract

The invention discloses an image- and voice-fused indoor safety protection method for elderly people living alone, comprising the following steps: acquiring monitoring video data in a home environment, the monitoring video data comprising image data and voice data; performing face detection and face recognition on the image data to determine the number of people in the home environment and authenticate their identities; in response to determining that the monitored person is the elderly person living alone, performing fall-action recognition to obtain a fall recognition result; performing cloud speech recognition on the voice data to obtain an emotion analysis result for the elderly person; comprehensively analyzing the state of the elderly person by combining the number of people in the home environment, the identity authentication, the fall recognition result and the emotion analysis result to obtain the state analysis result; and sending a safety protection notification to the corresponding terminal or platform according to the state analysis result. The invention reduces false indoor fall alarms for the elderly, thereby reducing the burden on their relatives and the waste of social medical resources.

Description

Image and voice fused indoor safety protection method and device for elderly people living alone
Technical Field
The invention relates to an image- and voice-fused indoor safety protection method and device for elderly people living alone, and belongs to the technical fields of computer vision and speech processing.
Background
According to the results of the seventh national census in 2020, people aged 60 and over account for 18.7% of China's population, an increase of more than 5 percentage points over 2010; the aging process has further deepened. Data show that falls are the fourth leading cause of injury-related death in China, and the first among adults over 65. Besides causing death, falls can result in serious injury and even disability. For the elderly living alone, whether a fall can be discovered in time is directly related to life safety. At present, because medical resources are unevenly distributed, the existing limited medical resources are insufficient to meet the daily nursing needs of the elderly in China. The traditional nursing model centered on the hospital is gradually shifting to an intelligent "hospital + family" model. In recent years, with accelerating informatization, intelligent monitoring systems have been continuously developed and improved, and cameras are increasingly installed in homes to safeguard property and life. However, in such application scenarios, once a detection error occurs, the normal work and life of the elderly person's family members are disrupted, and social resources such as hospital services are wasted.
Disclosure of Invention
With the development of speech technology, combining speech with images for double verification can deliver increasing value.
Current research on fall recognition mainly focuses on computer-vision-based methods, which acquire an image sequence through a camera and analyze it using image processing techniques. Because the change of body posture during a fall differs from daily behaviors, traditional methods mostly adopt human body contours or shapes as recognition features and then use a Support Vector Machine (SVM) to recognize the fall behavior. Deep learning methods can actively learn the spatio-temporal features in the image sequence, avoiding complex feature extraction and data reconstruction; using the image sequence directly as the input of a deep convolutional neural network greatly broadens its applicability. Vision-based methods achieve high recognition accuracy, but their performance is strongly affected by illumination. As deep learning has gained attention, more and more research has turned to deep-learning-based speech processing. A deep learning model generally refers to a deeper structural model with more layers of nonlinear transformation than a traditional shallow model; it is more powerful in expression and modeling capability, and more advantageous than the traditional Gaussian mixture model in processing complex signals.
The invention aims to overcome the above deficiencies of the prior art and provides an image- and voice-fused fall recognition method for the elderly, which can compensate for the false alarms produced by existing fall recognition systems and effectively reduce the waste of social resources caused by them.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
In a first aspect, a method for indoor safety protection of elderly people living alone is provided, comprising the following steps:
acquiring monitoring video data in a home environment, the monitoring video data comprising image data and voice data;
performing face detection and face recognition on the image data to determine the number of people in the home environment and authenticate their identities;
in response to determining that the monitored person is the elderly person living alone, performing fall-action recognition to obtain a fall recognition result;
performing cloud speech recognition on the voice data to obtain an emotion analysis result for the elderly person;
comprehensively analyzing the state of the elderly person by combining the number of people in the home environment, the identity authentication, the fall recognition result and the emotion analysis result to obtain the state analysis result;
and sending a safety protection notification to the corresponding terminal or platform according to the state analysis result.
In some embodiments, performing cloud speech recognition on the voice data comprises:
performing speech recognition on the voice data using cloud speech processing to obtain text information;
vectorizing the words in the text, adding position information for each word, and combining them to obtain the final word vectors;
inputting the final word vectors into a Transformer network, where a multi-head self-attention mechanism enriches the associations among words so that the network can understand the semantic and grammatical structure of the sentence; the output layer first re-extracts features with a convolution operation and finally performs feature fusion through a fully connected layer to obtain fused features;
and recognizing the fused features with a pre-trained elderly emotional-state recognition network model to obtain the emotion analysis result.
In some embodiments, performing face detection and face recognition on the image data comprises:
inputting the image sequence from the monitoring video into a YOLOv3 face detection network and calibrating the face region coordinates;
determining the number of people in the home environment from the calibration result;
when the number of people is 1, aligning the calibrated face region and extracting its features with a trained ResNet;
extracting the feature vectors of a local face library with the same ResNet and computing the cosine similarity between the feature vectors of the detected target and the local images;
and determining, based on the computed similarity, whether the recognized face is the monitored elderly person living alone, and performing fall behavior recognition if so.
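The cosine-similarity matching step above can be sketched as follows. This is a minimal illustration only: the 0.6 decision threshold and the identity names in the usage are assumptions, not values taken from the source.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_identity(probe: np.ndarray, gallery: dict, threshold: float = 0.6):
    """Compare a detected face embedding against a local face library.

    `gallery` maps identity names to reference embeddings (e.g. from ResNet);
    `threshold` is an assumed decision value. Returns (best_name, best_score),
    with best_name = None when no entry passes the threshold.
    """
    best_name, best_score = None, -1.0
    for name, ref in gallery.items():
        score = cosine_similarity(probe, ref)
        if score > best_score:
            best_name, best_score = name, score
    return (best_name if best_score >= threshold else None), best_score

# Hypothetical usage with toy 2-D embeddings:
probe = np.array([1.0, 0.0])
gallery = {"elder": np.array([0.9, 0.1]), "visitor": np.array([0.0, 1.0])}
name, score = match_identity(probe, gallery)
```

If the returned name matches the monitored elderly person, the pipeline proceeds to fall behavior recognition; otherwise the frame is treated as an unknown person.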
In some embodiments, the comprehensive analysis of the elderly person's state, combining the number of people in the home environment, identity authentication, the fall recognition result and the emotion analysis result, comprises:
performing weighted fusion of the voice judgment, fall detection judgment and identity recognition, expressed by the following formula:
E(k) = ω₁e₁(k) + ω₂e₂(k) + ω₃e₃(k)    (1)

In formula (1), E(k) denotes the weighted-fusion confidence that the elderly person needs help; eᵢ(k) denotes the confidence at time k of emotion analysis, identity recognition and fall detection respectively; and ωᵢ denotes the corresponding weights ω₁, ω₂, ω₃ for emotion analysis, fall recognition and identity authentication. Ordered from largest to smallest, the weights are ω₁, ω₂, ω₃, and ω₁ + ω₂ + ω₃ = 1.
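The weighted fusion of formula (1) amounts to a dot product of the per-modality confidences with their weights. The sketch below is a minimal illustration; the confidence values in the usage are assumed for demonstration.

```python
def weighted_fusion(confidences, weights):
    """E(k) = sum_i w_i * e_i(k); the weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * e for w, e in zip(weights, confidences))

# Hypothetical usage: emotion, fall, identity confidences at time k,
# with the single-person weights (0.3, 0.4, 0.3) given later in the text.
fused = weighted_fusion([0.9, 0.8, 1.0], [0.3, 0.4, 0.3])
```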
When more than one person is detected indoors and the persons are identified as relatives of the elderly person, no state analysis is performed.
When more than one person is detected indoors and no authenticated relative is present, weighted fusion recognition is performed. In this case eᵢ(k) comprises the confidences of emotion analysis and identity recognition at time k, and ωᵢ comprises the weight ω₁ of emotion analysis and the weight ω₂ of fall recognition, with ω₁ = 0.3, ω₂ = 0.7 and ω₃ = 0. If the result indicates that the elderly person has encountered a stranger with malicious intent, the situation is sent to the relatives' terminal.
When only the elderly person is detected indoors, weighted fusion recognition is applied. Here eᵢ(k) comprises the confidences of emotion analysis, identity recognition and fall detection at time k, and ωᵢ comprises the weights ω₁, ω₂, ω₃ of emotion analysis, fall recognition and identity authentication, with ω₁ = 0.3, ω₂ = 0.4 and ω₃ = 0.3. If the elderly person living alone falls, the indoor situation is reported to the relatives' terminal and the medical service organization platform.
When the number of people in the home environment is 1, a fall recognition result for the elderly person living alone is obtained; if a fall is determined, a message is sent to the family terminal and the medical institution platform is notified. When the number of people in the home environment is greater than 1, the persons in the video are identified; if they are not registered safe persons and abnormal emotion of the elderly person is detected, a message is sent to the relatives' terminal.
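The notification rules above can be summarized in a small decision function. This is a simplified reading of the source; the recipient names (`family_terminal`, `medical_platform`) are illustrative placeholders.

```python
def dispatch(num_people: int, identities_registered: bool,
             fall_detected: bool, abnormal_emotion: bool) -> list:
    """Decide which notifications to send based on the state analysis.

    num_people == 1: notify family and medical platform on a fall.
    num_people > 1: notify family if unregistered persons are present
    and the elderly person's emotion is abnormal.
    """
    recipients = []
    if num_people == 1:
        if fall_detected:
            recipients += ["family_terminal", "medical_platform"]
    else:
        if not identities_registered and abnormal_emotion:
            recipients.append("family_terminal")
    return recipients
```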
In a second aspect, the invention provides an indoor safety protection device for elderly people living alone, comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to the first aspect.
In a third aspect, the present invention provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.
Compared with the prior art, the image- and voice-fused indoor safety protection method and device for the elderly provided by the embodiments of the invention have the following beneficial effects:
the method comprises the steps of acquiring a monitoring video in a home environment, and processing voice and images; performing voice processing based on voice information of the monitoring video, and monitoring the emotional state of the old people indoors; carrying out face detection and falling action recognition based on image information of the monitoring video to obtain the falling recognition results of the number of people indoors and the elderly living alone; the invention can determine the number and the identity of the objects appearing in the monitoring video, can further ensure the safety of the old people indoors, and simultaneously reduces the false alarm rate and increases the reliability of the result.
The invention fuses the processing results of voice and images, carries out different processing on different conditions, can monitor pertinently and further ensures the safety of the old.
Drawings
Fig. 1 is a flowchart of an identity-feature-fused fall recognition method for elderly people living alone according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
In the description of the present invention, "several" means one or more and "a plurality" means two or more; terms such as "above", "below" and "exceeding" are understood as excluding the stated number, while terms such as "not less than", "not more than" and "within" are understood as including it. Where "first" and "second" are used to distinguish technical features, they are not to be understood as indicating or implying relative importance, the number of the indicated technical features, or their precedence.
In the description of the present invention, reference to the description of the terms "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Example 1
An indoor safety protection method for elderly people living alone comprises the following steps:
acquiring monitoring video data in a home environment, the monitoring video data comprising image data and voice data;
performing face detection and face recognition on the image data to determine the number of people in the home environment and authenticate their identities;
in response to determining that the monitored person is the elderly person living alone, performing fall-action recognition to obtain a fall recognition result;
performing cloud speech recognition on the voice data to obtain an emotion analysis result for the elderly person;
comprehensively analyzing the state of the elderly person by combining the number of people in the home environment, the identity authentication, the fall recognition result and the emotion analysis result to obtain the state analysis result;
and sending a safety protection notification to the corresponding terminal or platform according to the state analysis result.
In some embodiments, as shown in fig. 1, the invention provides an image- and voice-fused indoor safety protection method for elderly people living alone, comprising:
Step 1: acquiring monitoring videos of several spaces in the home environment, the monitoring video data comprising image data and voice data;
Step 2: performing face detection and face recognition on the image data to determine the number of people in the home environment and authenticate their identities, thereby judging whether the elderly person living alone is present in the home scene.
step 2-1: inputting an image sequence in a monitoring video into a YOLOv3 face detection network, and calibrating face region coordinates;
determining the number of people in the home environment according to the calibration result;
when the number of people is 1, aligning the calibrated face areas, and extracting the features of the face areas by using the trained ResNet;
similarly, extracting the characteristic vectors of a local face library by using ResNet, and calculating the cosine similarity of the characteristic vectors of the detection target and the local image;
and determining whether the recognized face is the monitored solitary old man or not based on the calculation result, and performing falling behavior recognition when the recognized face is determined to be the monitored solitary old man.
Step 2-2: when the number of people is 1, aligning the calibrated face region and extracting its features with the trained ResNet.
Step 2-3: extracting the feature vectors of the local face library with ResNet and computing the cosine similarity between the feature vectors of the detected target and the local images.
Step 2-4: inputting the face region coordinates and key-point coordinates into a tracker to generate Detections; predicting each target's next-position tracking box (Track) with Kalman filtering from the mean, variance and id generated for each Detection; matching the predicted Tracks with the Detections in the current frame using the Hungarian algorithm, and updating the Kalman filter's prediction according to the matching result. If the Mahalanobis distance between a Track and a Detection is within a threshold, their IDs are associated; if a newly obtained target Detection has no matching Track, a new Track is generated.
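A minimal sketch of the Track-Detection association in step 2-4, using SciPy's `linear_sum_assignment` (an implementation of the Hungarian algorithm). Euclidean distance between box centres stands in for the Mahalanobis distance, which would require the Kalman covariance, and the gating threshold value (borrowed from a common DeepSORT default) is an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks: np.ndarray, detections: np.ndarray, max_dist: float = 9.4877):
    """Match predicted track positions to current-frame detections.

    `tracks` is an (N, 2) array of predicted box centres, `detections` an
    (M, 2) array of detected centres. Returns (matches, unmatched_detections):
    matched (track, detection) index pairs whose distance is within the gate,
    and detection indices that should spawn new Tracks.
    """
    if len(tracks) == 0:
        return [], list(range(len(detections)))
    # Pairwise distance cost matrix, then optimal assignment (Hungarian).
    cost = np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
    matched_dets = {c for _, c in matches}
    unmatched = [j for j in range(len(detections)) if j not in matched_dets]
    return matches, unmatched
```

Detections left unmatched after gating would be turned into new Tracks, mirroring the step above.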
Step 3: performing fall-action recognition based on the images in the monitoring video to obtain the fall detection result for the elderly person living alone.
Every 30 frames of the image sequence are input into a trained SlowFast fall-action recognition network, which extracts the spatio-temporal features of the sequence and classifies them through a fully connected layer to obtain a fall recognition result for the specific monitored subject (1 for a fall, 0 for no fall) together with a fall confidence c_A. The fall-action recognition network is trained on the public fall datasets Le2i-Fall and FDD.
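The 30-frame batching that feeds the fall-recognition network can be sketched as follows. Dropping incomplete trailing clips is an assumption, and the SlowFast classification itself is outside this fragment.

```python
def frame_windows(frames, window: int = 30):
    """Group a frame stream into fixed-length, non-overlapping clips for the
    fall-action recognition network; an incomplete trailing clip is dropped."""
    return [frames[i:i + window] for i in range(0, len(frames) - window + 1, window)]
```

Each returned clip would then be passed to the trained network to produce a fall label and confidence.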
Step 4: performing cloud speech recognition on the voice data to obtain the emotion analysis result for the elderly person.
First, speech recognition is performed on the voice signal captured by the AI camera using cloud speech processing to obtain text. The words are then vectorized, position information is added for each word, and the two are combined into the final word vectors. The final word vectors are input into a Transformer network, where a multi-head self-attention mechanism further enriches the associations among words so that the network can understand the semantic and grammatical structure of the sentence; the output layer first re-extracts features with a convolution operation and finally performs feature fusion through a fully connected layer. A large number of sentences with different emotions are labeled and used as training samples to obtain a network model capable of recognizing the emotional state of the elderly; with this model, signals such as pain, fear and calls for help are recognized.
Step 5: comprehensively analyzing the state of the elderly person by combining the number of people in the home environment, the identity authentication, the fall recognition result and the emotion analysis result to obtain the state analysis result, and sending a safety protection notification to the corresponding terminal or platform accordingly.
When the number of people in the home environment is 1, a fall recognition result for the elderly person living alone is obtained; if a fall is determined, a message is sent to the family terminal and the medical institution platform is notified. When the number of people in the home environment is greater than 1, the persons in the video are identified; if they are not registered safe persons and abnormal emotion of the elderly person is detected, a message is sent to the relatives' terminal.
When the elderly person is judged to be in an abnormal state such as a fall, the situation is sent to the relatives' mobile phone terminal and the medical institution service platform.
The method acquires the monitoring video in the home environment and monitors the indoor emotional state of the elderly person through voice processing. When the detected emotional state indicates fear, a call for help, or similar signals, face detection and recognition are performed on the persons in the monitoring video: if the person is identified as the elderly person living alone and has fallen, the relevant information is sent to the family terminal and the hospital platform; if several people including strangers are identified indoors and the emotion is abnormal, the elderly person is judged to be in danger and the indoor situation is sent to the relatives' terminal.
This embodiment recognizes fall actions based on the images in the monitoring video to obtain the fall recognition result of the elderly person at home; by verifying with fused images and voice, it avoids the waste of manpower and social medical resources caused by false alarms.
This embodiment can promptly send fall information to the relatives' mobile phone terminal and the medical institution service platform, offering rapid response and timely alarming.
This embodiment provides an indoor elderly safety protection system fusing image and voice features, comprising an AI camera module, a data processing and analysis module, and a terminal communication module.
The AI camera module performs face recognition, analysis of the elderly person's actions and behavior, and discrimination of abnormal behavior: the AI camera judges the identity and abnormal actions of the elderly person at home through face recognition and the fall-action recognition network, detects and tracks the elderly person's face region, and transmits the monitoring results to the data fusion analysis module.
The data fusion analysis module analyzes the obtained voice and image detection results and judges the current state of the elderly person, so as to decide which measure to take.
The terminal communication module sends the recognized fall situation of the elderly person at home to the family mobile phone terminal and the medical institution service platform.
Example 2
In a second aspect, the present embodiment provides an indoor safety protection device for elderly people living alone, including a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of embodiment 1.
Example 3
In a third aspect, the present embodiment provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of embodiment 1.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, it is possible to make various improvements and modifications without departing from the technical principle of the present invention, and those improvements and modifications should be considered as the protection scope of the present invention.

Claims (9)

1. An indoor safety protection method for elderly people living alone, characterized by comprising the following steps:
acquiring surveillance video data in a home environment, the surveillance video data comprising image data and voice data;
performing face detection and face recognition on the image data to determine the number of people in the home environment and authenticate their identities;
in response to determining that the detected person is the elderly person living alone, performing fall action recognition on the elderly person to obtain a fall recognition result;
performing cloud-based speech recognition processing on the voice data to obtain an emotion analysis result for the elderly person;
comprehensively analyzing the state of the elderly person by combining the number of people in the home environment, the identity authentication, the fall recognition result, and the emotion analysis result to obtain a state analysis result; and
sending a safety protection notification to the corresponding terminal or platform according to the state analysis result.
2. The indoor safety protection method for elderly people living alone according to claim 1, wherein performing cloud-based speech recognition processing on the voice data comprises:
performing speech recognition on the voice data using cloud speech processing to obtain text information;
vectorizing the words in the text information, adding the positional information of each word, and combining them to obtain final word vectors;
inputting the final word vectors into a Transformer network and enriching the associations among words with a multi-head self-attention mechanism, so that the network can capture the semantic and grammatical structure of the sentence, wherein the output layer first re-extracts features with a convolution operation and then performs feature fusion through a fully connected layer to obtain fused features; and
recognizing the fused features with a pre-trained elderly emotional-state recognition network model to obtain the emotion analysis result.
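The word-vector construction and multi-head self-attention step of claim 2 can be sketched as follows. This is a minimal NumPy illustration with random weights standing in for the trained projection matrices; the dimensions and the `multi_head_self_attention` helper are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, n_heads, rng):
    """x: (seq_len, d_model). Random weights stand in for trained ones."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    heads = []
    for _ in range(n_heads):
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) * 0.1
                      for _ in range(3))
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        # Scaled dot-product attention relating every word to every other word
        scores = softmax(q @ k.T / np.sqrt(d_head))
        heads.append(scores @ v)
    return np.concatenate(heads, axis=-1)  # (seq_len, d_model)

rng = np.random.default_rng(0)
seq_len, d_model = 6, 16
tok = rng.standard_normal((seq_len, d_model))        # word vectors
pos = rng.standard_normal((seq_len, d_model)) * 0.1  # positional information
x = tok + pos                                        # "final word vector"
out = multi_head_self_attention(x, n_heads=4, rng=rng)
```

In the patent this output would then pass through the convolution and fully connected layers before the emotion classifier; those stages are omitted here.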
3. The indoor safety protection method for elderly people living alone according to claim 1, wherein performing face detection and face recognition on the image data comprises:
inputting the image sequence of the surveillance video into a YOLOv3 face detection network and calibrating the face region coordinates;
determining the number of people in the home environment from the calibration result;
when the number of people is 1, aligning the calibrated face region and extracting features with a trained ResNet;
extracting the feature vectors of a local face library with the ResNet and computing the cosine similarity between the feature vector of the detected target and those of the local images; and
determining, based on the computed similarity, whether the recognized face is the monitored elderly person living alone, and performing fall behavior recognition if it is.
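The cosine-similarity matching step of claim 3 can be sketched as below. The toy 4-dimensional vectors stand in for ResNet embeddings, and the 0.6 decision threshold is an assumed value not specified in the patent.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_monitored_elder(probe, gallery, threshold=0.6):
    """Match a detected face embedding against the local face library.
    Returns (matched?, best similarity); threshold is an assumption."""
    best = max(cosine_similarity(probe, g) for g in gallery)
    return best >= threshold, best

# Toy embeddings standing in for ResNet features of the local face library
gallery = [np.array([1.0, 0.0, 0.0, 0.0]),
           np.array([0.0, 1.0, 0.0, 0.0])]
probe = np.array([0.9, 0.1, 0.0, 0.0])   # detected face, close to entry 0
match, score = is_monitored_elder(probe, gallery)
```

If `match` is true, the pipeline proceeds to fall behavior recognition; otherwise the person is treated as unauthenticated.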
4. The indoor safety protection method for elderly people living alone according to claim 1, wherein comprehensively analyzing the state of the elderly person by combining the number of people in the home environment, the identity authentication, the fall recognition result, and the emotion analysis result comprises:
performing a weighted fusion of the speech judgment, the fall detection judgment, and the identity recognition, expressed by the following formula:
e(k) = ω_1·e_1(k) + ω_2·e_2(k) + ω_3·e_3(k)   (1)
In formula (1), e(k) denotes the weighted-fusion confidence that the elderly person needs help, e_i(k) denotes the confidences at time k of emotion analysis, identity recognition, and fall detection, and ω_i denotes the weights, comprising the weights ω_1, ω_2, ω_3 of emotion analysis, fall recognition, and identity authentication, ordered from largest to smallest as ω_1, ω_2, ω_3, with ω_1 + ω_2 + ω_3 = 1.
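Formula (1) is a convex combination of the per-source confidences; a minimal sketch in Python, with toy confidence values chosen for illustration:

```python
def fused_confidence(confidences, weights):
    """e(k) = sum_i w_i * e_i(k), formula (1); weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * e for w, e in zip(weights, confidences))

# Toy per-source confidences e_i(k) at time k and assumed weights
e_k = fused_confidence(confidences=[0.8, 0.9, 0.6],
                       weights=[0.3, 0.3, 0.4])
# e_k = 0.3*0.8 + 0.3*0.9 + 0.4*0.6 = 0.75
```

Because the weights sum to 1, the fused value stays in [0, 1] whenever each source confidence does, so it can be compared directly against an alarm threshold.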
5. The indoor safety protection method for elderly people living alone according to claim 4, wherein when more than one person is detected indoors and the people are identified as relatives of the elderly person, no state analysis of the elderly person is performed.
6. The indoor safety protection method for elderly people living alone according to claim 4, wherein when more than one person is detected indoors and no authenticated relatives are present, weighted fusion recognition is performed, where e_i(k) comprises the confidences at time k of emotion analysis and identity recognition, and ω_i denotes the weights, comprising the emotion-analysis weight ω_1 and the fall-recognition weight ω_2, with ω_1 = 0.3, ω_2 = 0.7, and ω_3 = 0; if the result indicates that the elderly person has encountered a malicious stranger, the situation is sent to the relatives' terminal.
7. The indoor safety protection method for elderly people living alone according to claim 4, wherein when only the elderly person is detected indoors, weighted fusion recognition is performed on the elderly person, where e_i(k) comprises the confidences at time k of emotion analysis, identity recognition, and fall detection, and ω_i denotes the weights ω_1, ω_2, ω_3 of emotion analysis, fall recognition, and identity authentication, with ω_1 = 0.3, ω_2 = 0.4, and ω_3 = 0.3; if the elderly person living alone is determined to have fallen, the indoor situation is reported to the family terminal and the medical service institution's platform.
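The scenario-dependent weight choices of claims 6 and 7 can be sketched as a small dispatch table. The confidence ordering (emotion, fall, identity) and the 0.5 alarm threshold are hypothetical choices for illustration; the patent specifies only the weights.

```python
# (emotion, fall, identity) weights per scenario, from claims 6 and 7
WEIGHTS = {
    "strangers_present": (0.3, 0.7, 0.0),  # claim 6: ω3 = 0
    "elder_alone":       (0.3, 0.4, 0.3),  # claim 7
}

def needs_alert(scene, confidences, threshold=0.5):
    """Weighted-fusion decision; threshold is an assumed value."""
    w = WEIGHTS[scene]
    return sum(wi * ci for wi, ci in zip(w, confidences)) >= threshold

# Elder alone: fearful speech (0.9), likely fall (0.8), identity match (0.7)
alert = needs_alert("elder_alone", (0.9, 0.8, 0.7))
# 0.3*0.9 + 0.4*0.8 + 0.3*0.7 = 0.80 >= 0.5, so an alert would be raised
```

Setting ω_3 = 0 in the stranger scenario drops fall detection from the fusion, matching claim 6's reliance on emotion and identity cues.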
8. An indoor safety protection device for solitary old people is characterized by comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of claims 1 to 7.
9. A storage medium having a computer program stored thereon, the computer program, when being executed by a processor, performing the steps of the method of any one of claims 1 to 7.
CN202210687087.9A 2022-06-17 2022-06-17 Image and voice fused indoor safety protection method and device for elderly people living alone Pending CN115171335A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210687087.9A CN115171335A (en) 2022-06-17 2022-06-17 Image and voice fused indoor safety protection method and device for elderly people living alone

Publications (1)

Publication Number Publication Date
CN115171335A true CN115171335A (en) 2022-10-11

Family

ID=83486292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210687087.9A Pending CN115171335A (en) 2022-06-17 2022-06-17 Image and voice fused indoor safety protection method and device for elderly people living alone

Country Status (1)

Country Link
CN (1) CN115171335A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273864A (en) * 2017-06-22 2017-10-20 星际(重庆)智能装备技术研究院有限公司 A kind of method for detecting human face based on deep learning
CN109684987A (en) * 2018-12-19 2019-04-26 南京华科和鼎信息科技有限公司 A kind of authentication system and method based on certificate
CN112801000A (en) * 2021-02-05 2021-05-14 南京邮电大学 Household old man falling detection method and system based on multi-feature fusion
CN112949369A (en) * 2020-11-17 2021-06-11 杭州电子科技大学 Mass face gallery retrieval method based on man-machine cooperation
CN112951240A (en) * 2021-05-14 2021-06-11 北京世纪好未来教育科技有限公司 Model training method, model training device, voice recognition method, voice recognition device, electronic equipment and storage medium
CN113822192A (en) * 2021-09-18 2021-12-21 山东大学 Method, device and medium for identifying emotion of escort personnel based on Transformer multi-modal feature fusion
CN114469076A (en) * 2022-01-24 2022-05-13 南京邮电大学 Identity feature fused old solitary people falling identification method and system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116631063A (en) * 2023-05-31 2023-08-22 武汉星巡智能科技有限公司 Intelligent nursing method, device and equipment for old people based on drug behavior identification
CN116631063B (en) * 2023-05-31 2024-05-07 武汉星巡智能科技有限公司 Intelligent nursing method, device and equipment for old people based on drug behavior identification
CN118333260A (en) * 2023-10-20 2024-07-12 广州极数科技有限公司 Community online real-time intelligent analysis method, device and equipment based on digital intelligence
CN117523666A (en) * 2023-11-13 2024-02-06 深圳市金大智能创新科技有限公司 Method for monitoring active feedback based on face and limb recognition state of virtual person
CN118430070A (en) * 2024-05-27 2024-08-02 深圳市国关智能技术有限公司 Human behavior recognition and data acquisition system based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN115171335A (en) Image and voice fused indoor safety protection method and device for elderly people living alone
CN112364696B (en) Method and system for improving family safety by utilizing family monitoring video
US12047355B2 (en) Machine learning techniques for mitigating aggregate exposure of identifying information
CN111241883B (en) Method and device for preventing cheating of remote tested personnel
CN104966053A (en) Face recognition method and recognition system
US11921831B2 (en) Enrollment system with continuous learning and confirmation
Hao et al. An end-to-end human abnormal behavior recognition framework for crowds with mentally disordered individuals
Weinshall et al. Beyond novelty detection: Incongruent events, when general and specific classifiers disagree
Huang et al. Detecting the instant of emotion change from speech using a martingale framework
CN117952808A (en) Campus anti-spoofing method and system based on video and voice recognition
WO2023284185A1 (en) Updating method for similarity threshold in face recognition and electronic device
JP2019053381A (en) Image processing device, information processing device, method, and program
Deshan et al. Smart snake identification system using video processing
JP7371595B2 (en) Apparatus, system, method and program
CN116912744B (en) Intelligent monitoring system and method based on Internet of things
CN117197755A (en) Community personnel identity monitoring and identifying method and device
CN109522844B (en) Social affinity determination method and system
CN109509329B (en) Drowning alarm method based on wearable device and wearable device
KR102648004B1 (en) Apparatus and Method for Detecting Violence, Smart Violence Monitoring System having the same
CN114694344B (en) Campus violence monitoring method and device and electronic equipment
WO2019187107A1 (en) Information processing device, control method, and program
Guo et al. Design of a smart art classroom system based on Internet of Things
CN109815828A (en) Realize the system and method for initiative alarming or help-seeking behavior detection control
JP6739115B1 (en) Risk judgment program and system
US20240135713A1 (en) Monitoring device, monitoring system, monitoring method, and non-transitory computer-readable medium storing program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination