WO2021047190A1 - Alarm method and apparatus based on residual network, computer device, and storage medium - Google Patents
Alarm method and apparatus based on residual network, computer device, and storage medium
- Publication number
- WO2021047190A1 (PCT application PCT/CN2020/088046)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- preset
- image
- micro
- voiceprint
- recognition result
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
Definitions
- This application relates to the field of artificial intelligence technology, and in particular to an alarm method, device, computer equipment and storage medium based on a residual network.
- In some scenarios, for example where members of a pyramid (MLM) scheme coerce a victim into making a transfer at a bank outlet, the inventor realized that traditional technology can only determine whether coercion is taking place by identifying some characteristics of the person performing the transfer.
- In such scenarios, the coerced person is generally monitored and at the same time required not to make any warning expression, and therefore cannot make an obvious alarm gesture.
- Because traditional technology relies only on identifying some characteristics of the person performing the transfer, its accuracy in recognizing coercion is insufficient.
- the main purpose of this application is to provide an alarm method, device, computer equipment and storage medium based on a residual network, aiming to improve the accuracy of the alarm.
- this application proposes an alarm method based on a residual network, which includes the following steps:
- acquiring a first image collected by a first camera, inputting the first image into a preset residual-network-based pedestrian re-identification model so as to obtain a pedestrian recognition result, and determining whether the pedestrian recognition result is the same as a suspect object in a preset suspect object library, wherein the pedestrian recognition result is a human body feature;
- if the pedestrian recognition result is not the same as a suspect object in the preset suspect object library, acquiring a second image collected by a second camera, wherein the second image includes at least a first object and a second object, the distance between the first object and the second object is not greater than a preset distance, and the viewing range of the second camera is within the viewing range of the first camera;
- extracting the image information of the first object from the second image and inputting it into a preset micro-expression recognition model so as to obtain a micro-expression recognition result, and determining whether the micro-expression recognition result belongs to a preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category;
- if the micro-expression recognition result does not belong to the preset malicious micro-expression list, collecting the sound information of the first object through a preset microphone, inputting the sound information into a preset voiceprint recognition model so as to obtain a voiceprint recognition result, and determining whether the voiceprint recognition result is a negative voiceprint, wherein the voiceprint recognition result includes a negative voiceprint and a non-negative voiceprint;
- if the voiceprint recognition result is not a negative voiceprint, extracting the image information of the second object from the second image and inputting it into a preset interference degree calculation model so as to obtain the interference degree value output by the model, and determining whether the interference degree value is within a preset interference value interval, wherein the interference value is used to measure the degree of interference of the second object with the first object;
- if the interference degree value is not within the preset interference value interval, performing an alarm operation.
- This application provides an alarm device based on associated objects, including:
- the pedestrian recognition result judgment unit is used to acquire the first image collected by the first camera and input the first image into a preset residual-network-based pedestrian re-identification model so as to obtain a pedestrian recognition result, and to determine whether the pedestrian recognition result is the same as a suspect object in a preset suspect object library, wherein the pedestrian recognition result is a human body feature;
- the second image acquisition unit is configured to acquire a second image collected by the second camera if the pedestrian recognition result is not the same as a suspect object in the preset suspect object library, wherein the second image includes at least a first object and a second object, the distance between the first object and the second object is not greater than a preset distance, and the viewing range of the second camera is within the viewing range of the first camera;
- the micro-expression recognition unit is used to extract the image information of the first object from the second image and input it into a preset micro-expression recognition model so as to obtain a micro-expression recognition result, and to determine whether the micro-expression recognition result belongs to a preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category;
- the voiceprint recognition unit is configured to, if the micro-expression recognition result does not belong to the preset malicious micro-expression list, collect the sound information of the first object through a preset microphone and input the sound information into a preset voiceprint recognition model so as to obtain a voiceprint recognition result, and to determine whether the voiceprint recognition result is a negative voiceprint, wherein the voiceprint recognition result includes a negative voiceprint and a non-negative voiceprint;
- the interference degree calculation unit is configured to, if the voiceprint recognition result is not a negative voiceprint, extract the image information of the second object from the second image and input it into a preset interference degree calculation model so as to obtain the interference degree value output by the interference degree calculation model, and to determine whether the interference degree value is within a preset interference value interval, wherein the interference value is used to measure the degree of interference of the second object with the first object;
- the alarm unit is configured to perform an alarm operation if the interference degree value is not within the preset interference value interval.
- This application provides a computer device, including: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to execute a residual-network-based alarm method, wherein the residual-network-based alarm method includes:
- acquiring a first image collected by a first camera, inputting the first image into a preset residual-network-based pedestrian re-identification model to obtain human body features as the pedestrian recognition result, and determining whether the pedestrian recognition result is the same as a suspect object in a preset suspect object library;
- if the pedestrian recognition result is not the same as a suspect object in the preset suspect object library, acquiring a second image collected by a second camera, wherein the second image includes at least a first object and a second object, the distance between the first object and the second object is not greater than a preset distance, and the viewing range of the second camera is within the viewing range of the first camera;
- extracting the image information of the first object from the second image and inputting it into a preset micro-expression recognition model to obtain a micro-expression recognition result, and determining whether the micro-expression recognition result belongs to a preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category;
- if the micro-expression recognition result does not belong to the preset malicious micro-expression list, collecting the sound information of the first object through a preset microphone, inputting the sound information into a preset voiceprint recognition model so as to obtain a voiceprint recognition result, and determining whether the voiceprint recognition result is a negative voiceprint, wherein the voiceprint recognition result includes a negative voiceprint and a non-negative voiceprint;
- if the voiceprint recognition result is not a negative voiceprint, extracting the image information of the second object from the second image and inputting it into a preset interference degree calculation model so as to obtain the interference degree value output by the model, and determining whether the interference degree value is within a preset interference value interval, wherein the interference value is used to measure the degree of interference of the second object with the first object;
- if the interference degree value is not within the preset interference value interval, performing an alarm operation.
- The present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, a residual-network-based alarm method is implemented, wherein the residual-network-based alarm method includes the following steps:
- acquiring a first image collected by a first camera, inputting the first image into a preset residual-network-based pedestrian re-identification model to obtain human body features as the pedestrian recognition result, and determining whether the pedestrian recognition result is the same as a suspect object in a preset suspect object library;
- if the pedestrian recognition result is not the same as a suspect object in the preset suspect object library, acquiring a second image collected by a second camera, wherein the second image includes at least a first object and a second object, the distance between the first object and the second object is not greater than a preset distance, and the viewing range of the second camera is within the viewing range of the first camera;
- extracting the image information of the first object from the second image and inputting it into a preset micro-expression recognition model to obtain a micro-expression recognition result, and determining whether the micro-expression recognition result belongs to a preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category;
- if the micro-expression recognition result does not belong to the preset malicious micro-expression list, collecting the sound information of the first object through a preset microphone, inputting the sound information into a preset voiceprint recognition model so as to obtain a voiceprint recognition result, and determining whether the voiceprint recognition result is a negative voiceprint, wherein the voiceprint recognition result includes a negative voiceprint and a non-negative voiceprint;
- if the voiceprint recognition result is not a negative voiceprint, extracting the image information of the second object from the second image and inputting it into a preset interference degree calculation model so as to obtain the interference degree value output by the model, and determining whether the interference degree value is within a preset interference value interval, wherein the interference value is used to measure the degree of interference of the second object with the first object;
- if the interference degree value is not within the preset interference value interval, performing an alarm operation.
- The residual-network-based alarm method, apparatus, computer device, and storage medium of the present application improve the accuracy of the alarm by combining micro-expression recognition with voiceprint recognition. An interference degree value is calculated by the interference degree calculation model, where the interference value measures the degree of interference of the second object with the first object, so as to confirm whether the second object may be threatening the first object; if the degree of interference is high, an alarm operation is performed. In this way, the associated object (the second object) helps determine whether an alarm is needed, which further improves the accuracy of the alarm.
- FIG. 1 is a schematic flowchart of the residual-network-based alarm method according to an embodiment of the present application;
- FIG. 2 is a schematic structural block diagram of the alarm device based on an associated object according to an embodiment of the present application;
- FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
- Referring to FIG. 1, an embodiment of the present application provides an alarm method based on a residual network, which includes the following steps:
- S1: acquiring a first image collected by a first camera, inputting the first image into a preset residual-network-based pedestrian re-identification model so as to obtain a pedestrian recognition result, and determining whether the pedestrian recognition result is the same as a suspect object in a preset suspect object library, wherein the pedestrian recognition result is a human body feature;
- S2: if the pedestrian recognition result is not the same as a suspect object in the preset suspect object library, acquiring a second image collected by a second camera, wherein the second image includes at least a first object and a second object, the distance between the first object and the second object is not greater than a preset distance, and the viewing range of the second camera is within the viewing range of the first camera;
- S3: extracting the image information of the first object from the second image and inputting it into a preset micro-expression recognition model so as to obtain a micro-expression recognition result, and determining whether the micro-expression recognition result belongs to a preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category;
- S4: if the micro-expression recognition result does not belong to the preset malicious micro-expression list, collecting the sound information of the first object through a preset microphone and inputting the sound information into a preset voiceprint recognition model so as to obtain a voiceprint recognition result, and determining whether the voiceprint recognition result is a negative voiceprint, wherein the voiceprint recognition result includes a negative voiceprint and a non-negative voiceprint;
- S5: if the voiceprint recognition result is not a negative voiceprint, extracting the image information of the second object from the second image and inputting it into a preset interference degree calculation model so as to obtain the interference degree value output by the interference degree calculation model, and determining whether the interference degree value is within a preset interference value interval, wherein the interference value is used to measure the degree of interference of the second object with the first object;
- S6: if the interference degree value is not within the preset interference value interval, performing an alarm operation.
- As described in step S1 above, the first image collected by the first camera is acquired and input into a preset residual-network-based pedestrian re-identification model, so as to obtain a pedestrian recognition result, and it is determined whether the pedestrian recognition result is the same as a suspect object in the preset suspect object library, wherein the pedestrian recognition result is a human body feature.
- The residual network is, for example, resnet50, resnet101, or resnet152; the resnet50 model is preferred.
- The residual network includes first to fifth residual blocks; each residual block includes at least one convolutional layer and can output a corresponding feature image.
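For illustration only (this is not part of the disclosed embodiments), a minimal Python sketch of tapping the feature image of an intermediate residual stage of a standard resnet50 is given below; the use of torchvision as a stand-in for the trained pedestrian re-identification backbone, and the reading of torchvision's `layer3` as the patent's "fourth residual block", are assumptions.

```python
import torch
from torchvision import models

# Stand-in backbone; the patent's model would be trained on pedestrian sample data.
resnet = models.resnet50(weights=None).eval()

def fourth_block_features(image: torch.Tensor) -> torch.Tensor:
    """Return the feature image of an intermediate residual stage.

    Counting the stem plus layer1..layer4 as five stages, torchvision's
    `layer3` output is one reasonable reading of the patent's "fourth
    residual block"; this mapping is an assumption, not the disclosure.
    """
    with torch.no_grad():
        x = resnet.conv1(image)
        x = resnet.bn1(x)
        x = resnet.relu(x)
        x = resnet.maxpool(x)
        x = resnet.layer1(x)
        x = resnet.layer2(x)
        x = resnet.layer3(x)  # feature image later fed to the parallel branches
    return x

feats = fourth_block_features(torch.randn(1, 3, 224, 224))
print(feats.shape)  # torch.Size([1, 1024, 14, 14])
```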
- Pedestrian re-identification is a technology that uses computer vision to determine whether a specific pedestrian is present in an image or video sequence. On this basis, given an image of a person (which need not be frontal), the identity of the pedestrian in the image is recognized, and it is then determined whether the pedestrian recognition result is the same as a suspect object in the preset suspect object library.
- The suspect objects are, for example, pyramid-scheme (MLM) members, criminals, persons who have been drawn into pyramid schemes, persons being held, or missing persons. In this way, it can be determined whether an object with a prior record or a coerced object is present in the current scene; if one is present, the likelihood of coercion in the current scene is higher.
- The first camera collects images over a wide range, for example covering the lobby of a bank branch.
- As described in step S2 above, if the pedestrian recognition result is not the same as a suspect object in the preset suspect object library, the second image collected by the second camera is acquired, wherein the second image includes at least a first object and a second object, the distance between the first object and the second object is not greater than a preset distance, and the viewing range of the second camera is within the viewing range of the first camera.
- The second image is used to determine whether the first object is normal and whether the second object is normal.
- The reason the distance between the first object and the second object must not be greater than the preset distance is that when the distance between the coercer and the coerced person is too large, the coercer cannot effectively monitor the coerced person, who could then raise an alarm on their own; therefore, when the distance between the first object and the second object is not greater than the preset distance, coercion may be taking place.
- Moreover, to avoid the inaccuracy of judging from a single object, the present application also analyzes the associated second object, and accordingly collects a second image that includes the second object.
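Purely as a sketch of this trigger condition (the bounding-box inputs, the pixel-space metric, and the threshold value are assumptions, not part of the disclosure), the proximity test between two detected persons might look like:

```python
import math

PRESET_DISTANCE = 120.0  # assumed threshold, in pixels

def centers_close(box_a, box_b, max_dist=PRESET_DISTANCE) -> bool:
    """True when the centers of two persons' bounding boxes are within the
    preset distance, i.e. the second object could be monitoring the first."""
    (ax1, ay1, ax2, ay2), (bx1, by1, bx2, by2) = box_a, box_b
    center_a = ((ax1 + ax2) / 2, (ay1 + ay2) / 2)
    center_b = ((bx1 + bx2) / 2, (by1 + by2) / 2)
    return math.dist(center_a, center_b) <= max_dist

# Two boxes whose centers are 100 px apart trigger the second-image analysis.
print(centers_close((0, 0, 50, 100), (100, 0, 150, 100)))  # True
```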
- As described in step S3 above, the image information of the first object is extracted from the second image and input into a preset micro-expression recognition model so as to obtain a micro-expression recognition result, and it is determined whether the micro-expression recognition result belongs to a preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category.
- Extracting the image information of the first object from the second image and inputting it into a preset micro-expression recognition model is performed, for example, by using a preset body contour extraction method to extract the body contour of the first object, recognizing the facial area within the body contour, and inputting the image data of the facial area into the preset micro-expression recognition model.
- The micro-expression recognition model is, for example, a model obtained by training a neural network model, where the micro-expression recognition model is trained on sample data consisting of face images and the micro-expression categories associated with those face images.
- The neural network model can be any model, such as the VGG16 model, VGG19 model, VGG-F model, ResNet152 model, ResNet50 model, DPN131 model, IXception model, AlexNet model, or DenseNet model.
- the DPN model is preferred.
- DPN (Dual Path Network) is a neural network structure that builds on ResNeXt and incorporates the core ideas of DenseNet, allowing the model to make fuller use of features. DPN, ResNeXt, and DenseNet are existing network structures and are not described further here.
- The recognized micro-expressions can be divided into any number of categories, preferably 54 micro-expression categories; further, the micro-expressions of fear, tension, passivity, distraction, and anxiety are recorded in the malicious micro-expression list.
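As an illustrative sketch of the malicious-list check (the model, the category list, and all names here are assumed placeholders rather than the disclosed implementation):

```python
import torch

# Five of the assumed 54 categories; the full list is not reproduced here.
MALICIOUS_MICRO_EXPRESSIONS = {"fear", "tension", "passivity", "distraction", "anxiety"}

def is_malicious(face_crop: torch.Tensor, model: torch.nn.Module,
                 categories: list[str]) -> bool:
    """Classify a face crop with the micro-expression model and test whether
    the predicted category belongs to the preset malicious list."""
    with torch.no_grad():
        logits = model(face_crop.unsqueeze(0))   # shape: (1, num_categories)
    category = categories[int(logits.argmax(dim=1))]
    return category in MALICIOUS_MICRO_EXPRESSIONS
```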
- As described in step S4 above, if the micro-expression recognition result does not belong to the preset malicious micro-expression list, the sound information of the first object is collected through a preset microphone and input into the preset voiceprint recognition model, so as to obtain a voiceprint recognition result, and it is determined whether the voiceprint recognition result is a negative voiceprint, wherein the voiceprint recognition result includes a negative voiceprint and a non-negative voiceprint. To further determine whether the first object is being coerced, this application also uses voiceprint recognition to make the judgment.
- The voiceprint recognition process is, for example: inputting the voice information into a preset voiceprint recognition model, and using the voiceprint recognition model to parse the voice information to obtain specified voice features, where the specified voice features include at least the highest speech rate, the lowest speech rate, the number of accents, and the number of all words in the voice information; mapping the specified voice features into a multi-dimensional vector, where one sub-vector of the multi-dimensional vector is the difference between the highest and lowest speech rates and another sub-vector is the ratio of the number of accents to the number of all words in the voice information; and calculating the distance between this multi-dimensional vector and the multi-dimensional vectors corresponding to the multiple standard voiceprints in a preset standard voiceprint library, and recording the standard voiceprint with the smallest distance as the target voiceprint.
- The distance is, for example, the Euclidean distance: the smaller the distance, the more similar the two vectors. Taking the vectors (1, 1, 1) and (1, 1, 1) as an example, their Euclidean distance is √[(1−1)² + (1−1)² + (1−1)²] = 0, and 0 is the minimum possible Euclidean distance, so the target voiceprint is the standard voiceprint closest to the multi-dimensional vector.
- The voiceprint category corresponding to the target voiceprint is then output as the voiceprint recognition result, where the voiceprint category includes negative voiceprints and non-negative voiceprints. In this application, at least four dimensions (at least the highest speech rate, the lowest speech rate, the number of accents, and the number of all words in the voice information) are measured, and the measurement results are reflected in the standard voiceprint library; when the target voiceprint is a negative voiceprint, the output result is a negative voiceprint.
- The purpose of using the voiceprint category corresponding to the target voiceprint as the voiceprint recognition result is to determine whether the first object has negative emotions (if the first object is being coerced, for example, negative emotions will be present). The negative voiceprints represent negative emotions.
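A minimal sketch of this mapping and nearest-standard-voiceprint lookup follows; the two-entry standard voiceprint library and the feature values are invented for illustration.

```python
import math

# Assumed standard voiceprint library: (speech-rate spread, accent ratio) -> category.
STANDARD_VOICEPRINTS = {
    (0.5, 0.05): "non-negative",
    (2.5, 0.30): "negative",   # jittery speech rate plus heavy accenting
}

def voiceprint_category(max_rate, min_rate, n_accents, n_words) -> str:
    """Map the specified voice features to the two sub-vectors described above
    and return the category of the closest standard voiceprint (Euclidean)."""
    vec = (max_rate - min_rate, n_accents / n_words)
    target = min(STANDARD_VOICEPRINTS, key=lambda std: math.dist(std, vec))
    return STANDARD_VOICEPRINTS[target]

print(voiceprint_category(max_rate=5.0, min_rate=2.8, n_accents=12, n_words=40))
# -> negative
```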
- As described in step S5 above, if the voiceprint recognition result is not a negative voiceprint, the image information of the second object is extracted from the second image and input into the preset interference degree calculation model, so as to obtain the interference degree value output by the interference degree calculation model, and it is determined whether the interference degree value is within the preset interference value interval, wherein the interference value is used to measure the degree of interference of the second object with the first object.
- The interference degree value is calculated, for example, as follows: using a preset human body image extraction method, the human body image of the second object is extracted from the second image, and limb features are extracted from the human body image; the designated limb features pointing at the first object are filtered out of these limb features; the video is used to obtain the length of time each designated limb feature is present; and the designated limb features, together with the lengths of time they are present, are input into the preset interference degree calculation model, so as to obtain the interference degree value output by the model.
- the designated limb features pointing to the first object are, for example, finger pointing, arm pointing, palm pointing, chin pointing, and so on.
- the interference degree calculation model can be any feasible model, such as a neural network model.
- The calculation process is, for example: extracting feature information from the image information of the second object, obtaining the weight parameters corresponding to that feature information from a preset weight parameter table, and calculating the interference degree value using a preset weighted-sum formula.
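A minimal sketch of this weighted-sum variant, with an invented weight parameter table and an assumed preset interference value interval:

```python
# Assumed weight parameter table; feature names and weights are illustrative only.
WEIGHT_TABLE = {"finger_pointing": 0.4, "arm_pointing": 0.3, "body_lean": 0.2}

def interference_degree(features: dict[str, float]) -> float:
    """Weighted sum: look up the preset weight of each extracted feature and
    accumulate weight * feature strength as the interference degree value."""
    return sum(WEIGHT_TABLE.get(name, 0.0) * value for name, value in features.items())

value = interference_degree({"finger_pointing": 1.0, "body_lean": 0.5})
print(value)  # 0.5
INTERFERENCE_INTERVAL = (0.0, 0.3)  # assumed preset interval
alarm = not (INTERFERENCE_INTERVAL[0] <= value <= INTERFERENCE_INTERVAL[1])
print(alarm)  # True -> perform the alarm operation
```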
- As described in step S6 above, if the interference degree value is not within the preset interference value interval, an alarm operation is performed. An interference degree value outside the preset interval indicates that the second object is interfering with the first object, but that the interference is not overt behavior and therefore is not the kind of interference that comes from relatives or friends; coercion may therefore be taking place, and an alarm operation is performed accordingly.
- In one embodiment, the step S1 of inputting the first image into the preset residual-network-based pedestrian re-identification model so as to obtain a pedestrian recognition result includes:
- S101: inputting the first image into a preset, trained residual-network-based pedestrian re-identification model for calculation, so as to obtain the feature image output by the fourth residual block in the residual network, wherein the pedestrian re-identification model is trained on sample data consisting of pedestrian images and the recognition results associated with those images, and the residual network has five residual blocks in total;
- S102: inputting the feature image into the fifth residual block in the residual network for calculation, so as to obtain the main data output by the fifth residual block; and, in parallel, inputting the feature image into a global recognition sub-network preset in the pedestrian re-identification model for calculation, so as to obtain the global sub-data output by the global recognition sub-network; and, in parallel, inputting the feature image into a local recognition sub-network preset in the pedestrian re-identification model for calculation, so as to obtain the local sub-data output by the local recognition sub-network;
- S103: inputting the main data, the global sub-data, and the local sub-data into a fully connected layer preset in the pedestrian re-identification model, so as to obtain the pedestrian re-identification result output by the fully connected layer.
- To address the technical problem that the detail features of the input image are progressively lost as the network processes it layer by layer, this application also sets up a global recognition sub-network and a local recognition sub-network in the pedestrian re-identification model, which, together with the fifth residual block, receive in parallel the feature image output by the fourth residual block.
- The global recognition sub-network and the local recognition sub-network can selectively preserve the global and local features of the feature image output by the fourth residual block, thereby avoiding the loss of useful data while avoiding the addition of excessive interference data.
- The output layer of the pedestrian re-identification model can be any layer; this application prefers a fully connected layer, whose output is mapped into a fixed-length feature vector from which the recognition result is obtained.
- The processing of the feature image by the fifth residual block is a process that includes convolution (and may also include pooling, activation, and other processes).
- The processing of the feature image by the global recognition sub-network preset in the pedestrian re-identification model is the process of extracting the global features of the feature image (features of the entire image), for example extracting the global color of the feature image and extracting the global contour of the feature image.
- The calculation performed on the feature image by the local recognition sub-network preset in the pedestrian re-identification model is the process of extracting the features of local areas in the feature image (for example, the head area selected from the whole image), for example extracting the local color of the feature image and extracting the local contour of the feature image.
- The global recognition sub-network and the local recognition sub-network may adopt any neural network structure, for example a structure based on convolutional neural networks. Accordingly, the global sub-data and local sub-data preserved to avoid the loss of detail are input, together with the main data, into the fully connected layer preset in the pedestrian re-identification model, so as to obtain the pedestrian re-identification result output by the fully connected layer, thereby improving the accuracy of the recognition result.
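For illustration, a minimal PyTorch sketch of this parallel branching is given below; the pooling choices, layer sizes, and the 1024/2048-channel figures follow the resnet50 stand-in used earlier and are assumptions, not the disclosed configuration.

```python
import torch
import torch.nn as nn

class ParallelReID(nn.Module):
    """The fourth residual block's feature image is fed, in parallel, to the
    fifth residual block (main data), a global recognition sub-network, and
    a local recognition sub-network, then fused by a fully connected layer."""

    def __init__(self, block4: nn.Module, block5: nn.Module, n_ids: int):
        super().__init__()
        self.block4, self.block5 = block4, block5
        self.global_net = nn.Sequential(               # global features (whole image)
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(1024, 256))
        self.local_net = nn.Sequential(                # coarse local blocks (2x2 grid)
            nn.AdaptiveMaxPool2d(2), nn.Flatten(), nn.Linear(1024 * 4, 256))
        self.fc = nn.Linear(2048 + 256 + 256, n_ids)   # fuse main + global + local

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.block4(x)                          # feature image (S101)
        main = self.block5(feat).mean(dim=(2, 3))      # main data (S102)
        g = self.global_net(feat)                      # global sub-data (S102)
        l = self.local_net(feat)                       # local sub-data (S102)
        return self.fc(torch.cat([main, g, l], dim=1))  # re-ID result (S103)
```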
- In one embodiment, the step S102 of inputting the feature image into the preset global recognition sub-network in the pedestrian re-identification model, so as to obtain the global sub-data output by the global recognition sub-network, includes:
- S1021: extracting designated data from the feature image through the global recognition sub-network, and determining whether the value of the designated data is within a preset value range, wherein the designated data includes at least a human body contour, a human skin color, or a clothing color;
- S1022: if the value of the designated data is not within the preset value range, using the designated data as global sub-data and outputting the global sub-data.
- As described above, this application preserves, as global sub-data, the designated data in the feature image output by the fourth residual block whose values are not within the preset value range, thereby preserving data with large differences and avoiding interference from useless data.
- The designated data is data that can reflect the characteristics of pedestrians, for example including human body contour, human skin color, or clothing color. Since human body contours are not uniform, and skin color or clothing color is also likely to differ, these are accordingly extracted as the designated data. If the value of a designated datum is not within the preset value range, this indicates that the designated datum is usable.
- Further, the global recognition sub-network selects a plurality of designated data for collection, uses the designated data whose values are not within the preset value range as global sub-data, and outputs them.
- the number of designated data can be set to 2-10, preferably 6-8.
- the global recognition sub-network may include a neural network with any number of layers, for example, a neural network with 6-8 layers.
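A minimal sketch of the value-range test, with invented preset ranges and feature values:

```python
# Assumed "ordinary" value range per designated datum; values falling outside
# their range are distinctive and are kept as global sub-data.
PRESET_RANGES = {"body_contour": (0.4, 0.6), "skin_color": (0.3, 0.7),
                 "clothing_color": (0.2, 0.8)}

def global_sub_data(designated: dict[str, float]) -> dict[str, float]:
    """Keep only designated data whose value is NOT within its preset range."""
    return {name: value for name, value in designated.items()
            if name in PRESET_RANGES
            and not (PRESET_RANGES[name][0] <= value <= PRESET_RANGES[name][1])}

print(global_sub_data({"body_contour": 0.9, "skin_color": 0.5}))
# -> {'body_contour': 0.9}
```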
- In one embodiment, inputting the feature image into the preset local recognition sub-network in the pedestrian re-identification model for calculation, so as to obtain the local sub-data output by the local recognition sub-network, includes: dividing the feature image into a plurality of blocks through the local recognition sub-network using a preset block division method; extracting designated data from each of the blocks, and determining whether the value of the designated data is within a preset value range, wherein the designated data includes at least a partial contour, a partial skin color, or a partial clothing color; and, if the value of the designated data is not within the preset value range, using the designated data as local sub-data and outputting the local sub-data. In this way, the local sub-data output by the local recognition sub-network is obtained.
- As described above, this application uses the local recognition sub-network to divide the feature image into multiple blocks using a preset block division method, and extracts designated data from each block. If the value of the designated data is not within the preset value range, the designated data is taken as local sub-data and output. In this way, valuable local sub-data can be preserved and used as one of the bases for subsequent identification. Further, the local recognition sub-network selects a plurality of designated data for collection, uses the designated data whose values are not within the preset value range as local sub-data, and outputs them.
- the number of designated data can be set to 2-10, preferably 6-8.
- the local recognition sub-network may include a neural network with any number of layers, for example, a neural network with 8-10 layers.
- The block division method is, for example: recognizing a characteristic shape in the feature image, and dividing out the area centered on that characteristic shape as a single block (for example, if a head contour is recognized, the area around the head contour is divided out as the head block).
- In one embodiment, the step S103 of inputting the main data, the global sub-data, and the local sub-data into the fully connected layer preset in the pedestrian re-identification model, so as to obtain the pedestrian re-identification result output by the fully connected layer, includes:
- S1031: using a preset mapping method, mapping the main data, the global sub-data, and the local sub-data into a fixed-length feature vector through the fully connected layer;
- S1032: outputting the recognition result corresponding to the component vector with the largest value in the feature vector, according to the preset correspondence between component vectors and recognition results.
- In this way, the comprehensive utilization of the main data, the global sub-data, and the local sub-data is realized, so as to obtain the pedestrian re-identification result output by the fully connected layer.
- In traditional technology, models based on the residual network all input the data of the fifth residual block into the fully connected layer, and the fully connected layer then maps that data into a feature vector.
- In contrast, this application comprehensively considers the main data output by the fifth residual block, the global sub-data output by the global recognition sub-network, and the local sub-data output by the local recognition sub-network, and uses the fully connected layer to map them together into a fixed-length feature vector, thereby improving recognition accuracy.
- the preset mapping method is similar to the mapping method of the fully connected layer in the traditional technology, and will not be repeated here.
- Each component vector of the feature vector output by the fully connected layer represents a corresponding recognition result, and the recognition result corresponding to the component vector with the largest value is the most likely one; the recognition result corresponding to the component vector with the largest value is therefore taken as the final output recognition result.
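A minimal sketch of this readout step, with an assumed correspondence between components and recognition results:

```python
import torch

# Assumed correspondence between component vectors and recognition results.
RESULTS = ["person_0017", "person_0241", "person_0388"]

def readout(feature_vector: torch.Tensor, results: list[str]) -> str:
    """Output the recognition result whose component has the largest value."""
    return results[int(feature_vector.argmax())]

print(readout(torch.tensor([0.1, 2.3, 0.4]), RESULTS))  # person_0241
```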
- In one embodiment, before step S3 of extracting the image information of the first object from the second image, inputting it into a preset micro-expression recognition model so as to obtain a micro-expression recognition result, and determining whether the micro-expression recognition result belongs to a preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category, the method includes:
- S21: acquiring a specified number of sample data, and dividing the sample data into a training set and a test set, wherein the sample data consists of face images and the micro-expression categories associated with the face images;
- S22: inputting the sample data of the training set into a preset neural network model for training, so as to obtain an initial micro-expression recognition model, wherein the stochastic gradient descent method is used during training;
- S23: verifying the initial micro-expression recognition model using the sample data of the test set;
- S24: if the verification is passed, recording the initial micro-expression recognition model as the micro-expression recognition model.
- In this way, the establishment of the micro-expression recognition model is achieved.
- This embodiment trains the micro-expression recognition model on the basis of a neural network model.
- the neural network model can be a VGG16 model, a VGG19 model, a VGG-F model, a ResNet152 model, a ResNet50 model, a DPN131 model, an IXception model, an AlexNet model, a DenseNet model, etc.
- the DPN model is preferred.
- The stochastic gradient descent method randomly samples some of the training data in place of the entire training set: if the sample size is large (for example, hundreds of thousands), only tens of thousands or even thousands of samples may be used per iteration, and when the iterations reach the optimal solution, the training speed is improved.
- The training process can also use the back-propagation rule to update the parameters of each layer of the neural network model.
- The back-propagation rule is based on the gradient descent method. The input-output relationship of a BP network is essentially a mapping: the function of a BP neural network with n inputs and m outputs is a continuous mapping from n-dimensional Euclidean space to a finite field in m-dimensional Euclidean space. This mapping is highly non-linear and facilitates updating the parameters of each layer of the neural network model.
- The sample data of the test set is then used to verify the initial micro-expression recognition model; if the verification is passed, the initial micro-expression recognition model is recorded as the micro-expression recognition model.
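A minimal training-and-verification sketch of steps S21-S24 follows; the split ratio, batch size, learning rate, epoch count, and pass criterion are all assumed placeholders.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, random_split

def train_micro_expression_model(model: nn.Module, dataset, epochs: int = 10):
    """Split the sample data, train with stochastic gradient descent, then
    verify the initial model on the held-out test set (S21-S24)."""
    n_train = int(0.8 * len(dataset))                   # assumed 80/20 split (S21)
    train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
    train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=32)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)  # stochastic gradient descent
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):                             # S22: mini-batch training
        for faces, labels in train_loader:
            opt.zero_grad()
            loss_fn(model(faces), labels).backward()    # back-propagation update
            opt.step()
    model.eval()                                        # S23: verification
    with torch.no_grad():
        correct = sum((model(f).argmax(1) == y).sum().item() for f, y in test_loader)
    # S24: record the model as the micro-expression recognition model if it passes.
    return model if correct / len(test_set) > 0.9 else None  # assumed pass criterion
```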
- In one embodiment, the step S4 of inputting the voice information into the preset voiceprint recognition model so as to obtain a voiceprint recognition result includes:
- S401: inputting the voice information into a preset voiceprint recognition model, and using the voiceprint recognition model to parse the voice information so as to obtain specified voice features, wherein the specified voice features include at least the highest speech rate, the lowest speech rate, the number of accents, and the number of all words in the voice information;
- S402: mapping the specified voice features into a multi-dimensional vector, wherein one sub-vector of the multi-dimensional vector is the difference between the highest speech rate and the lowest speech rate, and another sub-vector is the ratio of the number of accents to the number of all words in the voice information;
- S403: calculating the distance between the multi-dimensional vector and the multi-dimensional vectors corresponding to the multiple standard voiceprints in a preset standard voiceprint library, and recording the standard voiceprint with the smallest distance as the target voiceprint;
- S404: outputting the voiceprint category corresponding to the target voiceprint as the voiceprint recognition result, wherein the voiceprint category includes a negative voiceprint and a non-negative voiceprint.
- A voiceprint is the spectrum of sound waves, carrying verbal information, displayed by electro-acoustic instruments. A voiceprint is not only specific but also relatively stable, and voiceprints therefore differ under different emotions; on this basis, the emotional state of the first object can be analyzed.
- This application parses the sound information to obtain the specified sound features, where the specified sound features include at least the highest speech rate, the lowest speech rate, the number of accents, and the number of all words in the sound information, and maps the specified sound features into a multi-dimensional vector, where one sub-vector of the multi-dimensional vector is the difference between the highest and lowest speech rates and another sub-vector is the ratio of the number of accents to the number of all words in the sound information. The sound information is thereby mapped into a multi-dimensional vector, whose other dimensions may encode other sound features.
- the distance is, for example, Euclidean distance, cosine similarity, and so on.
- In one embodiment, the second image is a frame of the video captured by the second camera, and the step S5 of extracting the image information of the second object from the second image and inputting it into the preset interference degree calculation model so as to obtain the interference degree value output by the interference degree calculation model includes:
- S501: extracting a human body image of the second object from the second image using a preset human body image extraction method, and extracting limb features from the human body image;
- S502: filtering out, from the limb features, the designated limb features pointing at the first object;
- S503: using the video to obtain the length of time each designated limb feature is present;
- S504: inputting the designated limb features and the lengths of time they are present into a preset interference degree calculation model, so as to obtain the interference degree value output by the interference degree calculation model.
- This application recognizes, from the image, the designated limb features pointing at the first object, and uses the video to obtain the length of time each designated limb feature is present as a basis for calculating the interference degree value. For example, when the second object points at the first object with a finger for one second, it can be determined that the second object is strongly interfering with the first object. Accordingly, the designated limb features and the lengths of time they are present are input into the preset interference degree calculation model, so as to obtain the interference degree value output by the model.
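A minimal sketch of measuring how long a pointing feature persists in the video, assuming per-frame boolean detections and an assumed frame rate:

```python
FPS = 25.0  # assumed frame rate of the second camera

def pointing_duration(pointing_flags: list[bool], fps: float = FPS) -> float:
    """Length, in seconds, of the longest run of consecutive frames in which
    a designated limb feature points at the first object (S503)."""
    longest = run = 0
    for flag in pointing_flags:
        run = run + 1 if flag else 0
        longest = max(longest, run)
    return longest / fps

flags = [False] + [True] * 30 + [False]  # second object points for ~1.2 s
print(pointing_duration(flags))          # 1.2
```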
- The residual-network-based alarm method of this application improves the accuracy of the alarm by combining micro-expression recognition with voiceprint recognition. The interference degree value is calculated by the interference degree calculation model, where the interference value measures the degree of interference of the second object with the first object, so as to confirm whether the second object may be threatening the first object; if the degree of interference is high, an alarm operation is performed. In this way, the associated object (the second object) helps determine whether an alarm is required, further improving the accuracy of the alarm.
- an embodiment of the present application provides an alarm device based on an associated object, including:
- the pedestrian recognition result judgment unit 10 is used to acquire the first image collected by the first camera and input the first image into the preset residual-network-based pedestrian re-identification model, thereby obtaining a pedestrian recognition result, and to judge whether the pedestrian recognition result is the same as a suspect object in the preset suspect object library, wherein the pedestrian recognition result is a human body feature;
- the second image acquisition unit 20 is configured to acquire a second image collected by a second camera if the pedestrian recognition result is not the same as a suspect object in the preset suspect object library, wherein the second image includes at least a first object and a second object, the distance between the first object and the second object is not greater than a preset distance, and the viewing range of the second camera is within the viewing range of the first camera;
- the micro-expression recognition unit 30 is configured to extract the image information of the first object from the second image and input it into a preset micro-expression recognition model so as to obtain a micro-expression recognition result, and to determine whether the micro-expression recognition result belongs to a preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category;
- the voiceprint recognition unit 40 is configured to, if the micro-expression recognition result does not belong to the preset malicious micro-expression list, collect the sound information of the first object through a preset microphone and input the sound information into a preset voiceprint recognition model so as to obtain a voiceprint recognition result, and to determine whether the voiceprint recognition result is a negative voiceprint, wherein the voiceprint recognition result includes a negative voiceprint and a non-negative voiceprint;
- the interference degree calculation unit 50 is configured to, if the voiceprint recognition result is not a negative voiceprint, extract the image information of the second object from the second image and input it into a preset interference degree calculation model so as to obtain the interference degree value output by the interference degree calculation model, and to determine whether the interference degree value is within a preset interference value interval, wherein the interference value is used to measure the degree of interference of the second object with the first object;
- the alarm unit 60 is configured to perform an alarm operation if the interference degree value is not within the preset interference value interval.
- the pedestrian recognition result judgment unit 10 includes:
- the feature image acquisition subunit, used to input the first image into the preset, trained residual-network-based pedestrian re-identification model for calculation, so as to obtain the feature image output by the fourth residual block in the residual network, wherein the pedestrian re-identification model is trained on sample data consisting of pedestrian images and the recognition results associated with those images, and the residual network has five residual blocks in total;
- the data acquisition subunit, configured to input the feature image into the fifth residual block in the residual network for calculation, thereby obtaining the main data output by the fifth residual block; and, in parallel, to input the feature image into the global recognition sub-network preset in the pedestrian re-identification model for calculation, so as to obtain the global sub-data output by the global recognition sub-network; and, in parallel, to input the feature image into the local recognition sub-network preset in the pedestrian re-identification model for calculation, so as to obtain the local sub-data output by the local recognition sub-network;
- the pedestrian re-identification result acquisition subunit, used to input the main data, the global sub-data, and the local sub-data into the fully connected layer preset in the pedestrian re-identification model, so as to obtain the pedestrian re-identification result output by the fully connected layer.
- the data acquisition subunit includes:
- the designated data acquisition module, used to extract designated data from the feature image through the global recognition sub-network and determine whether the value of the designated data is within a preset value range, wherein the designated data includes at least a human body contour, a human skin color, or a clothing color;
- the global sub-data output module, configured to, if the value of the designated data is not within the preset value range, use the designated data as global sub-data and output the global sub-data.
- the pedestrian re-identification result obtaining subunit includes:
- a mapping module configured to use a preset mapping method to map the main data, the global sub-data, and the local sub-data into a fixed-length feature vector through the fully connected layer;
- the recognition result output module is configured to output the recognition result corresponding to the component vector with the largest value in the feature vector according to the preset correspondence between the component vector and the recognition result.
- In one embodiment, the device further includes:
- the sample data acquisition unit is used to acquire a specified number of sample data, and divide the sample data into a training set and a test set; wherein the sample data includes a face image and a micro-expression category associated with the face image;
- the training unit is used to input the sample data of the training set into the preset neural network model for training to obtain the initial micro-expression recognition model, where the stochastic gradient descent method is used in the training process;
- the verification unit, used to verify the initial micro-expression recognition model using the sample data of the test set;
- the marking unit is configured to record the initial micro-expression recognition model as the micro-expression recognition model if the verification is passed.
- the voiceprint recognition unit 40 includes:
- the voice information input subunit, used to input the voice information into a preset voiceprint recognition model and use the voiceprint recognition model to parse the voice information so as to obtain specified voice features, wherein the specified voice features include at least the highest speech rate, the lowest speech rate, the number of accents, and the number of all words in the voice information;
- the multi-dimensional vector mapping subunit, used to map the specified voice features into a multi-dimensional vector, wherein one sub-vector of the multi-dimensional vector is the difference between the highest speech rate and the lowest speech rate, and another sub-vector is the ratio of the number of accents to the number of all words in the voice information;
- the distance calculation subunit is used to calculate the distance between the multi-dimensional vector and the multi-dimensional vector corresponding to the multiple standard voiceprints in the preset standard voiceprint library, and record the standard voiceprint with the smallest distance as the target voiceprint;
- the voiceprint recognition result output subunit is configured to output a voiceprint category corresponding to the target voiceprint as a voiceprint recognition result, wherein the voiceprint category includes a negative voiceprint and a non-negative voiceprint.
- the interference degree calculation unit 50 includes:
- a human body image extraction subunit for extracting a human body image of the second object from the second image by using a preset human body image extraction method, and extracting limb features from the human body image;
- the designated limb feature acquisition subunit, used to filter out the designated limb features pointing at the first object from the limb features;
- the time length acquisition subunit, configured to use the video to obtain the length of time that the designated limb feature is present;
- the interference degree value output subunit is used to input the specified limb feature and the length of time the specified limb feature exists into a preset interference degree calculation model, so as to obtain the interference degree value output by the interference degree calculation model.
- The associated-object-based alarm device of the present application improves the accuracy of the alarm by combining micro-expression recognition with voiceprint recognition. The interference degree value is calculated by the interference degree calculation model, where the interference value measures the degree of interference of the second object with the first object, so as to confirm whether the second object may be threatening the first object; if the degree of interference is high, an alarm operation is performed. In this way, the associated object (the second object) helps determine whether an alarm is needed, which further improves the accuracy of the alarm.
- an embodiment of the present application also provides a computer device.
- The computer device may be a server, and its internal structure may be as shown in FIG. 3.
- The computer device includes a processor, a memory, a network interface, and a database connected through a system bus, wherein the processor of the computer device is used to provide computing and control capabilities.
- the memory of the computer device includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium stores an operating system, a computer program, and a database.
- The internal memory provides an environment for the operation of the operating system and computer program stored in the non-volatile storage medium.
- the database of the computer equipment is used to store the data used in the alarm method based on the residual network.
- the network interface of the computer device is used to communicate with an external terminal through a network connection.
- the computer program is executed by the processor to realize an alarm method based on the residual network.
- The above processor executes the above residual-network-based alarm method; the steps included in the method correspond one-to-one to the steps of the residual-network-based alarm method of the foregoing embodiment, and are not repeated here.
- The computer device of the present application improves the accuracy of the alarm by combining micro-expression recognition with voiceprint recognition. The interference degree value is calculated by the interference degree calculation model, where the interference value measures the degree of interference of the second object with the first object, so as to confirm whether the second object may be threatening the first object; if the degree of interference is high, an alarm operation is performed. In this way, the associated object (the second object) helps determine whether an alarm is needed, which further improves the accuracy of the alarm.
- An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, a residual-network-based alarm method is implemented; the steps included in the method correspond one-to-one to the steps of the residual-network-based alarm method of the foregoing embodiment, and are not repeated here.
- The computer-readable storage medium of the present application improves the accuracy of the alarm by combining micro-expression recognition with voiceprint recognition. The interference degree value is calculated by the interference degree calculation model, where the interference value measures the degree of interference of the second object with the first object, so as to confirm whether the second object may be threatening the first object; if the degree of interference is high, an alarm operation is performed. In this way, the associated object (the second object) helps determine whether an alarm is needed, which further improves the accuracy of the alarm.
- Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory may include random access memory (RAM) or external cache memory.
- RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), dual-rate SDRAM (SSRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Human Computer Interaction (AREA)
- General Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Biology (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Traffic Control Systems (AREA)
- Alarm Systems (AREA)
- Image Analysis (AREA)
Abstract
This application discloses a residual-network-based alarm method, apparatus, computer device, and storage medium. The method includes: acquiring a first image collected by a first camera and inputting the first image into a pedestrian re-identification model to obtain a pedestrian recognition result; if the pedestrian recognition result is not the same as a suspect object, acquiring a second image; extracting the image information of the first object to obtain a micro-expression recognition result; if the micro-expression recognition result does not belong to a preset malicious micro-expression list, collecting sound information to obtain a voiceprint recognition result; if the voiceprint recognition result is not a negative voiceprint, extracting the image information of the second object from the second image and inputting it into a preset interference degree calculation model, so as to obtain the interference degree value output by the interference degree calculation model; and, if the interference degree value is not within a preset interference value interval, performing an alarm operation. The accuracy of the alarm is thereby improved.
Description
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on September 9, 2019, with application number 201910848452.8 and invention title "Alarm method and apparatus based on residual network, computer device and storage medium", the entire contents of which are incorporated herein by reference.
本申请涉及到人工智能技术领域,特别是涉及到一种基于残差网络的报警方法、装置、计算机设备和存储介质。
在一些场景中,例如在传销分子威胁被传销人员在银行网点进行转账的场景中,发明人意识到传统技术仅能通过识别进行转账操作人员的一些特征来判断是否存在被挟持的现象,而在这些场景中,被传销人员一般会被监视,并且同时被要求不能做出示警表情,因此无法做出明显地报警举动。而传统技术仅通过识别进行转账操作人员的一些特征的方法,识别出是否存在被挟持的现象的准确性不足。
本申请的主要目的为提供一种基于残差网络的报警方法、装置、计算机设备和存储介质,旨在提高报警的准确性。
为了实现上述发明目的,本申请提出一种基于残差网络的报警方法,包括以下步骤:
获取第一摄像头采集的第一图像,并将所述第一图像输入预设的基于残差网络的行人再识别模型中,从而获得行人识别结果,并判断所述行人识别结果是否与预设的嫌疑对象库中的嫌疑对象相同,其中所述行人识别结果为人体特征;
若所述行人识别结果与预设的嫌疑对象库中的嫌疑对象不相同,则获取第二摄像头采集的第二图像,其中所述第二图像至少包括第一对象与第二对象,所述第一对象与所述第二对象之间的距离不大于预设距离,所述第二摄像头的取景范围在所述第一摄像头的取景范围之内;
从所述第二图像中提取出所述第一对象的图像信息,并输入预设的微表情识别模型中,从而得到微表情识别结果,并判断所述微表情识别结果是否属于预设的恶意微表情列表,其中所述微表情识别结果为微表情类别;
若所述微表情识别结果不属于预设的恶意微表情列表,则通过预设的麦克风采集所述第一对象的声音信息,并将所述声音信息输入预设的声纹识别模型中,从而得到声纹识别结果,并判断所述声纹识别结果是否为负面声纹,其中所述声纹识别结果包括负面声纹与非负面声纹;
若所述声纹识别结果不为负面声纹,则从所述第二图像中提取出所述第二对象的图像信息,并输入预设的干涉程度计算模型中,从而得到所述干涉程度计算模型输出的干涉程度值,并判断所述干涉程度值是否处于预设的干涉数值区间,其中所述干涉数值用于衡量所述第二对象对所述第一对象的干涉程度;
若所述干涉程度值不处于预设的干涉数值区间,则执行报警操作。
本申请提供一种基于关联对象的报警装置,包括:
行人识别结果判断单元,用于获取第一摄像头采集的第一图像,并将所述第一图像输入预设的基于残差网络的行人再识别模型中,从而获得行人识别结果,并判断所述行人识别结果是否与预设的嫌疑对象库中的嫌疑对象相同,其中所述行人识别结果为人体特征;
第二图像采集单元,用于若所述行人识别结果与预设的嫌疑对象库中的嫌疑对象不相同,则获取第二摄像头采集的第二图像,其中所述第二图像至少包括第一对象与第二对象, 所述第一对象与所述第二对象之间的距离不大于预设距离,所述第二摄像头的取景范围在所述第一摄像头的取景范围之内;
微表情识别单元,用于从所述第二图像中提取出所述第一对象的图像信息,并输入预设的微表情识别模型中,从而得到微表情识别结果,并判断所述微表情识别结果是否属于预设的恶意微表情列表,其中所述微表情识别结果为微表情类别;
声纹识别单元,用于若所述微表情识别结果不属于预设的恶意微表情列表,则通过预设的麦克风采集所述第一对象的声音信息,并将所述声音信息输入预设的声纹识别模型中,从而得到声纹识别结果,并判断所述声纹识别结果是否为负面声纹,其中所述声纹识别结果包括负面声纹与非负面声纹;
干涉程度计算单元,用于若所述声纹识别结果不为负面声纹,则从所述第二图像中提取出所述第二对象的图像信息,并输入预设的干涉程度计算模型中,从而得到所述干涉程度计算模型输出的干涉程度值,并判断所述干涉程度值是否处于预设的干涉数值区间,其中所述干涉数值用于衡量所述第二对象对所述第一对象的干涉程度;
报警单元,用于若所述干涉程度值不处于预设的干涉数值区间,则执行报警操作。
本申请提供一种计算机设备,包括:一个或多个处理器;存储器;一个或多个计算机程序,其中所述一个或多个计算机程序被存储在所述存储器中并被配置为由所述一个或多个处理器执行,所述一个或多个计算机程序配置用于执行一种基于基于残差网络的报警方法,其中,所述基于残差网络的报警方法包括:
获取第一摄像头采集的第一图像,并将所述第一图像输入预设的基于残差网络的行人再识别模型中,获得人体特征,并判断所述行人识别结果与预设的嫌疑对象库中的嫌疑对象是否相同;
若所述行人识别结果与预设的嫌疑对象库中的嫌疑对象不相同,则获取第二摄像头采集的第二图像,其中所述第二图像至少包括第一对象与第二对象,所述第一对象与所述第二对象之间的距离不大于预设距离,所述第二摄像头的取景范围在所述第一摄像头的取景范围之内;
从所述第二图像中提取出所述第一对象的图像信息,并输入预设的微表情识别模型中,得到微表情识别结果,并判断所述微表情识别结果是否属于预设的恶意微表情列表,其中所述微表情识别结果为微表情类别;
若所述微表情识别结果不属于预设的恶意微表情列表,则通过预设的麦克风采集所述第一对象的声音信息,并将所述声音信息输入预设的声纹识别模型中,从而得到声纹识别结果,并判断所述声纹识别结果是否为负面声纹,其中所述声纹识别结果包括负面声纹与非负面声纹;
若所述声纹识别结果不为负面声纹,则从所述第二图像中提取出所述第二对象的图像信息,并输入预设的干涉程度计算模型中,从而得到所述干涉程度计算模型输出的干涉程度值,并判断所述干涉程度值是否处于预设的干涉数值区间,其中所述干涉数值用于衡量所述第二对象对所述第一对象的干涉程度;
若所述干涉程度值不处于预设的干涉数值区间,则执行报警操作。
本申请提供一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,该计算机程序被处理器执行时实现基于残差网络的报警方法,其中,所述基于残差网络的报警方法包括以下步骤:
获取第一摄像头采集的第一图像,并将所述第一图像输入预设的基于残差网络的行人再识别模型中,获得人体特征,并判断所述行人识别结果与预设的嫌疑对象库中的嫌疑对象是否相同;
若所述行人识别结果与预设的嫌疑对象库中的嫌疑对象不相同,则获取第二摄像头采集的第二图像,其中所述第二图像至少包括第一对象与第二对象,所述第一对象与所述第 二对象之间的距离不大于预设距离,所述第二摄像头的取景范围在所述第一摄像头的取景范围之内;
从所述第二图像中提取出所述第一对象的图像信息,并输入预设的微表情识别模型中,得到微表情识别结果,并判断所述微表情识别结果是否属于预设的恶意微表情列表,其中所述微表情识别结果为微表情类别;
若所述微表情识别结果不属于预设的恶意微表情列表,则通过预设的麦克风采集所述第一对象的声音信息,并将所述声音信息输入预设的声纹识别模型中,从而得到声纹识别结果,并判断所述声纹识别结果是否为负面声纹,其中所述声纹识别结果包括负面声纹与非负面声纹;
若所述声纹识别结果不为负面声纹,则从所述第二图像中提取出所述第二对象的图像信息,并输入预设的干涉程度计算模型中,从而得到所述干涉程度计算模型输出的干涉程度值,并判断所述干涉程度值是否处于预设的干涉数值区间,其中所述干涉数值用于衡量所述第二对象对所述第一对象的干涉程度;
若所述干涉程度值不处于预设的干涉数值区间,则执行报警操作。
本申请的基于残差网络的报警方法、装置、计算机设备和存储介质,以微表情识别加上声纹识别的方式提高报警的准确性;通过干涉程度计算模型计算得到干涉程度值,其中所述干涉数值用于衡量所述第二对象对所述第一对象的干涉程度,从而确认是否存在第二对象对第一对象进行要挟的可能,若干涉程度高,则执行报警操作。从而借助关联(第二对象)对象判断是否需要报警,更进一步提高了报警准确性。
图1为本申请一实施例的基于残差网络的报警方法的流程示意图;
图2为本申请一实施例的基于关联对象的报警装置的结构示意框图;
图3为本申请一实施例的计算机设备的结构示意框图。
Referring to FIG. 1, an embodiment of this application provides a residual-network-based alarm method, comprising the following steps:
S1. Acquiring a first image collected by a first camera, inputting the first image into a preset residual-network-based pedestrian re-identification model to obtain a pedestrian recognition result, and determining whether the pedestrian recognition result is the same as a suspect object in a preset suspect object database, wherein the pedestrian recognition result is a human body feature;
S2. If the pedestrian recognition result is not the same as any suspect object in the preset suspect object database, acquiring a second image collected by a second camera, wherein the second image includes at least a first object and a second object, the distance between the first object and the second object is not greater than a preset distance, and the viewing range of the second camera is within the viewing range of the first camera;
S3. Extracting the image information of the first object from the second image and inputting it into a preset micro-expression recognition model to obtain a micro-expression recognition result, and determining whether the micro-expression recognition result belongs to a preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category;
S4. If the micro-expression recognition result does not belong to the preset malicious micro-expression list, collecting the sound information of the first object through a preset microphone and inputting the sound information into a preset voiceprint recognition model to obtain a voiceprint recognition result, and determining whether the voiceprint recognition result is a negative voiceprint, wherein the voiceprint recognition result includes negative voiceprints and non-negative voiceprints;
S5. If the voiceprint recognition result is not a negative voiceprint, extracting the image information of the second object from the second image and inputting it into a preset interference degree calculation model to obtain the interference degree value output by the interference degree calculation model, and determining whether the interference degree value falls within a preset interference value interval, wherein the interference value is used to measure the degree of interference of the second object with the first object;
S6. If the interference degree value does not fall within the preset interference value interval, executing an alarm operation.
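Before each step is discussed in detail, the overall S1-S6 control flow can be summarized as a short sketch. The following Python is illustrative only: every callable is a hypothetical stub standing in for the preset models above, the interval bounds are invented values, and the assumption that a positive hit at an early gate is also escalated is ours rather than something stated at this point in the text.

```python
# Minimal sketch of the S1-S6 decision cascade. All callables are
# hypothetical stubs for the preset models described in this application.
def run_alarm_pipeline(first_image, second_image, audio,
                       matches_suspect, is_malicious_expression,
                       is_negative_voiceprint, interference_value,
                       interference_interval=(0.0, 0.5)):
    """Return True when the alarm operation (S6) should be executed."""
    if matches_suspect(first_image):           # S1 hit: assumed to escalate
        return True
    if is_malicious_expression(second_image):  # S3 hit: assumed to escalate
        return True
    if is_negative_voiceprint(audio):          # S4 hit: assumed to escalate
        return True
    low, high = interference_interval          # S5: interference degree check
    return not (low <= interference_value(second_image) <= high)  # S6
```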
As described in step S1 above, a first image collected by a first camera is acquired and input into the preset residual-network-based pedestrian re-identification model to obtain a pedestrian recognition result, and it is determined whether the pedestrian recognition result is the same as a suspect object in the preset suspect object database, wherein the pedestrian recognition result is a human body feature. The residual network is, for example, ResNet50, ResNet101 or ResNet152, preferably a ResNet50 model. The residual network includes first to fifth residual blocks, each of which includes at least one convolutional layer and can output a corresponding feature image. Pedestrian re-identification is a computer vision technique for determining whether a specific pedestrian is present in an image or video sequence; on this basis, the identity of a pedestrian can be recognized from an image containing a person (not necessarily a frontal view). It is then determined whether the pedestrian recognition result is the same as a suspect object in the preset suspect object database. The suspect objects are, for example, MLM members, criminals, persons drawn into pyramid schemes, hijacked persons or missing persons. In this way it can be analyzed whether an object with a criminal record, or a hijacked object, is present in the current scene; if so, the probability that hijacking is taking place in the current scene is higher. The first camera collects images over a wide area, for example covering the lobby of a bank branch.
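As one way to picture step S1 in code, the sketch below treats a ResNet-50 backbone as a person re-identification feature extractor and matches the extracted human body feature against a gallery of suspect features by cosine similarity. This is not the model claimed below (which adds global and local sub-networks); the input size and threshold are illustrative assumptions.

```python
# Hedged sketch: ResNet-50 as a re-ID feature extractor, matched against
# a suspect gallery by cosine similarity. Threshold is illustrative.
import torch
import torchvision.models as models
import torchvision.transforms as T

backbone = models.resnet50(weights=None)
backbone.fc = torch.nn.Identity()   # keep the 2048-d pooled feature
backbone.eval()

preprocess = T.Compose([T.Resize((256, 128)), T.ToTensor()])

@torch.no_grad()
def body_feature(pil_image):
    return backbone(preprocess(pil_image).unsqueeze(0)).squeeze(0)

@torch.no_grad()
def matches_suspect_gallery(pil_image, gallery_features, threshold=0.8):
    """gallery_features: (N, 2048) tensor of suspect-library features."""
    query = torch.nn.functional.normalize(body_feature(pil_image), dim=0)
    gallery = torch.nn.functional.normalize(gallery_features, dim=1)
    return (gallery @ query).max().item() >= threshold
```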
As described in step S2 above, if the pedestrian recognition result is not the same as any suspect object in the preset suspect object database, a second image collected by a second camera is acquired, wherein the second image includes at least a first object and a second object, the distance between the first object and the second object is not greater than a preset distance, and the viewing range of the second camera is within the viewing range of the first camera. The second image is used to examine whether the first object is normal and whether the second object is normal. The purpose of requiring that the distance between the first object and the second object is not greater than the preset distance is this: when the distance between a hijacker and the hijacked person is too large, the hijacker cannot monitor effectively and the hijacked person could raise an alarm on their own; therefore, when the distance between the first object and the second object is not greater than the preset distance, hijacking may be taking place. Moreover, to avoid the inaccuracy of judging from a single object, this application also analyzes the associated second object, for which purpose the second image including the second object is collected.
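The distance constraint can be checked with elementary geometry once the two objects are localized in the second image; below is a sketch under the assumption that both objects come as pixel bounding boxes, with the preset distance given as an illustrative value.

```python
# Sketch of the "distance not greater than a preset distance" gate,
# assuming the two objects are already localized as (x1, y1, x2, y2) boxes.
import math

def within_preset_distance(box_a, box_b, preset_distance_px=150.0):
    """Compare the distance between box centers with the preset distance."""
    ax, ay = (box_a[0] + box_a[2]) / 2.0, (box_a[1] + box_a[3]) / 2.0
    bx, by = (box_b[0] + box_b[2]) / 2.0, (box_b[1] + box_b[3]) / 2.0
    return math.hypot(ax - bx, ay - by) <= preset_distance_px
```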
As described in step S3 above, the image information of the first object is extracted from the second image and input into the preset micro-expression recognition model to obtain a micro-expression recognition result, and it is determined whether the micro-expression recognition result belongs to the preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category. Extracting the image information of the first object from the second image and inputting it into the preset micro-expression recognition model is performed, for example, as follows: a preset human-contour extraction method is used to extract the human contour of the first object, the facial region within the human contour is recognized, and the image data of the facial region is input into the preset micro-expression recognition model. The micro-expression recognition model is, for example, a model trained on the basis of a neural network model, using sample data composed of face images and the micro-expression categories associated with those face images. The neural network model can be any model, for example a VGG16, VGG19, VGG-F, ResNet152, ResNet50, DPN131, Xception, AlexNet or DenseNet model, preferably a DPN model. DPN (Dual Path Network) is a neural network structure that introduces the core ideas of DenseNet on top of ResNeXt, so that the model makes fuller use of features. DPN, ResNeXt and DenseNet are existing network structures and are not described further here. The recognized micro-expressions can be divided into any number of categories, preferably 54; further, the micro-expressions of fear, tension, passivity, distraction and unease are recorded in the malicious micro-expression list.
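A sketch of how step S3 might look in code is given below, assuming hypothetical `face_crop` and `expression_model` callables and an English rendering of the five listed malicious categories; the label set and the score format are our assumptions.

```python
# Hedged sketch of step S3: crop the first object's face region, classify
# it, and test membership in the preset malicious micro-expression list.
MALICIOUS_MICRO_EXPRESSIONS = {"fear", "tension", "passivity",
                               "distraction", "unease"}

def is_malicious_expression(second_image, face_crop, expression_model, labels):
    face = face_crop(second_image)       # extract the facial region
    scores = expression_model(face)      # one score per micro-expression class
    predicted = labels[max(range(len(scores)), key=scores.__getitem__)]
    return predicted in MALICIOUS_MICRO_EXPRESSIONS
```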
As described in step S4 above, if the micro-expression recognition result does not belong to the preset malicious micro-expression list, the sound information of the first object is collected through the preset microphone and input into the preset voiceprint recognition model to obtain a voiceprint recognition result, and it is determined whether the voiceprint recognition result is a negative voiceprint, wherein the voiceprint recognition result includes negative voiceprints and non-negative voiceprints. In order to further determine whether the first object is being hijacked, this application also makes a determination by way of voiceprint recognition. The voiceprint recognition process is, for example: inputting the sound information into the preset voiceprint recognition model and parsing the sound information with the voiceprint recognition model to obtain specified sound features, wherein the specified sound features include at least the maximum speech rate, the minimum speech rate, the number of stressed syllables and the total number of words in the sound information; mapping the specified sound features to a multi-dimensional vector, wherein one component of the multi-dimensional vector is the difference between the maximum and minimum speech rates and another component is the ratio of the number of stressed syllables to the total number of words in the sound information; calculating the distances between this multi-dimensional vector and the multi-dimensional vectors corresponding to multiple standard voiceprints in a preset standard voiceprint library, and recording the standard voiceprint with the smallest distance as the target voiceprint (the distance is, for example, the Euclidean distance; the smaller the distance, the more similar the two vectors; taking the vector (1, 1, 1) and the vector (1, 1, 1) as an example, their Euclidean distance is √[(1−1)² + (1−1)² + (1−1)²] = 0, and since the minimum possible Euclidean distance is 0, the target voiceprint is the one closest to the multi-dimensional vector); and outputting the voiceprint category corresponding to the target voiceprint as the voiceprint recognition result, wherein the voiceprint categories include negative voiceprints (in this application, measured along at least four dimensions, at least the maximum speech rate, the minimum speech rate, the number of stressed syllables and the total word count, with the measurement results reflected in the standard voiceprint library; when the target voiceprint is a negative voiceprint, the output result is a negative voiceprint) and non-negative voiceprints. The distance is, for example, the Euclidean distance. The purpose of taking the voiceprint category of the target voiceprint as the voiceprint recognition result is to determine whether the first object is experiencing negative emotions (if the first object is being hijacked or the like, negative emotions will be present). The negative voiceprint represents negative emotion.
As described in step S5 above, if the voiceprint recognition result is not a negative voiceprint, the image information of the second object is extracted from the second image and input into the preset interference degree calculation model to obtain the interference degree value output by the interference degree calculation model, and it is determined whether the interference degree value falls within the preset interference value interval, wherein the interference value is used to measure the degree of interference of the second object with the first object. The interference degree value is calculated, for example, as follows: a preset human-image extraction method is used to extract the human image of the second object from the second image, and limb features are extracted from the human image; the specified limb features pointing at the first object are screened out of the limb features; the length of time for which each specified limb feature persists is obtained from the video; and the specified limb features and their durations are input into the preset interference degree calculation model to obtain the interference degree value it outputs. The specified limb features pointing at the first object are, for example, a finger pointing, an arm pointing, a palm pointing or a chin pointing. The interference degree calculation model can be any feasible model, for example a neural network model; its calculation process is, for example: extracting feature information from the image information of the second object, obtaining the weight parameter corresponding to the feature information from a preset weight parameter table, and calculating the interference degree value with a preset weighted-sum formula.
As described in step S6 above, if the interference degree value does not fall within the preset interference value interval, the alarm operation is executed. If the interference degree value does not fall within the preset interference value interval, this indicates that the second object is interfering with the first object but that the interference is not overt behavior, so it is not the kind of interference that comes from family or friends, and hijacking may therefore be taking place. The alarm operation is executed accordingly.
In one embodiment, step S1 of inputting the first image into the preset residual-network-based pedestrian re-identification model to obtain the pedestrian recognition result includes:
S101. Inputting the first image into the preset trained residual-network-based pedestrian re-identification model for calculation, so as to obtain the feature image output by the fourth residual block of the residual network, wherein the pedestrian re-identification model is trained on sample data composed of pedestrian images and the recognition results associated with those images, and the residual network has five residual blocks in total;
S102. Inputting the feature image into the fifth residual block of the residual network for calculation to obtain the main data output by the fifth residual block; in parallel, inputting the feature image into a preset global recognition sub-network of the pedestrian re-identification model for calculation to obtain the global sub-data output by the global recognition sub-network; and in parallel, inputting the feature image into a preset local recognition sub-network of the pedestrian re-identification model for calculation to obtain the local sub-data output by the local recognition sub-network;
S103. Inputting the main data, the global sub-data and the local sub-data into a preset fully connected layer of the pedestrian re-identification model to obtain the pedestrian re-identification result output by the fully connected layer.
As described above, the pedestrian re-identification result output by the fully connected layer is obtained. To address the technical problem that detailed features of the input image are lost as the network processes it layer by layer, this application also provides a global recognition sub-network and a local recognition sub-network in the pedestrian re-identification model, which receive the feature image output by the fourth residual block in parallel with the fifth residual block. The global and local recognition sub-networks can selectively preserve the global and local features of the feature image output by the fourth residual block, thereby avoiding the loss of useful data while avoiding the introduction of excessive interfering data. The output layer of the pedestrian re-identification model can be any layer; this application prefers a fully connected layer, so that the output is mapped into a feature vector of fixed length, from which the recognition result is derived. The processing of the feature image by the fifth residual block is a process that includes convolution (and may also include pooling, activation and the like). The processing of the feature image by the preset global recognition sub-network is a process of extracting global features of the feature image (features of the whole image), for example extracting the global color of the feature image or its global contour. The calculation performed on the feature image by the preset local recognition sub-network is a process of extracting features of local regions of the feature image (for example, selecting the head region from the whole image), such as extracting local colors or local contours of the feature image. The global and local recognition sub-networks can adopt any neural network construction, for example one based on convolutional neural networks. Accordingly, the global sub-data and local sub-data preserved to avoid the loss of detail, together with the main data, are input into the preset fully connected layer of the pedestrian re-identification model to obtain the pedestrian re-identification result output by the fully connected layer, thereby improving the accuracy of the recognition result.
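To make the parallel structure concrete, the following PyTorch sketch wires the fourth stage's feature map into the fifth residual stage, a global branch and a stripe-based local branch, and fuses the three outputs in a fully connected layer. Mapping the "fourth/fifth residual block" onto torchvision's `layer3`/`layer4`, as well as the concrete branch designs and dimensions, are our assumptions, not the claimed architecture.

```python
# Hedged sketch of the dual sub-network re-ID model: stage-4 features feed
# the fifth residual stage, a global branch and a local branch in parallel.
import torch
import torch.nn as nn
import torchvision.models as models

class DualBranchReID(nn.Module):
    def __init__(self, num_identities=751):
        super().__init__()
        r = models.resnet50(weights=None)
        self.stem = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool,
                                  r.layer1, r.layer2, r.layer3)  # "block 4" out
        self.block5 = r.layer4                                   # main path
        self.pool = nn.AdaptiveAvgPool2d(1)
        # global branch: whole-map statistics of the stage-4 features
        self.global_branch = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                           nn.Flatten(), nn.Linear(1024, 256))
        # local branch: pool horizontal stripes (head/torso/legs regions)
        self.local_pool = nn.AdaptiveAvgPool2d((4, 1))
        self.local_fc = nn.Linear(1024 * 4, 256)
        self.classifier = nn.Linear(2048 + 256 + 256, num_identities)

    def forward(self, x):
        feat4 = self.stem(x)                                    # (B, 1024, H, W)
        main = self.pool(self.block5(feat4)).flatten(1)         # (B, 2048)
        glob = self.global_branch(feat4)                        # (B, 256)
        local = self.local_fc(self.local_pool(feat4).flatten(1))  # (B, 256)
        return self.classifier(torch.cat([main, glob, local], dim=1))
```

A stripe-pooling local branch is just one plausible reading of "dividing the feature image into blocks"; the value-range selection rule described next is another.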
In one embodiment, step S102 of inputting the feature image into the preset global recognition sub-network of the pedestrian re-identification model for calculation to obtain the global sub-data output by the global recognition sub-network includes:
S1021. Extracting specified data from the feature image through the global recognition sub-network, and determining whether the value of the specified data is within a preset value range, wherein the specified data includes at least a human contour, a skin color or a clothing color;
S1022. If the value of the specified data is not within the preset value range, taking the specified data as global sub-data and outputting the global sub-data.
As described above, the global sub-data output by the global recognition sub-network is obtained. To prevent the loss of image detail, this application extracts global sub-data from the feature image output by the fourth residual block, where the value of the global sub-data is not within the preset value range, so as to retain data with large differences while avoiding interference from useless data. The specified data is data that can reflect pedestrian characteristics, for example a human contour, a skin color or a clothing color. Since human contours are not all alike and skin or clothing colors are very likely to differ, these are extracted as specified data. If the value of the specified data is not within the preset value range, this indicates that the specified data is usable; for example, to pick out a light-skinned person among a crowd of darker-skinned people, the color value of the skin-color data will not fall within the preset value range and can then be output as valid data. Further, the global recognition sub-network selects multiple pieces of specified data for collection, takes the pieces whose values are not within the preset value range as global sub-data, and outputs them. The number of pieces of specified data can be set to 2-10, preferably 6-8. Further, the global recognition sub-network can include any number of neural network layers, for example 6-8 layers.
Further, inputting the feature image into the preset local recognition sub-network of the pedestrian re-identification model for calculation to obtain the local sub-data output by the local recognition sub-network includes: dividing the feature image into multiple blocks through the local recognition sub-network using a preset block division method; extracting specified data in each block and determining whether the value of the specified data is within a preset value range, wherein the specified data includes at least a local contour, a local skin color or a local clothing color; and, if the value of the specified data is not within the preset value range, taking the specified data as local sub-data and outputting the local sub-data. In this way, the local sub-data output by the local recognition sub-network is obtained. As the network processes the input image layer by layer, detailed features of the image are lost, local image data in particular. To retain valid local data, this application divides the feature image into multiple blocks through the local recognition sub-network using the preset block division method, extracts specified data in each block, and, if the value of the specified data is not within the preset value range, takes the specified data as local sub-data and outputs it, thereby preserving valuable local sub-data as one of the bases for subsequent recognition. Further, the local recognition sub-network selects multiple pieces of specified data for collection, takes the pieces whose values are not within the preset value range as local sub-data, and outputs them. The number of pieces of specified data can be set to 2-10, preferably 6-8. Further, the local recognition sub-network can include any number of neural network layers, for example 8-10 layers. Further, the block division method is, for example: recognizing feature shapes in the feature image and dividing the region centered on each feature shape into a single block (for example, if a head contour is recognized, it is divided off as a head block).
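The selection rule shared by the two sub-networks, namely keeping a specified datum only when its value falls outside the preset "ordinary" range, can be sketched independently of any network; the feature names and ranges below are invented for illustration.

```python
# Hedged sketch of the value-range filter used by both sub-networks:
# a specified datum is kept only when it is discriminative, i.e. when
# its value falls outside the preset range.
def select_sub_data(specified_data, ordinary_ranges):
    """specified_data / ordinary_ranges: dicts keyed by feature name."""
    kept = {}
    for name, value in specified_data.items():
        low, high = ordinary_ranges[name]
        if not (low <= value <= high):   # outside the preset range -> keep
            kept[name] = value
    return kept

# Example (hypothetical values): keeps only "skin_tone".
# select_sub_data({"skin_tone": 0.91, "clothing_hue": 0.42},
#                 {"skin_tone": (0.3, 0.7), "clothing_hue": (0.2, 0.8)})
```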
In one embodiment, step S103 of inputting the main data, the global sub-data and the local sub-data into the preset fully connected layer of the pedestrian re-identification model to obtain the pedestrian re-identification result output by the fully connected layer includes:
S1031. Using a preset mapping method, mapping the main data, the global sub-data and the local sub-data into a feature vector of fixed length through the fully connected layer;
S1032. According to a preset correspondence between component vectors and recognition results, outputting the recognition result corresponding to the component vector with the largest value in the feature vector.
As described above, the main data, the global sub-data and the local sub-data are used comprehensively to obtain the pedestrian re-identification result output by the fully connected layer. Traditional residual-network-based models all feed the data of the fifth residual block into the fully connected layer, which then maps the data into a feature vector. This application additionally takes into account the main data output by the fifth residual block, the global sub-data output by the global recognition sub-network and the local sub-data output by the local recognition sub-network, and uses the fully connected layer to map them into a single fixed-length feature vector, thereby improving recognition accuracy. The preset mapping method is similar to the mapping method of the fully connected layer in traditional technology and is not described further here. Each component of the feature vector output by the fully connected layer represents a corresponding recognition result, and the recognition result corresponding to the component with the largest value is the most likely one, so that result is output as the final recognition result.
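Steps S1031-S1032 amount to a concatenation, one linear map and an argmax; below is a minimal sketch, with the identity list as a placeholder. For the dimensions used in the architecture sketch above, `fc` would be, for example, `nn.Linear(2560, len(identities))`.

```python
# Hedged sketch of S1031-S1032: fuse the three data sources through the
# fully connected layer, then pick the largest component's identity.
import torch
import torch.nn as nn

def recognize(main, glob, local, fc: nn.Linear, identities):
    # S1031: map to one fixed-length feature vector
    vector = fc(torch.cat([main, glob, local], dim=-1))
    # S1032: the component with the largest value names the result
    return identities[vector.argmax(dim=-1).item()]
```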
In one embodiment, before step S3 of extracting the image information of the first object from the second image, inputting it into the preset micro-expression recognition model to obtain the micro-expression recognition result, and determining whether the micro-expression recognition result belongs to the preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category, the method includes:
S21. Acquiring a specified quantity of sample data and dividing the sample data into a training set and a test set, wherein the sample data includes face images and the micro-expression categories associated with the face images;
S22. Inputting the sample data of the training set into a preset neural network model for training to obtain an initial micro-expression recognition model, wherein stochastic gradient descent is used in the training process;
S23. Verifying the initial micro-expression recognition model with the sample data of the test set;
S24. If the verification passes, recording the initial micro-expression recognition model as the micro-expression recognition model.
As described above, the micro-expression recognition model is set up. This embodiment trains the micro-expression recognition model on the basis of a neural network model, which can be a VGG16, VGG19, VGG-F, ResNet152, ResNet50, DPN131, Xception, AlexNet or DenseNet model, preferably a DPN model. Stochastic gradient descent samples a portion of the training data at random in place of the whole training set; when the sample size is very large (for example several hundred thousand), the optimum may already be reached after iterating over only tens of thousands or even thousands of samples, which speeds up training. Further, back-propagation can be used during training to update the parameters of each layer of the neural network model. Back-propagation (BP) is built on gradient descent; the input-output relation of a BP network is essentially a mapping: a BP neural network with n inputs and m outputs performs a continuous mapping from n-dimensional Euclidean space to a finite domain in m-dimensional Euclidean space, and this mapping is highly non-linear, which is conducive to updating the parameters of each layer of the neural network model. The initial micro-expression recognition model is thus obtained; it is then verified with the sample data of the test set, and if the verification passes, it is recorded as the micro-expression recognition model.
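A compact sketch of the S21-S24 recipe is given below, assuming a generic PyTorch classifier and a dataset of (face image, micro-expression label) pairs; the 80/20 split, learning rate and acceptance accuracy are illustrative assumptions.

```python
# Hedged sketch of S21-S24: split the sample data, train with SGD, and
# accept the model only if it passes verification on the test set.
import torch
from torch.utils.data import DataLoader, random_split

def train_micro_expression_model(model, dataset, epochs=10, accept_acc=0.9):
    train_len = int(0.8 * len(dataset))                       # S21: split
    train_set, test_set = random_split(
        dataset, [train_len, len(dataset) - train_len])
    loader = DataLoader(train_set, batch_size=32, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):                                   # S22: SGD
        for faces, labels in loader:
            optimizer.zero_grad()
            loss_fn(model(faces), labels).backward()
            optimizer.step()
    model.eval()
    with torch.no_grad():                                     # S23: verify
        correct = sum(int(model(face.unsqueeze(0)).argmax(1).item() == label)
                      for face, label in test_set)
    # S24: record the model only if verification passes
    return model if correct / len(test_set) >= accept_acc else None
```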
In one embodiment, step S4 of inputting the sound information into the preset voiceprint recognition model to obtain the voiceprint recognition result includes:
S401. Inputting the sound information into the preset voiceprint recognition model and parsing the sound information with the voiceprint recognition model to obtain specified sound features, wherein the specified sound features include at least the maximum speech rate, the minimum speech rate, the number of stressed syllables and the total number of words in the sound information;
S402. Mapping the specified sound features to a multi-dimensional vector, wherein one component of the multi-dimensional vector is the difference between the maximum and minimum speech rates and another component is the ratio of the number of stressed syllables to the total number of words in the sound information;
S403. Calculating the distances between the multi-dimensional vector and the multi-dimensional vectors corresponding to multiple standard voiceprints in the preset standard voiceprint library, and recording the standard voiceprint with the smallest distance as the target voiceprint;
S404. Outputting the voiceprint category corresponding to the target voiceprint as the voiceprint recognition result, wherein the voiceprint categories include negative voiceprints and non-negative voiceprints.
As described above, the voiceprint recognition result is obtained. A voiceprint is the spectrum of a sound wave carrying speech information, as displayed by an electro-acoustic instrument; voiceprints are not only specific to a person but also relatively stable, so voiceprints differ under different emotions, and the emotional state of the first object can be analyzed on this basis. This application parses the sound information to obtain the specified sound features, which include at least the maximum speech rate, the minimum speech rate, the number of stressed syllables and the total number of words in the sound information, and maps them to a multi-dimensional vector, one component of which is the difference between the maximum and minimum speech rates and another the ratio of the stress count to the total word count; the other dimensions of the multi-dimensional vector can include other sound features. The distances between this vector and the multi-dimensional vectors corresponding to the multiple standard voiceprints in the preset standard voiceprint library are then calculated; since the standard voiceprint with the smallest distance is the closest to the voiceprint corresponding to the sound information, it is recorded as the target voiceprint, and the voiceprint category corresponding to the target voiceprint is output as the voiceprint recognition result. The distance is, for example, the Euclidean distance or the cosine similarity. Standard voiceprints under different emotions, together with the multi-dimensional vectors corresponding to those standard voiceprints, are pre-stored in the standard voiceprint library.
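Steps S401-S404 reduce to building a small feature vector and performing a nearest-neighbour lookup; below is a sketch with the two components as specified (further dimensions could be added), where the standard voiceprint library is represented as a plain list of labelled vectors.

```python
# Hedged sketch of S401-S404: map the specified sound features to a
# vector, then label the utterance with the class of the nearest standard
# voiceprint by Euclidean distance.
import math

def voiceprint_vector(max_rate, min_rate, stress_count, word_count):
    # One component: max minus min speech rate.
    # Another component: stress count divided by total word count.
    return (max_rate - min_rate, stress_count / word_count)

def classify_voiceprint(vector, standard_voiceprints):
    """standard_voiceprints: list of (vector, label) pairs, with labels
    such as 'negative' or 'non-negative'."""
    best = min(standard_voiceprints,
               key=lambda entry: math.dist(vector, entry[0]))  # Euclidean
    return best[1]
```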
In one embodiment, the second image is one frame of the video collected by the second camera, and step S5 of extracting the image information of the second object from the second image and inputting it into the preset interference degree calculation model to obtain the interference degree value output by the interference degree calculation model includes:
S501. Extracting the human image of the second object from the second image using a preset human-image extraction method, and extracting limb features from the human image;
S502. Screening out, from the limb features, the specified limb features pointing at the first object;
S503. Obtaining from the video the length of time for which the specified limb features persist;
S504. Inputting the specified limb features and the lengths of time for which they persist into the preset interference degree calculation model to obtain the interference degree value output by the interference degree calculation model.
As described above, the interference degree value output by the interference degree calculation model is obtained. This application recognizes, from the image, the specified limb features pointing at the first object, and obtains from the video the length of time each specified limb feature persists, as the basis for calculating the interference degree value. For example, when the second object points a finger at the first object and holds it there for one second, it can be determined that the second object is interfering strongly with the first object. Accordingly, the specified limb features and their durations are input into the preset interference degree calculation model to obtain the interference degree value it outputs.
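One simple realization of the weighted-sum reading of the interference degree calculation model is sketched below, with an invented weight table keyed by the pointing features listed above and durations taken from the video; both the weights and the interval are illustrative values.

```python
# Hedged sketch of S501-S504: score each pointing limb feature by a preset
# weight, scale it by how long the gesture persists, and sum the results.
POINTING_WEIGHTS = {"finger": 1.0, "arm": 0.8, "palm": 0.6, "chin": 0.4}

def interference_value(pointing_features):
    """pointing_features: list of (kind, duration_seconds) pairs for limb
    features directed at the first object."""
    return sum(POINTING_WEIGHTS.get(kind, 0.0) * duration
               for kind, duration in pointing_features)

def should_alarm(value, interval=(0.0, 0.5)):
    low, high = interval
    return not (low <= value <= high)   # outside the interval -> alarm (S6)
```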
With the residual-network-based alarm method of this application, alarm accuracy is improved by combining micro-expression recognition with voiceprint recognition; the interference degree value is calculated by the interference degree calculation model, where the interference value measures the degree of interference of the second object with the first object, so as to confirm whether the second object may be coercing the first object, and if the degree of interference is high, the alarm operation is executed. Determining whether an alarm is needed with the help of the associated object (the second object) thus further improves alarm accuracy.
Referring to FIG. 2, an embodiment of this application provides an alarm device based on associated objects, comprising:
a pedestrian recognition result determination unit 10, configured to acquire a first image collected by a first camera, input the first image into a preset residual-network-based pedestrian re-identification model to obtain a pedestrian recognition result, and determine whether the pedestrian recognition result is the same as a suspect object in a preset suspect object database, wherein the pedestrian recognition result is a human body feature;
a second image collection unit 20, configured to acquire a second image collected by a second camera if the pedestrian recognition result is not the same as any suspect object in the preset suspect object database, wherein the second image includes at least a first object and a second object, the distance between the first object and the second object is not greater than a preset distance, and the viewing range of the second camera is within the viewing range of the first camera;
a micro-expression recognition unit 30, configured to extract the image information of the first object from the second image, input it into a preset micro-expression recognition model to obtain a micro-expression recognition result, and determine whether the micro-expression recognition result belongs to a preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category;
a voiceprint recognition unit 40, configured to, if the micro-expression recognition result does not belong to the preset malicious micro-expression list, collect the sound information of the first object through a preset microphone, input the sound information into a preset voiceprint recognition model to obtain a voiceprint recognition result, and determine whether the voiceprint recognition result is a negative voiceprint, wherein the voiceprint recognition result includes negative voiceprints and non-negative voiceprints;
an interference degree calculation unit 50, configured to, if the voiceprint recognition result is not a negative voiceprint, extract the image information of the second object from the second image, input it into a preset interference degree calculation model to obtain the interference degree value output by the interference degree calculation model, and determine whether the interference degree value falls within a preset interference value interval, wherein the interference value is used to measure the degree of interference of the second object with the first object;
an alarm unit 60, configured to execute an alarm operation if the interference degree value does not fall within the preset interference value interval.
The operations that the above units are respectively configured to execute correspond one-to-one to the steps of the residual-network-based alarm method of the foregoing embodiments and are not described further here.
In one embodiment, the pedestrian recognition result determination unit 10 includes:
a feature image acquisition subunit, configured to input the first image into the preset trained residual-network-based pedestrian re-identification model for calculation, so as to obtain the feature image output by the fourth residual block of the residual network, wherein the pedestrian re-identification model is trained on sample data composed of pedestrian images and the recognition results associated with those images, and the residual network has five residual blocks in total;
a data acquisition subunit, configured to input the feature image into the fifth residual block of the residual network for calculation to obtain the main data output by the fifth residual block, in parallel input the feature image into a preset global recognition sub-network of the pedestrian re-identification model for calculation to obtain the global sub-data output by the global recognition sub-network, and in parallel input the feature image into a preset local recognition sub-network of the pedestrian re-identification model for calculation to obtain the local sub-data output by the local recognition sub-network;
a pedestrian re-identification result acquisition subunit, configured to input the main data, the global sub-data and the local sub-data into a preset fully connected layer of the pedestrian re-identification model to obtain the pedestrian re-identification result output by the fully connected layer.
The operations that the above subunits are respectively configured to execute correspond one-to-one to the steps of the residual-network-based alarm method of the foregoing embodiments and are not described further here.
In one embodiment, the data acquisition subunit includes:
a specified data acquisition module, configured to extract specified data from the feature image through the global recognition sub-network and determine whether the value of the specified data is within a preset value range, wherein the specified data includes at least a human contour, a skin color or a clothing color;
a global sub-data output module, configured to take the specified data as global sub-data and output the global sub-data if the value of the specified data is not within the preset value range.
The operations that the above modules are respectively configured to execute correspond one-to-one to the steps of the residual-network-based alarm method of the foregoing embodiments and are not described further here.
In one embodiment, the pedestrian re-identification result acquisition subunit includes:
a mapping module, configured to map the main data, the global sub-data and the local sub-data into a feature vector of fixed length through the fully connected layer using a preset mapping method;
a recognition result output module, configured to output, according to a preset correspondence between component vectors and recognition results, the recognition result corresponding to the component vector with the largest value in the feature vector.
The operations that the above modules are respectively configured to execute correspond one-to-one to the steps of the residual-network-based alarm method of the foregoing embodiments and are not described further here.
In one embodiment, the device includes:
a sample data acquisition unit, configured to acquire a specified quantity of sample data and divide the sample data into a training set and a test set, wherein the sample data includes face images and the micro-expression categories associated with the face images;
a training unit, configured to input the sample data of the training set into a preset neural network model for training to obtain an initial micro-expression recognition model, wherein stochastic gradient descent is used in the training process;
a verification unit, configured to verify the initial micro-expression recognition model with the sample data of the test set;
a marking unit, configured to record the initial micro-expression recognition model as the micro-expression recognition model if the verification passes.
The operations that the above units are respectively configured to execute correspond one-to-one to the steps of the residual-network-based alarm method of the foregoing embodiments and are not described further here.
In one embodiment, the voiceprint recognition unit 40 includes:
a sound information input subunit, configured to input the sound information into the preset voiceprint recognition model and parse the sound information with the voiceprint recognition model to obtain specified sound features, wherein the specified sound features include at least the maximum speech rate, the minimum speech rate, the number of stressed syllables and the total number of words in the sound information;
a multi-dimensional vector mapping subunit, configured to map the specified sound features to a multi-dimensional vector, wherein one component of the multi-dimensional vector is the difference between the maximum and minimum speech rates and another component is the ratio of the number of stressed syllables to the total number of words in the sound information;
a distance calculation subunit, configured to calculate the distances between the multi-dimensional vector and the multi-dimensional vectors corresponding to multiple standard voiceprints in the preset standard voiceprint library, and record the standard voiceprint with the smallest distance as the target voiceprint;
a voiceprint recognition result output subunit, configured to output the voiceprint category corresponding to the target voiceprint as the voiceprint recognition result, wherein the voiceprint categories include negative voiceprints and non-negative voiceprints.
The operations that the above subunits are respectively configured to execute correspond one-to-one to the steps of the residual-network-based alarm method of the foregoing embodiments and are not described further here.
In one embodiment, the interference degree calculation unit 50 includes:
a human image extraction subunit, configured to extract the human image of the second object from the second image using a preset human-image extraction method, and extract limb features from the human image;
a specified limb feature acquisition subunit, configured to screen out, from the limb features, the specified limb features pointing at the first object;
a duration acquisition subunit, configured to obtain from the video the length of time for which the specified limb features persist;
an interference degree value output subunit, configured to input the specified limb features and the lengths of time for which they persist into the preset interference degree calculation model to obtain the interference degree value output by the interference degree calculation model.
The operations that the above subunits are respectively configured to execute correspond one-to-one to the steps of the residual-network-based alarm method of the foregoing embodiments and are not described further here.
With the alarm device based on associated objects of this application, alarm accuracy is improved by combining micro-expression recognition with voiceprint recognition; the interference degree value is calculated by the interference degree calculation model, where the interference value measures the degree of interference of the second object with the first object, so as to confirm whether the second object may be coercing the first object, and if the degree of interference is high, the alarm operation is executed. Determining whether an alarm is needed with the help of the associated object (the second object) thus further improves alarm accuracy.
Referring to FIG. 3, an embodiment of this application further provides a computer device, which can be a server and whose internal structure can be as shown in the figure. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store the data used by the residual-network-based alarm method. The network interface of the computer device is used to communicate with external terminals via a network connection. When the computer program is executed by the processor, the residual-network-based alarm method is implemented.
The steps included in the residual-network-based alarm method executed by the above processor correspond one-to-one to the steps of the residual-network-based alarm method of the foregoing embodiments and are not described further here.
With the computer device of this application, alarm accuracy is improved by combining micro-expression recognition with voiceprint recognition; the interference degree value is calculated by the interference degree calculation model, where the interference value measures the degree of interference of the second object with the first object, so as to confirm whether the second object may be coercing the first object, and if the degree of interference is high, the alarm operation is executed. Determining whether an alarm is needed with the help of the associated object (the second object) thus further improves alarm accuracy.
An embodiment of this application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the residual-network-based alarm method is implemented, whose steps correspond one-to-one to the steps of the residual-network-based alarm method of the foregoing embodiments and are not described further here.
With the computer-readable storage medium of this application, alarm accuracy is improved by combining micro-expression recognition with voiceprint recognition; the interference degree value is calculated by the interference degree calculation model, where the interference value measures the degree of interference of the second object with the first object, so as to confirm whether the second object may be coercing the first object, and if the degree of interference is high, the alarm operation is executed. Determining whether an alarm is needed with the help of the associated object (the second object) thus further improves alarm accuracy.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program, which can be stored in a non-volatile computer-readable storage medium; when executed, the computer program can include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media provided in this application and used in the embodiments can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprise", "include" and any of their variants are intended to cover non-exclusive inclusion, so that a process, device, article or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article or method. In the absence of further limitations, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, device, article or method that includes it.
Claims (20)
- A residual-network-based alarm method, comprising: acquiring a first image collected by a first camera, inputting the first image into a preset residual-network-based pedestrian re-identification model to obtain a human body feature as the pedestrian recognition result, and determining whether the pedestrian recognition result is the same as a suspect object in a preset suspect object database; if the pedestrian recognition result is not the same as any suspect object in the preset suspect object database, acquiring a second image collected by a second camera, wherein the second image includes at least a first object and a second object, the distance between the first object and the second object is not greater than a preset distance, and the viewing range of the second camera is within the viewing range of the first camera; extracting the image information of the first object from the second image and inputting it into a preset micro-expression recognition model to obtain a micro-expression recognition result, and determining whether the micro-expression recognition result belongs to a preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category; if the micro-expression recognition result does not belong to the preset malicious micro-expression list, collecting the sound information of the first object through a preset microphone and inputting the sound information into a preset voiceprint recognition model to obtain a voiceprint recognition result, and determining whether the voiceprint recognition result is a negative voiceprint, wherein the voiceprint recognition result includes negative voiceprints and non-negative voiceprints; if the voiceprint recognition result is not a negative voiceprint, extracting the image information of the second object from the second image and inputting it into a preset interference degree calculation model to obtain the interference degree value output by the interference degree calculation model, and determining whether the interference degree value falls within a preset interference value interval, wherein the interference value is used to measure the degree of interference of the second object with the first object; and if the interference degree value does not fall within the preset interference value interval, executing an alarm operation.
- The residual-network-based alarm method according to claim 1, wherein the step of inputting the first image into the preset residual-network-based pedestrian re-identification model to obtain the pedestrian recognition result comprises: inputting the first image into the preset trained residual-network-based pedestrian re-identification model for calculation, so as to obtain the feature image output by the fourth residual block of the residual network, wherein the pedestrian re-identification model is trained on sample data composed of pedestrian images and the recognition results associated with those images, and the residual network has five residual blocks in total; inputting the feature image into the fifth residual block of the residual network for calculation to obtain the main data output by the fifth residual block, in parallel inputting the feature image into a preset global recognition sub-network of the pedestrian re-identification model for calculation to obtain the global sub-data output by the global recognition sub-network, and in parallel inputting the feature image into a preset local recognition sub-network of the pedestrian re-identification model for calculation to obtain the local sub-data output by the local recognition sub-network; and inputting the main data, the global sub-data and the local sub-data into a preset fully connected layer of the pedestrian re-identification model to obtain the pedestrian re-identification result output by the fully connected layer.
- The residual-network-based alarm method according to claim 2, wherein the step of inputting the feature image into the preset global recognition sub-network of the pedestrian re-identification model for calculation to obtain the global sub-data output by the global recognition sub-network comprises: extracting specified data from the feature image through the global recognition sub-network, and determining whether the value of the specified data is within a preset value range, wherein the specified data includes at least a human contour, a skin color or a clothing color; and if the value of the specified data is not within the preset value range, taking the specified data as global sub-data and outputting the global sub-data.
- The residual-network-based alarm method according to claim 2, wherein the step of inputting the main data, the global sub-data and the local sub-data into the preset fully connected layer of the pedestrian re-identification model to obtain the pedestrian re-identification result output by the fully connected layer comprises: using a preset mapping method, mapping the main data, the global sub-data and the local sub-data into a feature vector of fixed length through the fully connected layer; and according to a preset correspondence between component vectors and recognition results, outputting the recognition result corresponding to the component vector with the largest value in the feature vector.
- The residual-network-based alarm method according to claim 1, wherein, before the step of extracting the image information of the first object from the second image, inputting it into the preset micro-expression recognition model to obtain the micro-expression recognition result, and determining whether the micro-expression recognition result belongs to the preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category, the method comprises: acquiring a specified quantity of sample data and dividing the sample data into a training set and a test set, wherein the sample data includes face images and the micro-expression categories associated with the face images; inputting the sample data of the training set into a preset neural network model for training to obtain an initial micro-expression recognition model, wherein stochastic gradient descent is used in the training process; verifying the initial micro-expression recognition model with the sample data of the test set; and, if the verification passes, recording the initial micro-expression recognition model as the micro-expression recognition model.
- The residual-network-based alarm method according to claim 1, wherein the step of inputting the sound information into the preset voiceprint recognition model to obtain the voiceprint recognition result comprises: inputting the sound information into the preset voiceprint recognition model and parsing the sound information with the voiceprint recognition model to obtain specified sound features, wherein the specified sound features include at least the maximum speech rate, the minimum speech rate, the number of stressed syllables and the total number of words in the sound information; mapping the specified sound features to a multi-dimensional vector, wherein one component of the multi-dimensional vector is the difference between the maximum and minimum speech rates and another component is the ratio of the number of stressed syllables to the total number of words in the sound information; calculating the distances between the multi-dimensional vector and the multi-dimensional vectors corresponding to multiple standard voiceprints in a preset standard voiceprint library, and recording the standard voiceprint with the smallest distance as the target voiceprint; and outputting the voiceprint category corresponding to the target voiceprint as the voiceprint recognition result, wherein the voiceprint categories include negative voiceprints and non-negative voiceprints.
- The residual-network-based alarm method according to claim 1, wherein the second image is one frame of the video collected by the second camera, and the step of extracting the image information of the second object from the second image and inputting it into the preset interference degree calculation model to obtain the interference degree value output by the interference degree calculation model comprises: extracting the human image of the second object from the second image using a preset human-image extraction method, and extracting limb features from the human image; screening out, from the limb features, the specified limb features pointing at the first object; obtaining from the video the length of time for which the specified limb features persist; and inputting the specified limb features and the lengths of time for which they persist into the preset interference degree calculation model to obtain the interference degree value output by the interference degree calculation model.
- An alarm device based on associated objects, comprising: a pedestrian recognition result determination unit, configured to acquire a first image collected by a first camera, input the first image into a preset residual-network-based pedestrian re-identification model to obtain a pedestrian recognition result, and determine whether the pedestrian recognition result is the same as a suspect object in a preset suspect object database, wherein the pedestrian recognition result is a human body feature; a second image collection unit, configured to acquire a second image collected by a second camera if the pedestrian recognition result is not the same as any suspect object in the preset suspect object database, wherein the second image includes at least a first object and a second object, the distance between the first object and the second object is not greater than a preset distance, and the viewing range of the second camera is within the viewing range of the first camera; a micro-expression recognition unit, configured to extract the image information of the first object from the second image, input it into a preset micro-expression recognition model to obtain a micro-expression recognition result, and determine whether the micro-expression recognition result belongs to a preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category; a voiceprint recognition unit, configured to, if the micro-expression recognition result does not belong to the preset malicious micro-expression list, collect the sound information of the first object through a preset microphone, input the sound information into a preset voiceprint recognition model to obtain a voiceprint recognition result, and determine whether the voiceprint recognition result is a negative voiceprint, wherein the voiceprint recognition result includes negative voiceprints and non-negative voiceprints; an interference degree calculation unit, configured to, if the voiceprint recognition result is not a negative voiceprint, extract the image information of the second object from the second image, input it into a preset interference degree calculation model to obtain the interference degree value output by the interference degree calculation model, and determine whether the interference degree value falls within a preset interference value interval, wherein the interference value is used to measure the degree of interference of the second object with the first object; and an alarm unit, configured to execute an alarm operation if the interference degree value does not fall within the preset interference value interval.
- A computer device, comprising: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to execute a residual-network-based alarm method, wherein the residual-network-based alarm method comprises: acquiring a first image collected by a first camera, inputting the first image into a preset residual-network-based pedestrian re-identification model to obtain a human body feature as the pedestrian recognition result, and determining whether the pedestrian recognition result is the same as a suspect object in a preset suspect object database; if the pedestrian recognition result is not the same as any suspect object in the preset suspect object database, acquiring a second image collected by a second camera, wherein the second image includes at least a first object and a second object, the distance between the first object and the second object is not greater than a preset distance, and the viewing range of the second camera is within the viewing range of the first camera; extracting the image information of the first object from the second image and inputting it into a preset micro-expression recognition model to obtain a micro-expression recognition result, and determining whether the micro-expression recognition result belongs to a preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category; if the micro-expression recognition result does not belong to the preset malicious micro-expression list, collecting the sound information of the first object through a preset microphone and inputting the sound information into a preset voiceprint recognition model to obtain a voiceprint recognition result, and determining whether the voiceprint recognition result is a negative voiceprint, wherein the voiceprint recognition result includes negative voiceprints and non-negative voiceprints; if the voiceprint recognition result is not a negative voiceprint, extracting the image information of the second object from the second image and inputting it into a preset interference degree calculation model to obtain the interference degree value output by the interference degree calculation model, and determining whether the interference degree value falls within a preset interference value interval, wherein the interference value is used to measure the degree of interference of the second object with the first object; and if the interference degree value does not fall within the preset interference value interval, executing an alarm operation.
- The computer device according to claim 9, wherein the step of inputting the first image into the preset residual-network-based pedestrian re-identification model to obtain the pedestrian recognition result comprises: inputting the first image into the preset trained residual-network-based pedestrian re-identification model for calculation, so as to obtain the feature image output by the fourth residual block of the residual network, wherein the pedestrian re-identification model is trained on sample data composed of pedestrian images and the recognition results associated with those images, and the residual network has five residual blocks in total; inputting the feature image into the fifth residual block of the residual network for calculation to obtain the main data output by the fifth residual block, in parallel inputting the feature image into a preset global recognition sub-network of the pedestrian re-identification model for calculation to obtain the global sub-data output by the global recognition sub-network, and in parallel inputting the feature image into a preset local recognition sub-network of the pedestrian re-identification model for calculation to obtain the local sub-data output by the local recognition sub-network; and inputting the main data, the global sub-data and the local sub-data into a preset fully connected layer of the pedestrian re-identification model to obtain the pedestrian re-identification result output by the fully connected layer.
- The computer device according to claim 10, wherein the step of inputting the feature image into the preset global recognition sub-network of the pedestrian re-identification model for calculation to obtain the global sub-data output by the global recognition sub-network comprises: extracting specified data from the feature image through the global recognition sub-network, and determining whether the value of the specified data is within a preset value range, wherein the specified data includes at least a human contour, a skin color or a clothing color; and if the value of the specified data is not within the preset value range, taking the specified data as global sub-data and outputting the global sub-data.
- The computer device according to claim 10, wherein the step of inputting the main data, the global sub-data and the local sub-data into the preset fully connected layer of the pedestrian re-identification model to obtain the pedestrian re-identification result output by the fully connected layer comprises: using a preset mapping method, mapping the main data, the global sub-data and the local sub-data into a feature vector of fixed length through the fully connected layer; and according to a preset correspondence between component vectors and recognition results, outputting the recognition result corresponding to the component vector with the largest value in the feature vector.
- The computer device according to claim 9, wherein, before the step of extracting the image information of the first object from the second image, inputting it into the preset micro-expression recognition model to obtain the micro-expression recognition result, and determining whether the micro-expression recognition result belongs to the preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category, the method comprises: acquiring a specified quantity of sample data and dividing the sample data into a training set and a test set, wherein the sample data includes face images and the micro-expression categories associated with the face images; inputting the sample data of the training set into a preset neural network model for training to obtain an initial micro-expression recognition model, wherein stochastic gradient descent is used in the training process; verifying the initial micro-expression recognition model with the sample data of the test set; and, if the verification passes, recording the initial micro-expression recognition model as the micro-expression recognition model.
- The computer device according to claim 9, wherein the step of inputting the sound information into the preset voiceprint recognition model to obtain the voiceprint recognition result comprises: inputting the sound information into the preset voiceprint recognition model and parsing the sound information with the voiceprint recognition model to obtain specified sound features, wherein the specified sound features include at least the maximum speech rate, the minimum speech rate, the number of stressed syllables and the total number of words in the sound information; mapping the specified sound features to a multi-dimensional vector, wherein one component of the multi-dimensional vector is the difference between the maximum and minimum speech rates and another component is the ratio of the number of stressed syllables to the total number of words in the sound information; calculating the distances between the multi-dimensional vector and the multi-dimensional vectors corresponding to multiple standard voiceprints in a preset standard voiceprint library, and recording the standard voiceprint with the smallest distance as the target voiceprint; and outputting the voiceprint category corresponding to the target voiceprint as the voiceprint recognition result, wherein the voiceprint categories include negative voiceprints and non-negative voiceprints.
- The computer device according to claim 9, wherein the second image is one frame of the video collected by the second camera, and the step of extracting the image information of the second object from the second image and inputting it into the preset interference degree calculation model to obtain the interference degree value output by the interference degree calculation model comprises: extracting the human image of the second object from the second image using a preset human-image extraction method, and extracting limb features from the human image; screening out, from the limb features, the specified limb features pointing at the first object; obtaining from the video the length of time for which the specified limb features persist; and inputting the specified limb features and the lengths of time for which they persist into the preset interference degree calculation model to obtain the interference degree value output by the interference degree calculation model.
- A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, a residual-network-based alarm method is implemented, and the residual-network-based alarm method comprises the following steps: acquiring a first image collected by a first camera, inputting the first image into a preset residual-network-based pedestrian re-identification model to obtain a human body feature as the pedestrian recognition result, and determining whether the pedestrian recognition result is the same as a suspect object in a preset suspect object database; if the pedestrian recognition result is not the same as any suspect object in the preset suspect object database, acquiring a second image collected by a second camera, wherein the second image includes at least a first object and a second object, the distance between the first object and the second object is not greater than a preset distance, and the viewing range of the second camera is within the viewing range of the first camera; extracting the image information of the first object from the second image and inputting it into a preset micro-expression recognition model to obtain a micro-expression recognition result, and determining whether the micro-expression recognition result belongs to a preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category; if the micro-expression recognition result does not belong to the preset malicious micro-expression list, collecting the sound information of the first object through a preset microphone and inputting the sound information into a preset voiceprint recognition model to obtain a voiceprint recognition result, and determining whether the voiceprint recognition result is a negative voiceprint, wherein the voiceprint recognition result includes negative voiceprints and non-negative voiceprints; if the voiceprint recognition result is not a negative voiceprint, extracting the image information of the second object from the second image and inputting it into a preset interference degree calculation model to obtain the interference degree value output by the interference degree calculation model, and determining whether the interference degree value falls within a preset interference value interval, wherein the interference value is used to measure the degree of interference of the second object with the first object; and if the interference degree value does not fall within the preset interference value interval, executing an alarm operation.
- The computer-readable storage medium according to claim 16, wherein the step of inputting the first image into the preset residual-network-based pedestrian re-identification model to obtain the pedestrian recognition result comprises: inputting the first image into the preset trained residual-network-based pedestrian re-identification model for calculation, so as to obtain the feature image output by the fourth residual block of the residual network, wherein the pedestrian re-identification model is trained on sample data composed of pedestrian images and the recognition results associated with those images, and the residual network has five residual blocks in total; inputting the feature image into the fifth residual block of the residual network for calculation to obtain the main data output by the fifth residual block, in parallel inputting the feature image into a preset global recognition sub-network of the pedestrian re-identification model for calculation to obtain the global sub-data output by the global recognition sub-network, and in parallel inputting the feature image into a preset local recognition sub-network of the pedestrian re-identification model for calculation to obtain the local sub-data output by the local recognition sub-network; and inputting the main data, the global sub-data and the local sub-data into a preset fully connected layer of the pedestrian re-identification model to obtain the pedestrian re-identification result output by the fully connected layer.
- The computer-readable storage medium according to claim 17, wherein the step of inputting the feature image into the preset global recognition sub-network of the pedestrian re-identification model for calculation to obtain the global sub-data output by the global recognition sub-network comprises: extracting specified data from the feature image through the global recognition sub-network, and determining whether the value of the specified data is within a preset value range, wherein the specified data includes at least a human contour, a skin color or a clothing color; and if the value of the specified data is not within the preset value range, taking the specified data as global sub-data and outputting the global sub-data.
- The computer-readable storage medium according to claim 17, wherein the step of inputting the main data, the global sub-data and the local sub-data into the preset fully connected layer of the pedestrian re-identification model to obtain the pedestrian re-identification result output by the fully connected layer comprises: using a preset mapping method, mapping the main data, the global sub-data and the local sub-data into a feature vector of fixed length through the fully connected layer; and according to a preset correspondence between component vectors and recognition results, outputting the recognition result corresponding to the component vector with the largest value in the feature vector.
- The computer-readable storage medium according to claim 16, wherein, before the step of extracting the image information of the first object from the second image, inputting it into the preset micro-expression recognition model to obtain the micro-expression recognition result, and determining whether the micro-expression recognition result belongs to the preset malicious micro-expression list, wherein the micro-expression recognition result is a micro-expression category, the method comprises: acquiring a specified quantity of sample data and dividing the sample data into a training set and a test set, wherein the sample data includes face images and the micro-expression categories associated with the face images; inputting the sample data of the training set into a preset neural network model for training to obtain an initial micro-expression recognition model, wherein stochastic gradient descent is used in the training process; verifying the initial micro-expression recognition model with the sample data of the test set; and, if the verification passes, recording the initial micro-expression recognition model as the micro-expression recognition model.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910848452.8A (CN110765850A) | 2019-09-09 | 2019-09-09 | Alarm method and apparatus based on residual network, and computer device and storage medium
CN201910848452.8 | 2019-09-09 | |
Publications (1)
Publication Number | Publication Date
---|---
WO2021047190A1 (zh) | 2021-03-18
Family
ID=69329640
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
PCT/CN2020/088046 | WO2021047190A1 (zh) | 2019-09-09 | 2020-04-30
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110765850A (zh) |
WO (1) | WO2021047190A1 (zh) |
Families Citing this family (4)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN110765850A (zh) | 2019-09-09 | 2020-02-07 | 深圳壹账通智能科技有限公司 | Alarm method and apparatus based on residual network, and computer device and storage medium
CN112101191A (zh) | 2020-09-11 | 2020-12-18 | 中国平安人寿保险股份有限公司 | Expression recognition method, apparatus, device and medium based on frame attention network
CN112682919A (zh) | 2020-12-21 | 2021-04-20 | 珠海格力电器股份有限公司 | Air-conditioning device and set-temperature adjustment system, method and storage medium therefor
CN113327619B (zh) | 2021-02-26 | 2022-11-04 | 山东大学 | Conference recording method and system based on a cloud-edge collaborative architecture
Citations (9)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN101266704A (zh) * | 2008-04-24 | 2008-09-17 | 张宏志 | ATM security authentication and early-warning method based on face recognition
US20160005050A1 (en) * | 2014-07-03 | 2016-01-07 | Ari Teman | Method and system for authenticating user identity and detecting fraudulent content associated with online activities
CN106982426A (zh) * | 2017-03-30 | 2017-07-25 | 广东微模式软件股份有限公司 | Method and system for remotely implementing real-name registration of legacy cards
CN107016608A (zh) * | 2017-03-30 | 2017-08-04 | 广东微模式软件股份有限公司 | Remote account-opening method and system based on identity information verification
AU2018100321A4 (en) * | 2018-03-15 | 2018-04-26 | Chen, Jinghan Mr | Person ReID method based on metric learning with hard mining
CN109063649A (zh) * | 2018-08-03 | 2018-12-21 | 中国矿业大学 | Pedestrian re-identification method based on a Siamese pedestrian-alignment residual network
GB2566762A (en) * | 2017-09-25 | 2019-03-27 | Thirdeye Labs Ltd | Personal identification across multiple captured images
CN109977893A (zh) * | 2019-04-01 | 2019-07-05 | 厦门大学 | Deep multi-task pedestrian re-identification method based on hierarchical saliency channel learning
CN110765850A (zh) * | 2019-09-09 | 2020-02-07 | 深圳壹账通智能科技有限公司 | Alarm method and apparatus based on residual network, and computer device and storage medium
Application events:
- 2019-09-09: Chinese application CN201910848452.8A filed; published as CN110765850A (status: pending)
- 2020-04-30: PCT application PCT/CN2020/088046 filed; published as WO2021047190A1
Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117061788A (zh) * | 2023-10-08 | 2023-11-14 | 中国地质大学(武汉) | Automated short-video supervision and early-warning method, device and storage device
CN117061788B (zh) * | 2023-10-08 | 2023-12-19 | 中国地质大学(武汉) | Automated short-video supervision and early-warning method, device and storage device
Also Published As
Publication number | Publication date |
---|---|
CN110765850A (zh) | 2020-02-07 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20862166; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25.07.2022)