CN110443161A - Artificial-intelligence-based monitoring method for banking scenarios

Info

Publication number: CN110443161A
Application number: CN201910652598.5A
Authority: CN
Inventors: 何金保; 安鹏
Original assignee: Ningbo University of Technology
Current assignee: Ningbo University of Technology
Priority date: 2019-07-19
Filing date: 2019-07-19
Publication date: 2019-11-12
Anticipated expiration: 2039-07-19
Also published as: CN110443161B

Abstract

The present invention provides an artificial-intelligence-based monitoring method for banking scenarios. To improve early-warning accuracy, it uses two-level early warning based on video images and speech: the video images first raise a warning on a suspected target, and speech separation is then applied to that target to further confirm whether a warning is needed. During speech separation, an optimization objective function weighted by spatial coding is used. The coding uses Gray code, which helps in searching the spatial adjacency relations between persons, and the spatial-coding weighting improves the reliability of speech stream separation. The invention is simple to implement and meets the needs of practical applications.

Description

Artificial-intelligence-based monitoring method for banking scenarios

Technical field

The present invention relates to an artificial-intelligence-based monitoring method for banking scenarios.

Background art

A bank is a financial institution that handles deposits, loans, foreign exchange, savings, and similar business and acts as a credit intermediary. Banks are key national security-protection units, characterized by diverse scales, numerous financial service devices, a complex mix of people entering and leaving, and a wide management area. In recent years, however, criminal activity of all kinds has become common in the financial sector.

Human behavior recognition has long been an important research direction in computer vision, and detecting and recognizing abnormal behavior in video has become a challenging and active research problem. The traditional approach relies mainly on security staff manually monitoring around the clock for random emergencies and suspicious events, which requires a great deal of manpower. Because the template classifier in a video surveillance system cannot model every human posture, detecting potential threats from video images alone is difficult.

Summary of the invention

In view of the shortcomings of the prior art, the present invention proposes an artificial-intelligence-based monitoring method for banking scenarios. The method is applied in a banking scenario equipped with video surveillance cameras and sound pickups, and performs two-level early warning based on the video and speech signals, characterized in that:

Level-one video early warning: extract suspected targets from the video images, with the following specific steps:

1. Detect human bodies in the video using background subtraction based on a Gaussian mixture model, and remove the image background;

2. For each target, extract the target's interaction behavior features from the video using the convolutional neural network AlexNet, and obtain behavior feature probability values;

3. Use a network with two hidden layers to distinguish normal from abnormal target behavior, and raise an early warning on suspected targets;

Level-two speech early warning: for the suspected target, extract the speaker's speech, with the following specific steps:

1. Decompose the mixed speech into time-frequency units;

2. Through pitch tracking and time-frequency unit labeling, obtain the pitch contour of the input signal and the corresponding speech streams;

3. Extract the Gammatone frequency cepstral coefficient (GFCC) feature matrix of the mixed speech;

4. Divide regions according to the video image, encode them with Gray code, and design an optimization objective function weighted by spatial coding. The objective function has the following form:

where L is the number of speech streams to be extracted, G_r is the Gray code, g is the class vector, F is the GFCC feature matrix, W_k(g) is the k-th component of the class vector g, F ranges over W_k(g), NUM_k(g) and V_k(g) are respectively the number of elements and the mean of the GFCC feature matrix for the k-th component of g, C is the mean of the GFCC feature matrix, and (*)^T denotes matrix transposition;

5. Search by clustering for the combination of speech streams that maximizes the objective function;

6. Issue an early warning according to the speech streams.

In summary, to improve early-warning accuracy in banking scenarios, the present invention uses two-level early warning based on video images and speech. During speech stream separation, it uses an optimization objective function weighted by spatial coding. The coding uses Gray code, which helps in searching the spatial adjacency relations between persons, and the spatial-coding weighting improves the reliability of speech stream separation. Moreover, the speech separation process does not need to train on a speech data set to acquire prior knowledge, so the invention is simple to implement and highly reliable.

Description of the drawings

Fig. 1 is a flowchart of an embodiment of the present invention.

Specific embodiments

The embodiments of the present invention are illustrated below through a specific example; those skilled in the art can readily implement the invention from the content disclosed in this specification.

The present invention proposes an artificial-intelligence-based monitoring method for banking scenarios. The method is applied in a banking scenario equipped with video surveillance cameras and sound pickups, and performs two-level early warning based on the video and speech signals, as follows:

Level-one video early warning: extract suspected targets from the video images. The implementation runs on the Ubuntu platform and uses the OpenCV and TensorFlow libraries. The specific steps are as follows:

1. Detect human bodies in the video using background subtraction based on a Gaussian mixture model, and remove the image background. The BackgroundSubtractorMOG2 function is used to perform background subtraction on the video images;

2. For each target, extract the target's interaction behavior features from the video using the convolutional neural network AlexNet, and obtain behavior feature probability values. AlexNet comprises 5 convolutional layers and 3 fully connected layers. The output of the last fully connected layer is fed to a softmax layer, and the behavior feature probability values are of float type. The convolution operations of the neural network are implemented with the function conv_2d().

3. Use a network with two hidden layers to distinguish normal from abnormal target behavior, and raise an early warning on suspected targets. The two-hidden-layer network is implemented on the TensorFlow deep learning platform, covering both the network implementation and model training.
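The softmax stage mentioned in step 2 can be illustrated with a minimal sketch in plain Python. It only shows how raw outputs of a final fully connected layer become float probability values; the two scores and their class meaning (abnormal, normal) are made up for the example and do not come from the specification.

```python
import math


def softmax(logits):
    """Convert raw scores into float probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores from the last fully connected layer: [abnormal, normal]
logits = [2.0, 0.5]
probs = softmax(logits)
print(probs)
```

The larger score receives the larger probability, and the values can then be passed to a later classifier such as the two-hidden-layer network of step 3.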

Level-two speech early warning: for the suspected target, extract the speaker's speech, with the following specific steps:

1. Decompose the mixed speech into time-frequency units. A bandpass filter bank is formed from 64 Gammatone filters whose center frequencies are equally spaced; the filter bank as a whole covers 50 Hz to 5000 Hz. The filtered output of each frequency channel is then divided into time-domain frames with a frame length of 40 ms and a frame shift of 20 ms.
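The layout of the time-frequency grid in step 1 can be sketched as follows. This is only an illustration under stated assumptions: the 16 kHz sampling rate is hypothetical, and the center frequencies are spaced linearly in Hz as one reading of "equally spaced" (Gammatone filter banks are often spaced on an ERB-rate scale instead). The filters themselves are not implemented.

```python
def center_frequencies(n_channels=64, f_low=50.0, f_high=5000.0):
    """Equally spaced center frequencies in Hz (linear spacing is an assumption)."""
    step = (f_high - f_low) / (n_channels - 1)
    return [f_low + i * step for i in range(n_channels)]


def frame_bounds(n_samples, sample_rate, frame_ms=40, shift_ms=20):
    """Return (start, end) sample indices of each full frame."""
    frame_len = sample_rate * frame_ms // 1000
    shift = sample_rate * shift_ms // 1000
    return [(s, s + frame_len) for s in range(0, n_samples - frame_len + 1, shift)]


freqs = center_frequencies()
# One second of audio at a hypothetical 16 kHz sampling rate
bounds = frame_bounds(n_samples=16000, sample_rate=16000)
print(len(freqs), len(bounds), bounds[:2])
```

Each pair of one channel and one frame is one time-frequency unit, so this example yields 64 x 49 = 3136 units, with consecutive frames overlapping by half.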

2. Through pitch tracking and time-frequency unit labeling, obtain the pitch contour of the input signal and the corresponding speech streams. Pitch tracking uses the Viterbi algorithm. The pitch observation probability is computed from the salience of each frame's candidate fundamental frequencies, the pitch transition probability is obtained from statistics of the pitch change rate over a speech data set, and the initial probability is the observation probability of the first frame in each voiced segment. Pitch tracking is performed within each voiced segment to find an optimal pitch sequence. Pitch tracking and time-frequency unit labeling together yield the pitch contour of the input signal and the corresponding simultaneous speech streams. Each simultaneous speech stream is represented by a binary mask, in which 1 indicates that the corresponding time-frequency unit is labeled and 0 indicates that it is not.
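A minimal sketch of Viterbi decoding over pitch candidates within one voiced segment is given below. The candidate pitches and all probability values are invented for illustration; in the described method the observation probabilities would come from candidate salience and the transition probabilities from pitch-change statistics. As in the text, the initial probabilities are the observation probabilities of the first frame.

```python
import math


def viterbi(obs, trans):
    """obs[t][s]: observation probability of state s at frame t.
    trans[i][j]: probability of moving from state i to state j.
    Initial probabilities are taken from the first frame's observations."""
    n_states = len(trans)
    score = [math.log(p) for p in obs[0]]
    back = []  # back[t-1][j] = best predecessor of state j at frame t
    for t in range(1, len(obs)):
        new_score, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] + math.log(trans[i][j]))
            ptr.append(best_i)
            new_score.append(score[best_i] + math.log(trans[best_i][j])
                             + math.log(obs[t][j]))
        score = new_score
        back.append(ptr)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # trace the best path backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]


candidates = [100.0, 110.0, 200.0]  # hypothetical pitch candidates in Hz
trans = [[0.7, 0.2, 0.1], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8]]  # favors small jumps
obs = [[0.6, 0.3, 0.1], [0.3, 0.3, 0.4], [0.2, 0.7, 0.1]]    # made-up salience
path = viterbi(obs, trans)
contour = [candidates[s] for s in path]
print(contour)
```

In this toy case the decoder returns a flat 110 Hz contour even though other candidates score higher in the first two frames individually, because the transition probabilities penalize large pitch jumps.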

3. Extract the Gammatone frequency cepstral coefficient (GFCC) feature matrix of the mixed speech. The binary mask and the corresponding simultaneous speech stream are used to filter the feature units: units labeled 1 are kept and unlabeled units are removed. For each frame, the retained units labeled 1 are transformed by a discrete cosine transform, finally forming the GFCC feature matrix of the speech signal.
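The masking and transform in step 3 can be sketched in plain Python as below. Two simplifications are assumptions of this sketch, not statements of the specification: removed units are set to zero so that every frame keeps the same length, and no compression step is applied before the transform because the text does not describe one. The energy values and masks are made up.

```python
import math


def dct2(values):
    """Unnormalized DCT-II of a one-dimensional sequence."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for k in range(n)]


def gfcc_matrix(energies, mask):
    """energies[t][c]: filter bank energy of frame t in channel c.
    mask[t][c]: 1 if the time-frequency unit is labeled, 0 otherwise."""
    features = []
    for frame, frame_mask in zip(energies, mask):
        # Keep labeled units; zero out the rest (assumed form of "removal")
        kept = [e if m == 1 else 0.0 for e, m in zip(frame, frame_mask)]
        features.append(dct2(kept))
    return features


energies = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]  # 2 frames x 4 channels
mask = [[1, 1, 0, 0], [0, 1, 1, 1]]
F = gfcc_matrix(energies, mask)
print(F[0][0], F[1][0])
```

The first coefficient of each row is simply the sum of the retained energies in that frame, which gives a quick check that the mask was applied before the transform.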

4. Divide regions according to the video image, encode them with Gray code, and design an optimization objective function weighted by spatial coding. The objective function has the following form:

where L is the number of speech streams to be extracted, G_r is the Gray code, g is the class vector, F is the GFCC feature matrix, W_k(g) is the k-th component of the class vector g, F ranges over W_k(g), NUM_k(g) and V_k(g) are respectively the number of elements and the mean of the GFCC feature matrix for the k-th component of g, C is the mean of the GFCC feature matrix, and (*)^T denotes matrix transposition.

The number L of speech streams to be extracted is determined by the number of geometrically adjacent Gray codes. During Gray coding, each person should in principle correspond to a separate Gray code, which requires the image regions to be divided sensibly.
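The adjacency property that motivates Gray coding can be shown with a short sketch. It assumes, purely for illustration, that the image is divided into regions lying in a single row ordered from left to right; the specification does not fix the region layout, and the function names are hypothetical.

```python
def gray_code(n):
    """Binary-reflected Gray code of the integer n."""
    return n ^ (n >> 1)


def region_codes(n_regions):
    """Gray code bit strings for regions indexed 0 .. n_regions - 1."""
    bits = max(1, (n_regions - 1).bit_length())
    return [format(gray_code(i), "0{}b".format(bits)) for i in range(n_regions)]


def hamming(a, b):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


codes = region_codes(4)
print(codes)
```

Neighboring regions receive codes that differ in exactly one bit, so a small Hamming distance between two codes corresponds to spatial adjacency of the regions, and hence of the persons assigned to them.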

5. Using exhaustive search, find by clustering the combination of speech streams that maximizes the objective function. The system first randomly selects L units from the simultaneous speech streams and assigns them to L classes, and then sorts the speech stream units that were not selected.
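The idea of exhaustively searching class assignments for the maximum of an objective can be sketched as follows. The scoring function here is a stand-in chosen for illustration, a plain between-class scatter built from the class sizes, class means, and overall mean; it is not the objective function of the invention, and it omits the Gray code weighting. The sketch also enumerates every assignment directly rather than following the random initial selection described above. The feature vectors are made up.

```python
from itertools import product


def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def stand_in_score(features, labels, n_classes):
    """Hypothetical objective: sum over classes of size * |class mean - mean|^2."""
    overall = mean(features)
    total = 0.0
    for k in range(n_classes):
        members = [f for f, lab in zip(features, labels) if lab == k]
        if not members:
            return float("-inf")  # every class must receive at least one unit
        class_mean = mean(members)
        total += len(members) * sum((a - b) ** 2
                                    for a, b in zip(class_mean, overall))
    return total


def exhaustive_assignment(features, n_classes):
    """Try every labeling and return the one with the highest score."""
    best = max(product(range(n_classes), repeat=len(features)),
               key=lambda labels: stand_in_score(features, labels, n_classes))
    return list(best)


features = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]  # toy feature rows
labels = exhaustive_assignment(features, n_classes=2)
print(labels)
```

Exhaustive enumeration grows as L to the power of the number of units, so a sketch like this is only practical for small inputs.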

6. Issue an early warning according to the speech streams.

In summary, to improve early-warning accuracy in banking scenarios, the present invention uses two-level early warning based on video images and speech. During speech stream separation, it uses an optimization objective function weighted by spatial coding. The coding uses Gray code, which helps in searching the spatial adjacency relations between persons, and the spatial-coding weighting improves the reliability of speech stream separation. Moreover, the speech separation process does not need to train on a speech data set to acquire prior knowledge, so the invention is simple to implement and highly reliable. The present invention effectively overcomes various shortcomings of the prior art and has high practical value.

Claims

1. An artificial-intelligence-based monitoring method for banking scenarios, applied in a banking scenario equipped with video surveillance cameras and sound pickups, which performs two-level early warning based on the video and speech signals, characterized in that:

2. For each target, extract the target's interaction behavior features from the video using the convolutional neural network AlexNet, and obtain behavior feature probability values;

1. Decompose the mixed speech into time-frequency units;

3. Extract the frequency cepstral coefficient feature matrix of the mixed speech;

4. Divide regions according to the video image, encode them with Gray code, and design an optimization objective function weighted by spatial coding. The objective function has the following form:

where L is the number of speech streams to be extracted, G_r is the Gray code, g is the class vector, F is the frequency cepstral coefficient feature matrix, W_k(g) is the k-th component of the class vector g, F ranges over W_k(g), NUM_k(g) and V_k(g) are respectively the number of elements and the mean of the frequency cepstral coefficient feature matrix for the k-th component of g, C is the mean of the frequency cepstral coefficient feature matrix, and (*)^T denotes matrix transposition;

6. Issue an early warning according to the speech streams.