CN108921830A

CN108921830A - A people-counting method based on image retrieval

Info

Publication number: CN108921830A
Application number: CN201810639977.6A
Authority: CN
Inventors: 吕学强; 张鑫; 高五峰
Original assignee: CHINA FILM SCIENCE AND TECHNOLOGY INST; Beijing Information Science and Technology University
Current assignee: CHINA FILM SCIENCE AND TECHNOLOGY INST; Beijing Information Science and Technology University
Priority date: 2018-06-21
Filing date: 2018-06-21
Publication date: 2018-11-30

Abstract

The present invention relates to a people-counting method based on image retrieval, comprising: dividing the original image into multiple sub-image blocks of different scales that share the same perspective effect; improving the spatial pyramid pooling network model and training the improved model with training data; extracting features from images of different sizes with the improved spatial pyramid pooling network; computing the distance between the image under test and the known images, finding the nearest image, and taking its label as the number of people in the sub-image block under test; and summing the counts of the sub-image blocks into which the image under test was divided. The method improves the traditional spatial pyramid pooling network model. Using the improved model avoids the feature loss caused by image-size normalization, achieves high counting accuracy, and meets the needs of practical applications well.

Description

A people-counting method based on image retrieval

Technical field

The present invention relates to a people-counting method based on image retrieval.

Background art

Automatic crowd counting in surveillance video has significant research value and broad prospects for social application. Obtaining the number of people in a scene by artificial intelligence can guide security work in public areas and also save considerable manpower and material resources. At present, problems such as camera perspective effects, image background, and uneven crowd-density distribution restrict the development and application of crowd-counting research, so research on people-counting algorithms is of great value.

In recent years, with the development of computer vision, experts and scholars have proposed many people-detection algorithms. By detection object, these algorithms fall into direct methods and indirect methods. Direct methods generally detect individuals, using cues such as body shape, head, head-and-shoulder, or motion information. When the crowd is dense, occlusion between individuals often prevents such methods from working well. The occlusion must be resolved by segmenting adhered (touching) human bodies, and segmentation of adhered bodies in complex scenes is still immature, so it offers only limited support to direct detection. Indirect methods mostly detect the group, using crowd features such as texture, pixel, and corner features, and analyze these features to establish a correspondence between crowd features and the number of people. More recently, scholars have split indirect methods, by the scale of the detected image, into global and local indirect methods. A global indirect method treats each video frame as a whole and counts people from the features of the whole image. A local indirect method partitions the original image into blocks, taking camera perspective into account so that the sub-image blocks share the same perspective effect, detects the image features of each sub-image, establishes their correspondence with the number of people, and finally sums the per-block results to obtain the total. However, indirect methods are severely limited by factors such as the expressive power of the features and crowd occlusion. People counting with the above methods yields an accuracy too low to meet the needs of practical applications.

Summary of the invention

In view of the above problems in the prior art, the object of the present invention is to provide a people-counting method based on image retrieval that avoids the above technical defects.

To achieve this object, the technical solution provided by the invention is as follows:

A people-counting method based on image retrieval, comprising the following steps:

Step 1) dividing the original image into multiple sub-image blocks of different scales that share the same perspective effect;

Step 2) improving the spatial pyramid pooling network model, and training the improved spatial pyramid pooling network model with training data;

Step 3) extracting features from images of different sizes with the improved spatial pyramid pooling network;

Step 4) computing the distance between the image under test and the known images, finding the nearest image, and taking its label as the number of people in the sub-image block under test;

Step 5) summing the counts of the sub-image blocks into which the image under test was divided.

Further, in step 2), improving the spatial pyramid pooling network model comprises: retaining the original window setting and changing the original stride-setting strategy so that the value of a′ is closest to a, thereby minimizing the feature loss caused by this layer;

wherein, in the original spatial pyramid pooling network model, the output of the preceding layer has size a*a and the pooling result obtained has size n*n;

the selection formula for the stride strides is:

where a′ is defined as a′ = window + strides × (n - 1), and window denotes the moving-window size.

Further, when a′ > a, the boundary of the original feature vector is padded with a′ - a layers of zeros; when a′ < a, the last a - a′ rows and a - a′ columns of the original feature vector are discarded.

Further, when the boundary of the original feature vector is padded, if the number of padding layers is even, the a′ - a layers of zeros are split evenly before and after the original feature vector; if it is odd, fewer layers are placed before and more after, according to:

pad_Before = ⌊(a′ - a) / 2⌋

pad_Afterwards = a′ - a - pad_Before.

Further, the high-level image features extracted by multiple convolution and pooling layers are reduced in dimension by a fully connected layer, and the output of that layer is taken as the feature of the image.

Further, in step 4), the distance between the image under test and each known image is computed with the Euclidean distance function, defined as:

d = √((a - b)(a - b)^T)

where d is the Euclidean distance between the two vectors, a and b are the two feature vectors being compared, and T denotes matrix transposition.

The people-counting method based on image retrieval provided by the invention improves the traditional spatial pyramid pooling network model. Using the improved model avoids the feature loss caused by image-size normalization. In addition, the traditional model is analyzed and the image-feature loss arising in its spatial pyramid pooling layer is reduced. In scenes where people's positions are relatively fixed, such as cinemas and classrooms, the counting accuracy of the invention can reach 98% or more, and accuracy is high for both sparse and dense crowds. This helps with classroom and meeting attendance statistics, cinema box-office statistics, and the like, and meets the needs of practical applications well.

Brief description of the drawings

Fig. 1 is a flow chart of the invention;

Fig. 2 is a scene image of a cinema auditorium during a screening;

Fig. 3 shows Fig. 2 after image partitioning;

Fig. 4 is a diagram of the prior-art spatial pyramid pooling network model;

Fig. 5 is a schematic diagram of the boundary-padding method;

Fig. 6 is a schematic diagram of the boundary-discarding method.

Specific embodiment

To make the objects, technical solutions, and advantages of the invention clearer, the invention is further described below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative work fall within the protection scope of the invention.

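As shown in Fig. 1, a people-counting method based on image retrieval comprises the following steps:

Step 1) dividing the original image into multiple sub-image blocks of different scales that share the same perspective effect;

Step 2) improving the spatial pyramid pooling network model, and training the improved spatial pyramid pooling network model with training data;

Step 3) extracting features from images of different sizes with the improved spatial pyramid pooling network;

Step 4) computing, with a distance function, the distance between the image under test and the known images, finding the nearest image, and taking its label as the number of people in the sub-image block under test;

Step 5) summing the counts of the sub-image blocks into which the image under test was divided to obtain the number of people in the image under test.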

Fig. 2 shows a cinema auditorium during a screening. When the task is to count the audience in a cinema during a screening, the seats can serve as natural reference objects for partitioning the scene into regions. The seats in the real scene are all the same size, but after camera imaging they show an obvious near-large, far-small effect. It is therefore enough to take, as sub-image blocks, regions at the same horizontal position that contain the same number of seats. This handles the influence of camera perspective well during image partitioning. The procedure for counting the people in Fig. 2 is described below.

Taking a 2*2 group of seats as one sub-image block, the original image is divided into 20 sub-image blocks; the partitioning result is shown in Fig. 3. Because this embodiment studies people counting during a screening, only the 20 numbered regions are considered and other scattered regions are ignored. Each of regions 1-20 thus covers an area the size of 4 seats, which removes the influence of camera perspective.
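The partitioning step can be illustrated with a short sketch. It assumes the image is a list of pixel rows and that the rectangle of each numbered region has already been marked by hand; the function name `crop_blocks` and the box coordinates are hypothetical and do not come from the patent.

```python
def crop_blocks(image, boxes):
    """Cut one sub-image block per (top, left, bottom, right) box.

    `image` is a list of pixel rows. Each box is assumed to enclose a
    2*2 group of seats, so blocks from far rows are smaller than near ones.
    """
    blocks = []
    for top, left, bottom, right in boxes:
        blocks.append([row[left:right] for row in image[top:bottom]])
    return blocks


# Hypothetical 6*8 image and two boxes of different pixel size.
image = [[r * 10 + c for c in range(8)] for r in range(6)]
boxes = [(0, 0, 2, 3), (2, 0, 6, 5)]  # far (small) block, near (large) block
blocks = crop_blocks(image, boxes)
print([(len(b), len(b[0])) for b in blocks])  # [(2, 3), (4, 5)]
```

The blocks deliberately keep their different pixel sizes, since the later stages are meant to accept inputs of varying size rather than normalized ones.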

Features that currently express crowd size well include pixel features and corner features, but good feature extraction relies heavily on foreground segmentation of the image. The images studied in this embodiment are infrared images, which have a low signal-to-noise ratio, are prone to halo artifacts, and are sensitive to changes in ambient brightness, so good foreground extraction requires a series of preprocessing steps to reduce interference. This embodiment therefore uses a convolutional neural network: multiple convolution-pooling layers extract high-level image features without relying on foreground extraction, and high-level features are more discriminative and more robust than low-level ones.

In a traditional convolutional neural network the fully connected layer accepts only fixed-size input, so the model accepts only fixed-size input images. Input images of different scales are generally handled by scale normalization, but this loses image information. Spatial pyramid pooling (SPP) was proposed to solve this problem: after the SPP layer, images of different sizes are all converted into feature vectors of a fixed size. A convolutional neural network that uses a spatial pyramid pooling layer is commonly called a spatial pyramid pooling network; its model is shown in Fig. 4.

The spatial pyramid pooling layer pools the convolutional feature vector at several scales and combines the values from each pooling operation into one fixed-length feature vector. As shown in the spatial pyramid pooling layer in Fig. 4, scales of different sizes divide the original image, from left to right, into 16, 4 and 1 sub-feature maps. Conventional pooling is applied to each sub-feature map, and the pooling results are concatenated in order to form the pooled feature vector. Thus, however large the feature vector obtained after convolution, spatial pyramid pooling yields a feature vector of fixed length; in Fig. 4 each feature vector becomes a 21-dimensional (16+4+1) vector after spatial pyramid pooling.
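A minimal sketch of this fixed-length property for a single-channel feature map, using scales 4, 2 and 1 as in Fig. 4. The bin boundaries below (floor for the start, ceiling for the end) are one common way to cover a map of any size and are an assumption here, not the window and stride rule of the patent; `spp_max` is a hypothetical name.

```python
import math


def spp_max(fmap, scales=(4, 2, 1)):
    """Max-pool a 2-D map into n*n bins for each scale n and concatenate."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for n in scales:
        for i in range(n):
            # Row range of bin i: floor for the start, ceiling for the end.
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                out.append(max(fmap[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out


# Two maps of different size give vectors of the same length (16 + 4 + 1).
small = [[float(r + c) for c in range(9)] for r in range(7)]
large = [[float(r * c) for c in range(13)] for r in range(13)]
print(len(spp_max(small)), len(spp_max(large)))  # 21 21
```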

Although the spatial pyramid pooling network solves the problem of inconsistent input-image scales, the spatial pyramid pooling layer is complex, and the choice of moving-window size, stride, and pooling mode in that layer may cause loss of feature values when pooling is actually carried out.

The pooling size in traditional convolutional neural networks is mostly 2*2. Whichever way the activation value is chosen within so small a region, the features lost are limited. In multi-scale spatial pyramid pooling, however, if the pooling region is too large, neither max pooling nor mean pooling describes the features of the region well. This embodiment therefore keeps the dimension of the feature vector entering the spatial pyramid pooling layer from being too large and adopts a feature-combination strategy: max pooling and mean pooling are each applied to the pooling region to obtain two groups of feature vectors, which are concatenated horizontally so that they jointly represent the image features output by this layer and reduce feature loss.
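The feature-combination strategy can be sketched as follows for one scale. The function name `pool_max_and_mean` and the toy 4*4 map are illustrative assumptions; the sketch also assumes the map is at least window + stride × (n - 1) wide and high, so every window lies inside it.

```python
def pool_max_and_mean(fmap, window, stride, n):
    """Pool an n*n grid of window*window regions with both max and mean.

    Returns the n*n max values followed by the n*n mean values, i.e. the
    two feature vectors spliced into one vector of length 2*n*n.
    """
    max_vals, mean_vals = [], []
    for i in range(n):
        for j in range(n):
            region = [fmap[r][c]
                      for r in range(i * stride, i * stride + window)
                      for c in range(j * stride, j * stride + window)]
            max_vals.append(max(region))
            mean_vals.append(sum(region) / len(region))
    return max_vals + mean_vals


# Toy 4*4 map holding the values 0..15, pooled into a 2*2 grid.
fmap = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
combined = pool_max_and_mean(fmap, window=2, stride=2, n=2)
print(combined)  # [5.0, 7.0, 13.0, 15.0, 2.5, 4.5, 10.5, 12.5]
```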

In the traditional spatial pyramid pooling method, suppose the output of the preceding layer has size a*a and a pooling result of size n*n is required. The moving-window size window is usually set to ⌈a/n⌉ and the stride strides to ⌊a/n⌋, where ⌈·⌉ and ⌊·⌋ denote rounding up and rounding down respectively. Under some conditions this setting also causes considerable feature loss. Suppose the preceding layer outputs 59*59 and a 10*10 pooling result is required. The traditional setting gives a 6*6 window and a stride of 5*5, so the window needs to move only as far as row 51 and column 51 (6 + 5 × 9 = 51) to produce the required 10*10 result. The last 8 rows and 8 columns of features are never included, which brings considerable feature loss. To minimize the loss in this situation, the window and stride should be chosen so that the value of a′ is closest to the value of a, where a′ is defined by formula (1). The invention therefore retains the original window setting and changes the original stride-setting strategy so that a′ is closest to a, minimizing the feature loss caused by this layer. The selection strategy for the stride strides is given by formula (2).

a′ = window + strides × (n - 1)    (1)
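The arithmetic of the 59*59 example can be checked with a small sketch. The traditional setting takes the window as the ceiling of a/n and the stride as the floor of a/n. Formula (2) is not reproduced in this text, so `closest_stride` below rests on an assumption: it tries only the floor and the ceiling of a/n as candidate strides and keeps the one whose a′ is nearest to a. All function names are hypothetical.

```python
import math


def traditional_params(a, n):
    """Conventional SPP setting: window = ceil(a/n), stride = floor(a/n)."""
    return math.ceil(a / n), max(1, a // n)


def covered_extent(window, stride, n):
    """Formula (1): a' = window + strides * (n - 1)."""
    return window + stride * (n - 1)


def closest_stride(a, n):
    """Keep the window, pick the candidate stride whose a' is nearest to a.

    Assumption: candidates are floor(a/n) and ceil(a/n) only.
    """
    window = math.ceil(a / n)
    candidates = sorted({max(1, a // n), math.ceil(a / n)})
    stride = min(candidates,
                 key=lambda s: abs(covered_extent(window, s, n) - a))
    return window, stride


window, stride = traditional_params(59, 10)
print(window, stride, covered_extent(window, stride, 10))  # 6 5 51
window, stride = closest_stride(59, 10)
print(window, stride, covered_extent(window, stride, 10))  # 6 6 60
```

With the traditional stride the pooled extent falls 8 rows short of 59; with a stride of 6 it overshoots by a single row, which the boundary handling then has to pad.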

When a′ and a differ in size, the boundary of the preceding layer's output feature must be handled. When a′ > a, the original feature vector is padded at the boundary with a′ - a layers of zeros so that it has a′ rows and columns, as shown in Fig. 5. If the number of padding layers is even, the a′ - a layers of zeros are split evenly before and after the original feature vector; if it is odd, fewer layers are placed before and more after, following formulas (3) and (4). When a′ < a, the last a - a′ rows and a - a′ columns of the original feature vector are discarded, again leaving a′ rows and columns, as shown in Fig. 6.

pad_Before = ⌊(a′ - a) / 2⌋    (3)

pad_Afterwards = a′ - a - pad_Before    (4)

Figs. 5 and 6 are schematic diagrams of the two boundary-handling strategies applied to a 13-dimensional feature vector with a window size of 6 and a stride of 5.
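The two boundary strategies can be sketched for a square map as follows. The split of the padding (fewer layers before, more after) is read here as pad_Before being the floor of half the difference, which is an interpretation of the text; the helper names are hypothetical. With a window of 6 and a stride of 5, n = 3 gives a′ = 16 (padding) and n = 2 gives a′ = 11 (discarding); these values of n are chosen for illustration only.

```python
def split_padding(total):
    """Fewer zero layers before, more after; an even total splits evenly."""
    before = total // 2
    return before, total - before


def fit_to_extent(fmap, target):
    """Zero-pad or truncate a square a*a map so that it becomes target*target."""
    a = len(fmap)
    if target > a:
        before, after = split_padding(target - a)
        rows = [[0.0] * before + list(row) + [0.0] * after for row in fmap]
        blank = [0.0] * target
        return ([blank[:] for _ in range(before)] + rows
                + [blank[:] for _ in range(after)])
    # target <= a: discard the trailing rows and columns.
    return [list(row[:target]) for row in fmap[:target]]


fmap = [[float(r * 13 + c + 1) for c in range(13)] for r in range(13)]
padded = fit_to_extent(fmap, 16)   # a' = 6 + 5 * 2 = 16 > 13
cropped = fit_to_extent(fmap, 11)  # a' = 6 + 5 * 1 = 11 < 13
print(split_padding(3), len(padded), len(cropped))  # (1, 2) 16 11
```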

The detailed structure of the improved spatial pyramid pooling network model is shown in Table 1.

Table 1

Here [47*60-69*130] means that the input size ranges from 47*60 to 69*130, and ⌈w/1⌉ denotes rounding w/1 up. The spatial pyramid pooling layer in the model uses scales 1, 5 and 10, dividing the original image into 1, 25 and 100 sub-feature maps. In each sub-feature map, max pooling and mean pooling each yield one sampled activation value, so an input feature vector becomes a 252-dimensional output vector after this layer. The improved spatial pyramid pooling network is trained with a large amount of labeled training data until the model is optimal. The improved network reduces the feature loss during pooling and removes the feature loss caused by image-size normalization.
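The 252-dimensional figure follows directly from the scales and the two pooling modes, as the small check below shows. The function name `spp_output_length` is hypothetical, and the count is per feature map.

```python
def spp_output_length(scales, pooling_modes=2):
    """Number of pooled values: one per bin, per scale, per pooling mode."""
    return pooling_modes * sum(n * n for n in scales)


print(spp_output_length((1, 5, 10)))                  # 252
print(spp_output_length((4, 2, 1), pooling_modes=1))  # 21
```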

The high-level image features extracted by multiple convolution and pooling layers are reduced in dimension by a fully connected layer, and the output of that layer is taken as the feature of the image; in this embodiment it is an 84-dimensional image feature.

Euclidean distance is used to compare the similarity of two feature vectors; its definition is given in formula (5). The smaller the distance, the greater the similarity. The known images are ranked by similarity, and the label of the most similar image is taken as the number of people in the image under test.

d = √((a - b)(a - b)^T)    (5)

where a and b are the two feature vectors being compared and T denotes transposition.
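The retrieval step amounts to a nearest-neighbour lookup, sketched below. The gallery layout (a list of feature and count pairs) and the names `euclidean` and `retrieve_count` are assumptions for illustration; 3-dimensional toy vectors stand in for the 84-dimensional features of the embodiment.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_count(query, gallery):
    """Return the people count labelled on the gallery image nearest to query.

    `gallery` is a list of (feature_vector, people_count) pairs.
    """
    _, count = min(gallery, key=lambda item: euclidean(query, item[0]))
    return count


gallery = [([0.0, 0.0, 0.0], 0), ([1.0, 1.0, 0.0], 2), ([2.0, 2.0, 2.0], 4)]
print(retrieve_count([0.9, 1.2, 0.1], gallery))  # 2
```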

The crowd-counting model proposed in this embodiment is an end-to-end system. It takes an image frame directly as input. After partitioning, each sub-image block is matched, by retrieval through the spatial pyramid pooling network model, to its most similar image, whose label is the count for that sub-image. Finally the counts of all sub-images are summed as the output for the frame. During training, the training sub-image blocks numbered 1-20 are organized by number into 20 separate input streams and trained in sequence: the model trained on images of the previous scale serves as the pre-trained model for the next scale, so training parameters are shared. This compensates for the shortage of training data at each block scale and speeds up model fitting.
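The final summation over sub-image blocks can be sketched as below. The sketch assumes one gallery of labelled features per block position, which the text does not state explicitly, and uses squared distances because they give the same nearest neighbour as the Euclidean distance. The names `nearest_label` and `count_frame` are hypothetical.

```python
def nearest_label(query, gallery):
    """Label of the gallery entry with the smallest squared distance."""
    def squared_distance(item):
        return sum((x - y) ** 2 for x, y in zip(query, item[0]))
    return min(gallery, key=squared_distance)[1]


def count_frame(block_features, galleries):
    """Sum the retrieved counts of all sub-image blocks of one frame."""
    return sum(nearest_label(feature, gallery)
               for feature, gallery in zip(block_features, galleries))


# Toy frame with two blocks, each with its own (assumed) gallery.
galleries = [
    [([0.0, 0.0], 0), ([1.0, 1.0], 3)],
    [([0.0, 0.0], 1), ([2.0, 2.0], 4)],
]
block_features = [[0.9, 1.1], [0.2, 0.1]]
print(count_frame(block_features, galleries))  # 4
```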

The embodiments described above express only some implementations of the invention. Their description is relatively specific and detailed, but it shall not be understood as limiting the patent scope of the invention. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the invention. The protection scope of this patent shall therefore be subject to the appended claims.

Claims

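1. A people-counting method based on image retrieval, characterized by comprising the following steps:

Step 1) dividing the original image into multiple sub-image blocks of different scales that share the same perspective effect;

Step 2) improving the spatial pyramid pooling network model, and training the improved spatial pyramid pooling network model with training data;

Step 3) extracting features from images of different sizes with the improved spatial pyramid pooling network;

Step 4) computing the distance between the image under test and the known images, finding the nearest image, and taking its label as the number of people in the sub-image block under test;

Step 5) summing the counts of the sub-image blocks into which the image under test was divided.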

2. The people-counting method according to claim 1, characterized in that, in step 2), improving the spatial pyramid pooling network model comprises: retaining the original window setting and changing the original stride-setting strategy so that the value of a′ is closest to a, thereby minimizing the feature loss caused by this layer;

wherein, in the original spatial pyramid pooling network model, the output of the preceding layer has size a*a and the pooling result obtained has size n*n;

the selection formula for the stride strides is:

where a′ is defined as a′ = window + strides × (n - 1), and window denotes the moving-window size.

3. The people-counting method according to claim 1 or 2, characterized in that, when a′ > a, the boundary of the original feature vector is padded with a′ - a layers of zeros; when a′ < a, the last a - a′ rows and a - a′ columns of the original feature vector are discarded.

4. The people-counting method according to any one of claims 1 to 3, characterized in that, when the boundary of the original feature vector is padded, if the number of padding layers is even, the a′ - a layers of zeros are split evenly before and after the original feature vector; if it is odd, fewer layers are placed before and more after, according to:

pad_Before = ⌊(a′ - a) / 2⌋

pad_Afterwards = a′ - a - pad_Before.

5. The people-counting method according to any one of claims 1 to 4, characterized in that the high-level image features extracted by multiple convolution and pooling layers are reduced in dimension by a fully connected layer, and the output of that layer is taken as the feature of the image.

6. The people-counting method according to any one of claims 1 to 5, characterized in that, in step 4), the distance between the image under test and the known images is computed with the Euclidean distance function, the defining formula of which is: