CN110399895A - Method and apparatus for image recognition - Google Patents

Method and apparatus for image recognition

Info

Publication number
CN110399895A
Authority
CN
China
Prior art keywords
parameter
model
target image
training
obtains
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910235745.9A
Other languages
Chinese (zh)
Inventor
Song Xiaodong (宋晓东)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hao Ling Technology Co Ltd
Original Assignee
Shanghai Hao Ling Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hao Ling Technology Co Ltd filed Critical Shanghai Hao Ling Technology Co Ltd
Priority to CN201910235745.9A priority Critical patent/CN110399895A/en
Publication of CN110399895A publication Critical patent/CN110399895A/en
Withdrawn legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A method and apparatus for image recognition are proposed, belonging to the technical field of image processing. The method comprises: obtaining an image to be recognized, and invoking a target image recognition model; recognizing the image to be recognized using the target image recognition model to obtain a recognition result. The target image recognition model is trained as follows: obtain a pre-trained model parameter file, and parse the pre-trained model parameter file to obtain its parameters, the parameters including at least candidate-box parameters; perform initialization processing on the parameters of the pre-trained model parameter file; and train on a target image library using the initialized parameters to obtain the target image recognition model. By initializing the parameters of the pre-trained model with the data set of the scene to be applied, this scheme improves the speed and accuracy of image processing.

Description

Method and apparatus for image recognition
Technical field
The present invention relates to the technical field of image processing, and in particular to a method and apparatus for image recognition.
Background technique
In recent years, with the wide availability of digital imaging devices, the number of digital images keeps growing, and users often wish to retrieve a required image from an immense image library. Retrieval methods include text search and image search. Text search matches keywords against the textual description associated with an image to obtain the required image, while image search analyzes an input image and retrieves matching images from a database. Likewise, image recognition is a common requirement, for example retrieving personal information from a face picture. A variety of image retrieval and recognition techniques exist in the prior art, for example artificial-intelligence techniques such as neural networks and deep learning.
The rapid development of artificial intelligence has benefited from improved hardware computing power and ever-growing data volumes; Google, a leader in the field, has open-sourced the TensorFlow deep-learning framework. However, landing a deep-learning application is costly, requiring massive amounts of data and high-end computing hardware, so applying deep learning to a new scene cannot start from scratch. Transfer learning with a pre-trained model is an efficient and feasible approach: the parameters of an effective model trained on a huge data set are used for initialization and applied to the new scene. A pre-trained model has strong feature-extraction ability; initializing the parameters of the corresponding convolutional neural network with it and then training the model on a new data set shortens training time and yields better results. However, the data set of the pre-trained model differs from that of the new scene, so the pre-trained model cannot be applied directly to the new scene's data set.
Summary of the invention
The object of the present invention is to overcome the above problems in the prior art and provide a method and apparatus for image recognition.
To achieve the above object, the present invention proposes a method of image recognition, the method comprising:
obtaining an image to be recognized, and invoking a target image recognition model;
recognizing the image to be recognized using the target image recognition model to obtain a recognition result;
wherein the target image recognition model is trained as follows:
obtaining a pre-trained model parameter file, and parsing the pre-trained model parameter file to obtain its parameters, the parameters including at least candidate-box parameters;
performing initialization processing on the parameters of the pre-trained model parameter file;
training on a target image library using the initialized parameters to obtain the target image recognition model.
Optionally, performing initialization processing on the parameters of the pre-trained model parameter file comprises:
deleting and/or modifying the parameters of the pre-trained model parameter file.
Optionally, the parameters of the pre-trained model parameter file further include convolutional-layer weight parameters, and modifying the parameters of the pre-trained model parameter file comprises:
modifying the values of the convolutional-layer weight parameters; and/or
modifying the values of the candidate-box parameters.
Optionally, modifying the values of the candidate-box parameters comprises:
clustering the values of the candidate-box parameters using the target image library to obtain a clustering result;
using each cluster center in the clustering result as a modified value of the candidate-box parameters.
Optionally, training on the target image library using the initialized parameters to obtain the target image recognition model comprises:
using the initialized parameters as the initial parameters of model training, and performing model training on the target image library using a deep convolutional neural network and a target regression network; wherein the output of the last convolutional layer of the deep convolutional neural network serves as the input of the target regression network, forming a feature pyramid, multiple convolutional layers of which carry the candidate-box parameters.
Optionally, performing model training on the target image library using the deep convolutional neural network and the target regression network further comprises:
updating the parameters of each convolutional layer in the deep convolutional neural network using a moving-average model.
Optionally, performing model training on the target image library using the deep convolutional neural network and the target regression network further comprises:
optimizing the parameters using at least one optimizer.
Optionally, before recognizing the image to be recognized using the target image recognition model to obtain the recognition result, the method further comprises:
performing image enhancement on the image to be recognized, and extracting a target image region from the image to be recognized.
Optionally, recognizing the image to be recognized using the target image recognition model to obtain the recognition result comprises:
extracting a feature vector from the target image region of the image to be recognized;
using the feature vector as the input of the target image recognition model, and obtaining the recognition result through the target image recognition model.
An embodiment of the present invention also provides an apparatus for image recognition, the apparatus comprising:
an image acquisition module, for obtaining an image to be recognized;
a model invocation module, for invoking a target image recognition model;
an image recognition module, for recognizing the image to be recognized using the target image recognition model to obtain a recognition result;
a model training module, for obtaining a pre-trained model parameter file, parsing the pre-trained model parameter file to obtain its parameters, the parameters including at least candidate-box parameters; performing initialization processing on the parameters of the pre-trained model; and training on a target image library using the initialized parameters to obtain the target image recognition model.
Optionally, in order to perform initialization processing on the parameters of the pre-trained model parameter file, the model training module is configured to:
delete and/or modify the parameters of the pre-trained model parameter file.
Optionally, the parameters of the pre-trained model parameter file further include convolutional-layer weight parameters, and in order to modify the parameters of the pre-trained model parameter file, the model training module is configured to:
modify the values of the convolutional-layer weight parameters; and/or
modify the values of the candidate-box parameters.
Optionally, in order to modify the values of the candidate-box parameters, the model training module is configured to:
cluster the values of the candidate-box parameters using the target image library to obtain a clustering result;
use each cluster center in the clustering result as a modified value of the candidate-box parameters.
Optionally, in order to train on the target image library using the initialized parameters and obtain the target image recognition model, the model training module is configured to:
use the initialized parameters as the initial parameters of model training, and perform model training on the target image library using a deep convolutional neural network and a target regression network; wherein the output of the last convolutional layer of the deep convolutional neural network serves as the input of the target regression network, forming a feature pyramid, multiple convolutional layers of which carry the candidate-box parameters.
Optionally, in order to perform model training on the target image library using the deep convolutional neural network and the target regression network, the model training module is further configured to:
update the parameters of each convolutional layer in the deep convolutional neural network using a moving-average model.
Optionally, in order to perform model training on the target image library using the deep convolutional neural network and the target regression network, the model training module is further configured to:
optimize the parameters using at least one optimizer.
Optionally, the image recognition module is further configured to: before recognizing the image to be recognized using the target image recognition model to obtain the recognition result, perform image enhancement on the image to be recognized, and extract a target image region from the image to be recognized.
Optionally, in order to recognize the image to be recognized using the target image recognition model and obtain the recognition result, the image recognition module is configured to:
extract a feature vector from the target image region of the image to be recognized;
use the feature vector as the input of the target image recognition model, and obtain the recognition result through the target image recognition model.
The target image recognition model used by the present invention to recognize images of a new scene is obtained by training after the parameters of the pre-trained model, in particular the candidate-box parameters, have been initialized; the initialization adapts the model to the image library of the new scene and improves the recognition accuracy and training efficiency of the target image recognition model.
Further, the application environment to be applied is taken into account during initialization, and a moving-average model and an optimizer are used, improving the accuracy of the candidate boxes generated in different layers of the feature pyramid and the speed at which candidate boxes are generated, thereby correspondingly increasing the speed and accuracy of image processing.
Brief description of the drawings
Fig. 1 shows the representation of a three-channel picture in a computer;
Fig. 2 shows the computation process of a convolution on one channel;
Fig. 3 shows the grid diagram of a 1 × 1 × 3 convolution kernel;
Fig. 4 shows the relative positions of the candidate boxes and the ground-truth box of the SSD detection network;
Fig. 5 shows a flow diagram of the target image recognition model training proposed by the present invention;
Fig. 6 shows a flow diagram of the image recognition method proposed by the present invention;
Fig. 7 shows a block diagram of the image recognition apparatus proposed by the present invention.
Specific embodiment
The following describes preferred embodiments of the invention and is not intended to limit the scope of the invention.
As mentioned above, the intent is to initialize the parameters of a pre-trained model so that it adapts to a new scene, then train a target image recognition model for the new scene, and use that model to recognize images to be recognized in the new scene. Specifically, the model is adapted to a specific image-processing scene to realize image recognition; therefore, the parameter initialization of the pre-trained model and the training method of the target image recognition model are introduced first.
As shown in Fig. 5, the flow of target image recognition model training according to an embodiment of the invention is as follows.
Step 501: obtain a pre-trained model parameter file, and parse the pre-trained model parameter file to obtain its parameters, the parameters including at least candidate-box parameters.
As described above, a pre-trained model is an effective model trained on a huge data set; the data set may be the COCO database or another sample data set. Since these databases are not the data set of the new scene, the parameters of the pre-trained model may not suit the new scene when applied to it. Therefore, in step 501, the parameters in the pre-trained model parameter file are first obtained by parsing, so that they can be initialized.
There are many ways to parse the parameter file, for example by keyword matching: the parameter names are used as keywords to match, and the parameter names and corresponding parameter values in the parameter file are extracted. Correspondingly, a parameter includes a parameter name and a parameter value.
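Keyword matching over a parameter file can be illustrated with a short sketch. The real file would be a TensorFlow checkpoint; a simplified `name: value` text format and all names below are assumptions for illustration only:

```python
def parse_param_file(text, keywords):
    """Extract name/value pairs whose parameter name matches any keyword.
    The 'name: value' line format is an assumption for this sketch."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and any(kw in name.strip() for kw in keywords):
            params[name.strip()] = value.strip()
    return params

# Hypothetical dump of a checkpoint's variables (illustrative names):
checkpoint_text = """
conv1/weights: [3, 3, 3, 32]
anchor_box/scales: [0.2, 0.35, 0.5]
learning_rate: 0.001
"""
print(parse_param_file(checkpoint_text, ["anchor", "conv"]))
# -> {'conv1/weights': '[3, 3, 3, 32]', 'anchor_box/scales': '[0.2, 0.35, 0.5]'}
```

Matching on name fragments like `anchor` is how the candidate-box entries would be singled out for modification while unrelated entries are left alone.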
Step 502: perform initialization processing on the parameters of the above pre-trained model parameter file.
Initializing the parameters, especially the candidate-box parameters, makes them suitable for the new scene.
The initialization processing may be, but is not limited to, selecting and/or modifying the parameters in the pre-trained model parameter file, that is, deleting and/or modifying parameters therein to adapt to the new scene.
Step 503: train on the target image library using the initialized parameters to obtain the target image recognition model.
In the embodiment of the invention, the target image library is the image library corresponding to the new target scene, and the target image recognition model is the image recognition model corresponding to the new target scene.
In the embodiment of the invention, during parameter initialization, a modification operation is applied to the candidate-box parameters. There are many ways to modify the candidate-box parameters; one non-limiting example is as follows: cluster the values of the candidate-box parameters using the above target image library to obtain a clustering result, and use each cluster center in the clustering result as a modified value of the candidate-box parameters. More specifically, the sizes (width and height) of the candidate boxes in the target image library can be clustered, for example by K-means, to obtain mean values (i.e., cluster centers); several different size combinations yield several groups of candidate boxes of different widths and heights, and the resulting groups of candidate boxes replace the original values in the parameter file as the modified candidate-box values.
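The K-means step just described can be sketched in pure Python. This is a minimal illustration of clustering (width, height) pairs into anchor sizes, not the patent's implementation:

```python
import random

def kmeans_boxes(boxes, k, iters=20, seed=0):
    """Cluster (width, height) pairs by K-means; the cluster centers
    become the new candidate-box sizes, as described above."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)
    for _ in range(iters):
        # Assign every box to its nearest center (squared distance).
        groups = [[] for _ in range(k)]
        for w, h in boxes:
            nearest = min(range(k),
                          key=lambda i: (w - centers[i][0]) ** 2 + (h - centers[i][1]) ** 2)
            groups[nearest].append((w, h))
        # Recompute each center as the mean of its group.
        centers = [(sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
                   if g else centers[i] for i, g in enumerate(groups)]
    return sorted(centers)

# Annotation boxes from two obvious size groups collapse to two anchor sizes.
annot = [(10, 10), (12, 11), (11, 12), (100, 90), (95, 100), (105, 95)]
anchors = kmeans_boxes(annot, k=2)   # -> [(11.0, 11.0), (100.0, 95.0)]
```

In practice the YOLO-style variant clusters with an IoU-based distance rather than Euclidean distance; the Euclidean form is used here only to keep the sketch short.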
In the embodiment of the invention, the parameters of the pre-trained model parameter file may also include convolutional-layer weight parameters, and the initialization is correspondingly a modification operation on those parameters.
For a scene where model training is based on TensorFlow, TensorFlow provides relevant interfaces to parse the pre-trained model parameter file; the convolutional-layer weight parameters and candidate-box parameters therein are extracted and modified. The above clustering algorithm yields the sizes and aspect ratios of the modified candidate boxes, and the modified candidate-box parameters are substituted into the configuration file used for model training.
In any of the above method embodiments, there are many implementations of step 503. In one of them, the initialized parameters serve as the initial parameters of model training, and model training is performed on the target image library using a deep convolutional neural network and a target regression network; the output of the last convolutional layer of the deep convolutional neural network serves as the input of the target regression network, forming a feature pyramid, multiple convolutional layers of which carry the candidate-box parameters.
According to one embodiment, the deep convolutional neural network used in model training is MobileNetV2 and the target regression network is SSD. In one embodiment, MobileNetV2 takes input pictures of size 300 × 300 × 3 and comprises 19 convolutional layers, downsampling by a factor of about 16 so that the final abstract feature layer of the original image has side length about 19; 15 of the convolutional layers use depthwise separable convolution. The last layer and the second-to-last layer of MobileNetV2 feed the SSD target regression (detection) network, forming a feature pyramid with six convolutional layers ([19, 19], [10, 10], [5, 5], [3, 3], [2, 2], [1, 1]). The multiple convolutional layers of this feature pyramid detect objects of different sizes; candidate boxes are initialized in these six detection layers, and their values are initialized with the K-means clustering algorithm from the annotation boxes of the different objects in the target image library.
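The six pyramid side lengths quoted above follow from repeatedly halving the 19 × 19 map, rounding up each time. A tiny sketch of that arithmetic (our reconstruction, not patent code):

```python
def pyramid_sizes(top_size, n_levels):
    """Generate SSD feature-pyramid side lengths by successive ceil-halving,
    reproducing the six levels [19, 10, 5, 3, 2, 1] cited in the text."""
    sizes, s = [], top_size
    for _ in range(n_levels):
        sizes.append(s)
        s = (s + 1) // 2   # ceil(s / 2), as with a stride-2 SAME convolution
    return sizes

print(pyramid_sizes(19, 6))   # -> [19, 10, 5, 3, 2, 1]
```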
The convolution process is illustrated by Figs. 1-3. Fig. 1 shows how a three-channel picture is represented in a computer. Fig. 2 shows the computation of a convolution on one channel: a 3 × 3 convolution kernel slides over a 5 × 5 feature map, the kernel parameters being W11, W12, W13, W21, W22, W23, W31, W32, W33, and the new value after the convolution is:

Y11 = W11*X11 + W12*X12 + W13*X13 + W21*X21 + W22*X22 + W23*X23 + W31*X31 + W32*X32 + W33*X33

Expressed as a linear formula, this is Y = W^T X, where W^T is the kernel parameter matrix and X is the pixel matrix. This is the process of a single-channel convolution operating on one channel. For three channels, one kernel has size 3 × 3 × 3, i.e. 3 × 3 × 3 = 27 parameters; the kernel parameters are shared within a channel, and each pixel value on the resulting feature map is the weighted sum of the convolutions performed simultaneously over the three channels. The output channel of each convolution kernel represents one kind of feature. Fig. 3 shows the grid diagram of a 1 × 1 × 3 convolution kernel: the top three layers represent the different values of the three color channels on the feature map, and the bottom represents the new feature map formed by the dot-product weighted sum of a length-3 vector with the pixel values. Dimensionality reduction here means using two 1 × 1 × 3 kernels, and dimensionality expansion means using four 1 × 1 × 3 kernels; reduction and expansion are essentially processes of cross-channel feature and information fusion.
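The single-channel sliding-window computation above can be sketched in a few lines of Python, an illustrative reconstruction under stride-1, no-padding assumptions (not code from the patent):

```python
def conv2d_single(image, kernel):
    """Single-channel convolution: slide a k x k kernel over an n x n
    feature map (stride 1, no padding); each output pixel is the
    weighted sum Y = W^T X described in the text."""
    n, k = len(image), len(kernel)
    out = []
    for r in range(n - k + 1):
        row = []
        for c in range(n - k + 1):
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(k) for j in range(k)))
        out.append(row)
    return out

# A 5 x 5 feature map and a 3 x 3 kernel give a 3 x 3 output, matching Fig. 2.
feature_map = [[r * 5 + c + 1 for c in range(5)] for r in range(5)]
identity_kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(conv2d_single(feature_map, identity_kernel))
# -> [[7, 8, 9], [12, 13, 14], [17, 18, 19]] (the center 3 x 3 crop)
```

For a three-channel input, the same sum simply extends over the third index of a 3 × 3 × 3 kernel, which is why such a kernel has 27 parameters.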
The SSD detection network is illustrated by Fig. 4, which shows the relative positions of the candidate boxes and the ground-truth box of the SSD detection network. The dashed parts are the initialized candidate boxes of different aspect ratios and sizes; on this 8 × 8 feature map every grid cell has four different candidate boxes, and the red box is the relative position of the ground-truth box on this 8 × 8 feature map. The candidate box with the highest overlap with the ground-truth box (the intersection-over-union of the two boxes) must be found as the regression target of the predicted box:

tx = (xcenter − xcenter_a)/wa, ty = (ycenter − ycenter_a)/ha, tw = log(w/wa), th = log(h/ha),

where wa and ha are the width and height of the candidate box, xcenter_a and ycenter_a are the center coordinates of the candidate box, w and h are the width and height output by the detection network, and xcenter and ycenter are the predicted coordinate values output by the detection network. The detection network actually predicts only the coordinate offsets and the width/height ratios between the predicted box and the candidate box; the resulting tx, ty, tw, th are the coordinate information of the box to be output. At model prediction time, w = exp(tw)·wa, h = exp(th)·ha, ycenter = ty·ha + ycenter_a, and xcenter = tx·wa + xcenter_a convert the predicted coordinates into real coordinates, which are then scaled back to the original image size to obtain the final visual box information.
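These offset equations and their inverses at prediction time translate directly into a small sketch (illustrative Python under a (cx, cy, w, h) box convention; not code from the patent):

```python
import math

def encode(box, anchor):
    """Offsets of a ground-truth box relative to a candidate (anchor) box:
    tx = (x - xa) / wa, ty = (y - ya) / ha, tw = log(w / wa), th = log(h / ha)."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha))

def decode(t, anchor):
    """Inverse used at prediction time: x = tx * wa + xa, y = ty * ha + ya,
    w = exp(tw) * wa, h = exp(th) * ha."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (tx * wa + xa, ty * ha + ya,
            math.exp(tw) * wa, math.exp(th) * ha)

anchor = (50.0, 50.0, 20.0, 10.0)   # cx, cy, w, h of a candidate box
truth = (60.0, 45.0, 40.0, 5.0)
t = encode(truth, anchor)            # regression target: (0.5, -0.5, log 2, log 0.5)
```

Because decode is the exact inverse of encode, decoding a regression target recovers the ground-truth box up to floating-point error, which is what lets the network predict only offsets.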
Optionally, the parameters of each convolutional layer in the deep convolutional neural network, especially the candidate-box parameters, are updated using a moving-average model:
shadow_variable = decay × shadow_variable + (1 − decay) × variable
The above formula is the update formula of the moving-average model, where variable is the convolution-kernel parameter and decay is the decay rate, generally initialized to 0.9. The decay rate is computed as decay = min{init_decay, (1 + num_update)/(10 + num_update)}, where init_decay is the configured initial decay rate and num_update is the number of model-parameter updates; as num_update grows, (1 + num_update)/(10 + num_update) approaches 1. Here shadow_variable is the shadow value before the update and variable is the current parameter value. For example, if the shadow value x1 is 0, the variable x is 1, and decay at that moment is 0.5, the updated x1 is 0.5 × 0 + (1 − 0.5) × 1 = 0.5. As the number of model iterations increases, (1 + num_update)/(10 + num_update) approaches 1, so (1 − decay) × variable approaches 0, the parameter change amplitude shrinks, and shadow_variable ≈ decay × shadow_variable increasingly holds. The moving-average model thus controls the update amplitude of the variables: parameters update quickly early in training, and near the optimum they update more slowly and with smaller amplitude, mainly through the continually updated decay rate. During training the original variables are still used; at model prediction time (when running the model), if the moving-average model is enabled, the shadow variables replace the variables as the parameters, otherwise the original variables are used.
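The decay schedule and update rule just described condense into a short sketch (illustrative Python mirroring the formulas in the text; the function name is ours):

```python
def ema_update(shadow, variable, num_update, init_decay=0.9):
    """One moving-average step: shadow = decay * shadow + (1 - decay) * variable,
    with decay = min(init_decay, (1 + num_update) / (10 + num_update)),
    so updates are fast early in training and damped later."""
    decay = min(init_decay, (1 + num_update) / (10 + num_update))
    return decay * shadow + (1 - decay) * variable

# Early in training decay is small (0.1 here), so the shadow moves quickly;
# after many updates decay is capped at init_decay and the shadow barely moves.
early = ema_update(0.0, 1.0, num_update=0)      # decay = 0.1 -> shadow becomes 0.9
late = ema_update(0.0, 1.0, num_update=10**6)   # decay = 0.9 -> shadow becomes 0.1
```

TensorFlow's `tf.train.ExponentialMovingAverage` implements the same schedule when `num_updates` is supplied.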
Specifically, variable is the initial parameter and the moving-average model's parameter is replaced by shadow_variable; the above formula expresses the relationship between shadow_variable and variable. The parameters of all convolutional layers (such as each convolutional layer of the deep convolutional neural network mentioned above) are replaced by new values through this conversion, and the same applies to the parameters in the regression network and the detection layers. The output values of the last several convolutional layers are fed into the detection layers, where the corresponding detection algorithm produces the final result; in each iteration, the predicted candidate-box coordinate values update the regression network.
Optionally, the training process of the pre-trained model is optimized using an optimizer.
Regarding the choice of optimizer, according to one embodiment the pre-trained model uses the batch MomentumOptimizer (mini-batch gradient descent with momentum), while in the case of sparse data the rms_prop_optimizer converges better than batch gradient descent.
Mini-batch gradient descent is a trade-off algorithm for improving model accuracy: if each learning step used the entire data set, one iteration would require huge computing resources and be inefficient, so a certain number of samples are drawn at random from the data set for each iteration, which is sufficient for the features the network learns in each pass. If the sample size of one iteration is too small, the algorithm will not converge and the loss curve oscillates severely; once the sample size per iteration reaches a certain value, further increases no longer change the training effect. This value is usually set by experience. The rms_prop_optimizer is an algorithm for how to update the parameter values after each gradient-descent step; many papers show that this optimization algorithm updates sparse data better, i.e., the model fits the distribution of the real data better.
The parameter update rule of the rms_prop_optimizer (reconstructed here from the symbol definitions below; the original equation image did not survive extraction) is:

E[g²]_t = 0.9 · E[g²]_{t−1} + 0.1 · g_t²,  θ_{t+1} = θ_t − η/√(E[g²]_t + ε) · g_t

where Loss is defined as:

L(x, c, l, g) = (1/N) · (Lconf(x, c) + Lloc(x, l, g))
Here Lconf is the classification loss, Lloc is the coordinate regression loss, N is the size of the current batch, and Loss is the mean error of one batch. Lconf uses the cross-entropy loss; Cp is the probability value obtained through the softmax function, and Xp is the true probability value of the i-th positive sample (equal to 1). In Lloc, the specific form of smooth is a mean-squared-error function: the corresponding coordinates are subtracted, squared, and multiplied by one half (the half makes differentiation convenient). lm is the predicted coordinate of the i-th positive sample, and g is the relative offset between the ground-truth box and the candidate box described earlier, i.e. tx = (xcenter − xcenter_a)/wa, ty = (ycenter − ycenter_a)/ha, tw = log(w/wa), th = log(h/ha). Σ is the summation symbol, denoting the sum of the Loss over all positive samples of the current batch. g_t denotes the gradient obtained by taking the partial derivative of L(x, c, l, g) with respect to c and smooth (in practice, with respect to the convolution-kernel parameters), and θ is the value of the convolution-kernel parameters of one convolutional layer (a four-dimensional array). The rms_prop_optimizer accumulates historical gradients as the amplitude of the gradient descent; η is the learning rate, 0.9 is the decay rate, and ε is a small constant that prevents the denominator from being 0. θ_{t+1} is the parameter value of the next training step (see the convolution computation of Fig. 2), and E[g²]_t is the accumulated gradient value. With the rms_prop_optimizer, parameters update faster early in training, and the effect is good for sparse matrices.
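One optimizer step in the standard RMSProp form consistent with the symbols above (η, ε, E[g²]_t, decay 0.9) can be sketched as follows; this is our illustration, since the patent's own formula image was not preserved:

```python
import math

def rmsprop_step(theta, grad, acc, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSProp update: E[g^2]_t = decay * E[g^2]_{t-1} + (1 - decay) * g^2,
    theta_{t+1} = theta_t - lr * g / sqrt(E[g^2]_t + eps)."""
    acc = decay * acc + (1 - decay) * grad * grad
    theta = theta - lr * grad / math.sqrt(acc + eps)
    return theta, acc

# A few steps on a toy quadratic loss L = theta^2 (gradient 2 * theta):
theta, acc = 1.0, 0.0
for _ in range(5):
    theta, acc = rmsprop_step(theta, 2.0 * theta, acc)
```

Dividing by the root of the accumulated squared gradients is what makes steps large where gradients have historically been rare, which is the sparse-data advantage the text claims.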
It can be seen that, in the parameter initialization procedure of the present invention, a model trained after the candidate-frame initialization parameters have been reset draws more accurate bounding boxes than one that uses the pre-training model's boxes directly, and the classification probabilities it obtains are larger than the previous probability values, although accuracy does not improve by much. On this basis, replacing the optimizer with rms_prop_optimizer and adding the moving-average model yields a trained model whose boxes are not only more accurate, but whose probability values and accuracy rate are five percentage points higher than before for the same number of training steps, with smaller coordinate errors and classification errors.
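The moving-average model mentioned above keeps a smoother "shadow" copy of each parameter that trails the raw training values; a minimal sketch of the idea follows (the class name, decay value, and parameter naming are assumptions for illustration, not the patent's implementation):

```python
import numpy as np

class MovingAverageModel:
    """Keeps an exponential moving average ("shadow") copy of each
    parameter; evaluation uses the smoother shadow weights rather than
    the raw, noisier training values."""
    def __init__(self, decay=0.9):
        self.decay = decay
        self.shadow = {}

    def update(self, params):
        for name, value in params.items():
            if name not in self.shadow:
                self.shadow[name] = value.copy()
            else:
                self.shadow[name] = (self.decay * self.shadow[name]
                                     + (1.0 - self.decay) * value)

ema = MovingAverageModel(decay=0.9)
for step in range(100):
    # Stand-in for a convolution kernel drifting during training.
    ema.update({"conv1/kernel": np.full(3, float(step))})
```

With decay 0.9, the shadow value lags the raw value by roughly decay/(1 − decay) = 9 update steps, which is what damps step-to-step noise in the trained weights.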
The foregoing describes the process of initializing the parameters of the pre-training model; the process of performing image recognition with the target image recognition model is described below.
As shown in Fig. 6, the image recognition method provided by an embodiment of the present invention includes the following operations:
Step 601: obtain an image to be recognized, and call the target image recognition model.
Step 602: recognize the image to be recognized using the target image recognition model, to obtain a recognition result.
A specific implementation of step 602 may be, but is not limited to: extracting a feature vector from the target image region of the image to be recognized; and using the feature vector as the input of the target image recognition model, obtaining the recognition result through the target image recognition model.
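Steps 601–602 can be sketched end to end as follows, with a flattening step standing in for feature-vector extraction and a simple linear scoring layer standing in for the trained target image recognition model (both stand-ins, and the class names, are illustrative only, not the patent's network):

```python
import numpy as np

def recognize(image, model_weights, class_names):
    """Step 601/602 sketch: flatten the target image region into a
    feature vector, score it with a stand-in linear layer, and return
    the best-scoring label as the recognition result."""
    features = image.reshape(-1).astype(float)   # feature vector extraction
    scores = model_weights @ features            # stand-in for the trained model
    return class_names[int(np.argmax(scores))]

image = np.ones((4, 4))                          # a bright 4x4 target region
weights = np.array([np.ones(16), -np.ones(16)])  # favours class 0 on bright input
label = recognize(image, weights, ["face", "background"])
```

In the real system the linear layer would be replaced by the trained deep convolutional network, but the data flow — region, feature vector, model, label — is the same.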
Before step 602, the image to be recognized may also be pre-processed. According to one embodiment, image enhancement is first applied to the image, and the target image region of interest is then extracted from it.
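A hypothetical pre-processing pass in the spirit of this embodiment: a min-max contrast stretch as the enhancement step, followed by cropping the bounding box of salient pixels as the region of interest. Both choices are assumptions; the patent does not fix which enhancement or extraction algorithm is used.

```python
import numpy as np

def preprocess(image, roi_threshold=0.5):
    """Enhance then extract: min-max contrast stretch (assumed
    enhancement), then crop the bounding box of above-threshold
    pixels as the target image region (assumed extraction)."""
    lo, hi = image.min(), image.max()
    enhanced = (image - lo) / (hi - lo) if hi > lo else image * 0.0
    ys, xs = np.nonzero(enhanced > roi_threshold)
    if ys.size == 0:
        return enhanced                       # nothing salient: keep whole frame
    return enhanced[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

frame = np.zeros((8, 8))
frame[2:5, 3:6] = 200.0                       # a bright 3x3 object
roi = preprocess(frame)
```

Cropping before feature extraction keeps the model's input focused on the object, which is one way such a step can improve both speed and accuracy.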
Because the target image recognition model is obtained by optimizing the parameters of the pre-training model for the specific application environment, the speed and accuracy of image processing are greatly improved.
The above method of the present invention can be realized in any form of software, hardware, or firmware. For example, a computer or server that executes the above method has a processor and a memory, and the memory stores computer instructions for executing the above method. Accordingly, the present invention also provides a corresponding image processing apparatus.
No matter which scenario an image library serves, the image samples in it must be annotated. At present this is done entirely by hand, which is time-consuming and laborious, and the annotation quality is uneven, greatly reducing development efficiency. Taking ordinary TensorFlow-based image recognition as an example, annotation is mainly performed manually through the labelImg-master software. For instance, if the goal is to annotate the faces in an image, the user draws a box around each face with the mouse and adds a label such as face, and features such as eyes, nose, and mouth are annotated in the same way. Through the user's box selection and label, TensorFlow can determine the position of the face in the image from the position and size of the candidate box, and determine the recognition result inside the box (e.g., a face) from the label content; TensorFlow then gradually extracts the features of these annotations through the neural network for deep learning. Because training an accurate model requires a large data-set sample for inference, and every sample in the data set must be annotated, this is a massive and lengthy process, and lapses in the annotator's concentration also introduce labelling errors.
Therefore, on the basis of the above embodiments, the embodiment of the present invention also improves the annotation method, so that samples can be labelled much more quickly when training the image recognition model, which significantly raises the efficiency of image-recognition development and reduces labour cost. Specifically, a small number of pictures (e.g., 1000) are labelled by hand, a rough model with a relatively large parameter range is then trained, that model is used to estimate the input-output range (i.e., to draw the candidate face boxes) on the remaining images, and the results are corrected manually until the required precision is reached. Compared with labelling everything by hand, this greatly saves labour cost.
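The semi-automatic labelling loop just described can be sketched as follows; `hand_label`, `train`, and `predict` are caller-supplied placeholders with hypothetical names (not from the patent), standing in for the manual annotation tool, the training run, and model inference:

```python
def bootstrap_annotation(images, hand_label, train, predict, seed_size=1000):
    """Semi-automatic labelling loop: hand-label a small seed set,
    train a rough model, pre-draw boxes on the remaining images, and
    have the annotator correct the drafts instead of drawing from
    scratch before the final training pass."""
    seed = images[:seed_size]
    labels = [hand_label(img) for img in seed]                 # manual pass
    rough_model = train(seed, labels)
    drafts = [predict(rough_model, img) for img in images[seed_size:]]
    corrected = [hand_label(img, draft=d)                      # review, not redraw
                 for img, d in zip(images[seed_size:], drafts)]
    return train(images, labels + corrected)

# Tiny demonstration with stand-in callables.
def _hand_label(img, draft=None):
    return draft if draft is not None else ("box", img)

def _train(imgs, labels):
    return {"trained_on": len(labels)}

def _predict(model, img):
    return ("proposed_box", img)

final_model = bootstrap_annotation(list(range(5)), _hand_label, _train,
                                   _predict, seed_size=2)
```

The saving comes from the second `hand_label` pass operating on drafts: correcting a pre-drawn box is far faster than drawing one from scratch.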
In the annotation process of the present invention, to identify seven different kinds of things such as eyes, nose, and mouth, a machine with one GTX 1060 graphics card can annotate 30 pictures per second, whereas manual annotation takes 30 seconds to 1 minute per picture.
An embodiment of the present invention also provides an image recognition device, as shown in Fig. 7, comprising:
Image collection module 701, for obtaining images to be recognized;
Model calling module 702 is used for invocation target image recognition model;
an image recognition module 703, configured to recognize the image to be recognized using the target image recognition model, to obtain a recognition result;
a model training module 704, configured to obtain a pre-training model parameter file and parse the pre-training model parameter file to obtain the parameters of the pre-training model parameter file, the parameters including at least candidate frame parameters; perform initialization processing on the parameters of the pre-training model; and train on the target image library with the parameters after initialization processing, to obtain the target image recognition model.
The functions realized by the modules are the same as described above and are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be realized in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and there may be other division manners in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are realized in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence the part that contributes to the existing technology, or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage media include various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments, or replace some or all of the technical features with equivalents; such modifications or replacements do not depart the essence of the corresponding technical solutions from the scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the scope of the claims and the description of the present invention.

Claims (10)

1. A method of image recognition, characterized in that the method comprises:
obtaining an image to be recognized, and calling a target image recognition model;
recognizing the image to be recognized using the target image recognition model, to obtain a recognition result;
wherein the target image recognition model is obtained through training in the following way:
obtaining a pre-training model parameter file, and parsing the pre-training model parameter file to obtain the parameters of the pre-training model parameter file, the parameters of the pre-training model parameter file including at least candidate frame parameters;
performing initialization processing on the parameters of the pre-training model parameter file;
training on the target image library with the parameters after initialization processing, to obtain the target image recognition model.
2. The method according to claim 1, characterized in that the performing initialization processing on the parameters of the pre-training model parameter file comprises:
deleting and/or modifying the parameters of the pre-training model parameter file.
3. The method according to claim 2, characterized in that the parameters of the pre-training model parameter file further include convolutional-layer weight parameters, and the modifying the parameters of the pre-training model parameter file comprises:
modifying the parameter value of the convolutional-layer weight parameters; and/or
modifying the parameter value of the candidate frame parameters.
4. The method according to claim 3, characterized in that the modifying the parameter value of the candidate frame parameters comprises:
clustering the parameter values of the candidate frame parameters using the target image library, to obtain a clustering result;
using each cluster centre in the clustering result as the modified parameter value of the candidate frame parameters.
5. The method according to any one of claims 1 to 4, characterized in that the training on the target image library with the parameters after initialization processing to obtain the target image recognition model comprises:
using the parameters after the initialization processing as the initialization parameters of model training, and performing model training on the target image library with a deep convolutional neural network and a target regression network; wherein the output of the last convolutional layer of the deep convolutional neural network serves as the input of the target regression network, forming a feature pyramid, and multiple convolutional layers of the feature pyramid carry the candidate frame parameters.
6. The method according to claim 5, characterized in that the performing model training on the target image library with the deep convolutional neural network and the target regression network further comprises:
updating the parameters of each convolutional layer in the deep convolutional neural network with a moving-average model.
7. The method according to claim 5, characterized in that the performing model training on the target image library with the deep convolutional neural network and the target regression network further comprises:
optimizing the parameters with at least one optimizer.
8. The method according to claim 1, characterized in that, before the recognizing the image to be recognized using the target image recognition model to obtain a recognition result, the method further comprises:
performing image-enhancement processing on the image to be recognized, and extracting a target image region from the image to be recognized.
9. The method according to claim 8, characterized in that the recognizing the image to be recognized using the target image recognition model to obtain a recognition result comprises:
extracting a feature vector from the target image region of the image to be recognized;
using the feature vector as the input of the target image recognition model, and obtaining the recognition result through the target image recognition model.
10. A device of image recognition, characterized in that the device comprises:
an image obtaining module, configured to obtain an image to be recognized;
a model calling module, configured to call a target image recognition model;
an image recognition module, configured to recognize the image to be recognized using the target image recognition model, to obtain a recognition result;
a model training module, configured to obtain a pre-training model parameter file and parse the pre-training model parameter file to obtain the parameters of the pre-training model parameter file, the parameters of the pre-training model parameter file including at least candidate frame parameters; perform initialization processing on the parameters of the pre-training model; and train on the target image library with the parameters after initialization processing, to obtain the target image recognition model.
CN201910235745.9A 2019-03-27 2019-03-27 The method and apparatus of image recognition Withdrawn CN110399895A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910235745.9A CN110399895A (en) 2019-03-27 2019-03-27 The method and apparatus of image recognition

Publications (1)

Publication Number Publication Date
CN110399895A true CN110399895A (en) 2019-11-01

Family

ID=68322215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910235745.9A Withdrawn CN110399895A (en) 2019-03-27 2019-03-27 The method and apparatus of image recognition

Country Status (1)

Country Link
CN (1) CN110399895A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767937A * 2019-11-13 2020-10-13 杭州海康威视数字技术股份有限公司 Target detection model training method and device, electronic equipment and storage medium
CN111626098A * 2020-04-09 2020-09-04 北京迈格威科技有限公司 Method, device, equipment and medium for updating parameter values of model
CN113642592A * 2020-04-27 2021-11-12 武汉Tcl集团工业研究院有限公司 Training method of training model, scene recognition method and computer equipment
CN111522570A * 2020-06-19 2020-08-11 杭州海康威视数字技术股份有限公司 Target library updating method and device, electronic equipment and machine-readable storage medium
CN111522570B * 2020-06-19 2023-09-05 杭州海康威视数字技术股份有限公司 Target library updating method and device, electronic equipment and machine-readable storage medium
CN112036659A * 2020-09-09 2020-12-04 中国科学技术大学 Social network media information popularity prediction method based on combination strategy
CN112288006A * 2020-10-29 2021-01-29 深圳开立生物医疗科技股份有限公司 Image processing model construction method, device, equipment and readable storage medium
CN112288006B * 2020-10-29 2024-05-24 深圳开立生物医疗科技股份有限公司 Image processing model construction method, device, equipment and readable storage medium
CN113065466A * 2021-04-01 2021-07-02 安徽嘻哈网络技术有限公司 Traffic light detection system for driving training based on deep learning
CN113065466B * 2021-04-01 2024-06-04 安徽嘻哈网络技术有限公司 Deep learning-based traffic light detection system for driving training
CN113327195A * 2021-04-09 2021-08-31 中科创达软件股份有限公司 Image processing method and device, image processing model training method and device, and image pattern recognition method and device
CN113283537A * 2021-06-11 2021-08-20 浙江工业大学 Method and device for protecting privacy of depth model based on parameter sharing and oriented to member reasoning attack
CN113283537B * 2021-06-11 2024-03-26 浙江工业大学 Method and device for protecting privacy of depth model based on parameter sharing and oriented to membership inference attack
CN113435343A * 2021-06-29 2021-09-24 重庆紫光华山智安科技有限公司 Image recognition method and device, computer equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20191101