CN109165654A - Training method of target positioning model and target positioning method and device

Training method of target positioning model and target positioning method and device

Publication number
CN109165654A
Authority
CN
China
Prior art keywords
model
image
loss function
coordinate
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810992851.7A
Other languages
Chinese (zh)
Other versions
CN109165654B (en)
Inventor
叶锦宇
刘玉明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiuhu Times Intelligent Technology Co Ltd
Original Assignee
Beijing Jiuhu Times Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiuhu Times Intelligent Technology Co Ltd
Priority to CN201810992851.7A
Publication of CN109165654A
Application granted
Publication of CN109165654B
Legal status: Active (granted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

This application provides a training method for a target positioning model, comprising: inputting an image sample into a convolution model to extract a first image feature of the image sample; inputting the first image feature into a segmentation model to generate first foreground coordinates of the image sample; inputting the first image feature into a regression model to generate second foreground coordinates of the image sample; computing a model loss function from the first foreground coordinates and the second foreground coordinates; and training the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function, to generate a target positioning model composed of the convolution model and the regression model. With this scheme, training the convolution model, segmentation model, and regression model together lets the convolution model obtain better image features in the actual prediction stage, while the regression model increases image positioning speed.

Description

Training method of target positioning model and target positioning method and device
Technical field
This application relates to the technical field of image recognition, and in particular to a training method for a target positioning model, and a target positioning method and device.
Background art
Image positioning and recognition technology is widely used in everyday life and production, especially in financial credit-review business. To cope with the massive volume of images to be recognized, credit reviewers usually rely on image positioning and recognition technology to complete credit review intelligently (generally, auditing materials such as a user's identity card, bank card, and business license), saving labor costs and improving production efficiency.
Existing image positioning and recognition technology has been developed on the basis of OCR technology, but current OCR technology is far from perfect.
Summary of the invention
In view of this, an object of this application is to provide a training method for a target positioning model, and a target positioning method and device.
In a first aspect, an embodiment of this application provides a training method for a target positioning model, the method comprising:
inputting an image sample into a convolution model to extract a first image feature of the image sample;
inputting the first image feature into a segmentation model to generate first foreground coordinates of the image sample;
inputting the first image feature into a regression model to generate second foreground coordinates of the image sample;
computing a model loss function from the first foreground coordinates and the second foreground coordinates;
training the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function, to generate a target positioning model composed of the convolution model and the regression model.
With reference to the first aspect, an embodiment of this application provides a first possible implementation of the first aspect, in which the step of computing the model loss function from the first foreground coordinates and the second foreground coordinates comprises:
determining a first loss function from the difference between the first foreground coordinates and the actual coordinates of the target in the image sample;
determining a second loss function from the difference between the second foreground coordinates and the actual coordinates of the target in the image sample;
determining the model loss function from the first loss function and the second loss function.
With reference to the first possible implementation of the first aspect, an embodiment of this application provides a second possible implementation of the first aspect, in which the step of training the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function comprises:
judging whether the model loss function meets a preset output requirement;
if the model loss function does not meet the preset output requirement, training the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function, and re-executing the step of inputting the image sample into the convolution model to extract the first image feature of the image sample.
With reference to the first or second possible implementation of the first aspect, an embodiment of this application provides a third possible implementation of the first aspect, in which the step of training the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function further comprises:
judging whether the model loss function meets the preset output requirement;
if the model loss function meets the preset output requirement, generating the target positioning model composed of the convolution model and the regression model.
In a second aspect, an embodiment of this application further provides a target positioning method, in which a target image is input into the convolution model in the target positioning model to extract a second image feature of the target image;
and the second image feature is input into the regression model in the target positioning model to generate foreground coordinates of the target image.
In a third aspect, an embodiment of this application further provides a training device for a target positioning model, comprising a first extraction module, configured to input an image sample into a convolution model to extract a first image feature of the image sample;
a first processing module, configured to input the first image feature into a segmentation model to generate first foreground coordinates of the image sample;
a second processing module, configured to input the first image feature into a regression model to generate second foreground coordinates of the image sample;
a first analysis module, configured to compute a model loss function from the first foreground coordinates and the second foreground coordinates;
and a first generation module, configured to train the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function, to generate a target positioning model composed of the convolution model and the regression model.
With reference to the third aspect, an embodiment of this application provides a first possible implementation of the third aspect, in which the first analysis module comprises a first analysis unit, a second analysis unit, and a first determination unit;
the first analysis unit is configured to determine a first loss function from the difference between the first foreground coordinates and the actual coordinates of the target in the image sample;
the second analysis unit is configured to determine a second loss function from the difference between the second foreground coordinates and the actual coordinates of the target in the image sample;
and the first determination unit is configured to determine the model loss function from the first loss function and the second loss function.
With reference to the first possible implementation of the third aspect, an embodiment of this application provides a second possible implementation of the third aspect, in which the first generation module comprises a first judging unit, a first generation unit, and a first processing unit;
the first judging unit is configured to judge whether the model loss function meets the preset output requirement;
and the first processing unit is configured, when the model loss function does not meet the preset output requirement, to train the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function, and to drive the first extraction module to work again.
With reference to the first possible implementation of the third aspect, an embodiment of this application provides a third possible implementation of the third aspect, in which the first generation module further comprises a second judging unit and a second generation unit;
the second judging unit is configured to judge whether the model loss function meets the preset output requirement;
and the second generation unit is configured, if the model loss function meets the preset output requirement, to generate the target positioning model composed of the convolution model and the regression model.
In a fourth aspect, an embodiment of this application further provides a target positioning device, comprising a second extraction module and a second analysis module;
the second extraction module is configured to input a target image into the convolution model in the target positioning model to extract a second image feature of the target image;
and the second analysis module is configured to input the second image feature into the regression model in the target positioning model to generate foreground coordinates of the target image.
The training method for a target positioning model provided by the embodiments of this application comprises: inputting an image sample into a convolution model to extract a first image feature of the image sample; inputting the first image feature into a segmentation model to generate first foreground coordinates of the image sample; inputting the first image feature into a regression model to generate second foreground coordinates of the image sample; computing a model loss function from the first foreground coordinates and the second foreground coordinates; and training the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function, to generate a target positioning model composed of the convolution model and the regression model. That is, in the training stage the segmentation model helps train the convolution model and the regression model, and once training is complete the convolution model and the regression model are combined into the target positioning model, which remedies the slow positioning that occurs when the segmentation model is used directly.
To make the above objects, features, and advantages of this application clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of this application and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from them without creative effort.
Fig. 1 shows a basic flowchart of a training method for a target positioning model provided by an embodiment of this application;
Fig. 2 shows a schematic diagram of the model used during training in a training method for a target positioning model provided by an embodiment of this application;
Fig. 3 shows an optimized flowchart of a training method for a target positioning model provided by an embodiment of this application;
Fig. 4 shows an optimized flowchart of another training method for a target positioning model provided by an embodiment of this application;
Fig. 5 shows a flowchart of a target positioning method provided by an embodiment of this application;
Fig. 6 shows a schematic diagram of the trained model in a target positioning method provided by an embodiment of this application;
Fig. 7 shows a structural schematic diagram of a training device for a target positioning model provided by an embodiment of this application;
Fig. 8 shows a structural schematic diagram of a computing device for carrying out the training method for a target positioning model and the target positioning method provided by an embodiment of this application.
Detailed description of the embodiments
To make the purposes, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. The components of the embodiments, as generally described and illustrated in the drawings here, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of this application but merely represents selected embodiments. Based on these embodiments, all other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of this application.
Image positioning and recognition is widely applied, especially in financial audit business. Credit reviewers must audit, every day, the massive quantities of identity cards, bank cards, business licenses, and similar materials that users upload through web pages or apps. Recently, with the development of hardware acceleration devices such as GPUs and TPUs and the improved accuracy of all kinds of image recognition algorithms, artificial intelligence has begun to be used for image positioning and recognition, saving labor costs and improving production efficiency.
At present, financial credit-review business mainly uses OCR technology, which can automatically recognize the text information in the certificate pictures uploaded by users. The objects audited in financial credit review, whether identity cards or bank cards, have roughly rectangular outer frames, relatively fixed text regions, and uniform text sizes. In the overall OCR pipeline, positioning is performed by separately locating the picture foreground and the text regions. To position pictures coming from web pages and apps, a preset frame of fixed size and aspect ratio is usually defined up front, the user is required to upload a picture through it, and OCR is then performed on the picture. The constraint of the preset frame effectively filters out the background, greatly reducing the difficulty of foreground positioning and text positioning. Alternatively, without a preset frame, region positioning is performed with traditional edge detection or with the neural networks that have risen in recent years; after the coordinates of the foreground region are located, the foreground picture is cropped out and the subsequent text-region positioning is performed.
The preset-frame approach is generally used on the mobile app side, calling the phone camera for on-site shooting. This scheme restricts the uploaded picture to one taken on the spot, so history pictures stored in the photo album cannot be used; in addition, the preset frame makes taking the picture harder and degrades the user experience. Moreover, when the preset frame is not a strong constraint, it can only limit the proportion of the background area in the whole picture to a certain extent, and the subsequent steps still cannot do without foreground positioning. When no preset frame is imposed, the picture may come from on-site shooting or from the photo album. When traditional edge detection is used for foreground positioning, it is strongly affected by picture quality and is not robust: when the picture is blurry, its boundary features are weak, or the background is too complex, no positioning result can be obtained, or the positioning error is very large.
In view of the above problems, the embodiments of this application provide a training method for a target positioning model, and a target positioning method and device, which are described below through embodiments.
To facilitate understanding of the present embodiment, a training method for a target positioning model disclosed in an embodiment of this application is introduced first. As shown in Fig. 1, the method comprises the following steps:
S101: input an image sample into a convolution model to extract a first image feature of the image sample;
S102: input the first image feature into a segmentation model to generate first foreground coordinates of the image sample;
S103: input the first image feature into a regression model to generate second foreground coordinates of the image sample;
S104: compute a model loss function from the first foreground coordinates and the second foreground coordinates;
S105: train the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function, to generate a target positioning model composed of the convolution model and the regression model.
Fig. 2 shows the model used in the training of steps S101-S105. The model in training is composed of a convolution model 201, a segmentation model 202, and a regression model 203, where the segmentation model 202 and the regression model 203 both receive the output of the convolution model, each taking the first image feature of the image sample that the convolution model outputs.
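For illustration only, since the patent names no framework, layer sizes, or output format, the Fig. 2 architecture could be sketched in PyTorch as a shared convolutional backbone (convolution model 201) feeding a segmentation head (segmentation model 202) and a regression head (regression model 203). Every module name, channel count, and the 224x224 input size below are assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class ConvBackbone(nn.Module):
    """Convolution model 201: extracts the first image feature."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )

    def forward(self, x):          # x: (N, 3, 224, 224) fixed-size image samples
        return self.features(x)    # first image feature: (N, 64, 56, 56)

class SegmentationHead(nn.Module):
    """Segmentation model 202: per-pixel foreground mask, used only in training."""
    def __init__(self):
        super().__init__()
        self.classifier = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, feature):
        return torch.sigmoid(self.classifier(feature))  # foreground probability mask

class RegressionHead(nn.Module):
    """Regression model 203: maps the feature to box coordinates (x1, y1, x2, y2)."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 56 * 56, 4))

    def forward(self, feature):
        return self.fc(feature)    # second foreground coordinates
```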
As in step S101, during training the image sample serving as the training set (e.g., a pixel matrix) is first scaled to the input size of the convolution model and then fed into the convolution model. This is because the fully connected layer in the convolution model places a requirement on the size of the input picture. Specifically, although the convolutional layers in the convolution model impose no size restriction on the image, the fully connected layer requires an input image of fixed size, so the sizes of the image samples must be adjusted uniformly, i.e., to a fixed size. More specifically, the dimension of the input vector of the fully connected layer (which reflects the size of the input image) corresponds to the number of its weight parameters; if the dimension of the input vector is not fixed, the number of weight parameters of the fully connected layer is also not fixed, in which case the convolution model would keep changing during training and might never train successfully. Therefore, the pixel matrix of a fixed-size image should be input into the convolution model as the image sample, so that the convolution model can extract the first image feature of the image sample.
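As a concrete illustration of this fixed-size requirement, the preprocessing might look as follows; the 224x224 target size and the use of torchvision are assumptions, the patent only stating that samples are scaled to the convolution model's input size.

```python
from torchvision import transforms

# Scale every image sample to the backbone's fixed input size, then
# convert it to the pixel-matrix tensor the convolution model consumes.
to_model_input = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
```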
Steps S102 and S103 can be carried out simultaneously. In concrete execution, step S102 uses the segmentation model to segment and recognize the first image feature of the image sample, thereby determining the first foreground coordinates of the image sample. Here, the segmentation model applies an image segmentation algorithm to locate the foreground coordinates of the image accurately. The image segmentation algorithms usable here mainly include threshold-based methods, edge-based methods, region-based methods, clustering-based methods, wavelet-transform-based methods, methods based on mathematical morphology, and methods based on artificial neural networks. Among these, the segmentation algorithm based on an artificial neural network trains a multilayer perceptron to obtain a linear decision function and then classifies pixels with the decision function to achieve segmentation. Such a segmentation model needs a large amount of training data, and since neural networks contain an enormous number of connections, they readily incorporate spatial information and can better handle noise and non-uniformity in the image. The above segmentation model is therefore preferably one that uses an artificial-neural-network segmentation algorithm.
Step S103, in concrete execution, mainly uses the regression model to perform linear regression positioning on the first image feature, obtaining the second foreground coordinates of the image sample. The regression model works mainly by finding a mapping for the target vector to be positioned such that the error between the target vector and the actual position vector of the target is minimal. That is, given the feature vector of the first image feature as input, a set of parameters is learned so that the second foreground coordinates of the image sample predicted by the regression model after linear regression are very close to the actual values. Concretely, the feature vector of the first image feature of the image sample is taken as input, the feature vector is translated and then scaled, and a prediction of the second foreground coordinates of the image sample is obtained. The functional relation between the predicted value and the actual value is computed to obtain the optimized parameters, and by learning these optimized parameters the predicted value is driven close to the true value, yielding the second foreground coordinates of the image sample.
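Written out, the mapping described in this paragraph is an affine (translate-then-scale) transform of the feature vector, fitted so that the prediction approaches the ground truth; the notation below is supplied here and does not appear in the patent:

```latex
% phi(I): first image feature of image sample I, flattened to a vector
% W, b:   learned regression parameters (scaling and translation)
% y^*:    actual foreground coordinates of the target
\hat{y} = W\,\phi(I) + b, \qquad \min_{W,\,b}\ \lVert \hat{y} - y^{\ast} \rVert^{2}
```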
Step S104 computes over the first loss function (determined from the first foreground coordinates) and the second loss function (determined from the second foreground coordinates), obtained respectively from the segmentation model and the regression model during training, to obtain the model loss function.
In step S105, the convolution model, the segmentation model, and the regression model are trained simultaneously using the model loss function. The model loss function allows each part of the model in training to be optimized, finally generating a target positioning model composed of the convolution model and the regression model. In concrete implementation, the model loss function may be used to train the segmentation model and the regression model, or the convolution model and the regression model, or the convolution model, the segmentation model, and the regression model together.
The training method for a target positioning model in the embodiments of this application first extracts the first image feature of the image sample with the convolution model, then feeds the output of the convolution model, i.e., the first image feature, separately into the segmentation model and the regression model, and trains the convolution model, the segmentation model, and the regression model simultaneously according to the outputs of the segmentation model and the regression model, thereby obtaining a target positioning model composed of the convolution model and the regression model and remedying the slow positioning that occurs when the segmentation model is used alone.
Further, step S104 can be implemented through the following steps, as shown in Fig. 3:
S301: determine a first loss function from the difference between the first foreground coordinates and the actual coordinates of the target in the image sample;
S302: determine a second loss function from the difference between the second foreground coordinates and the actual coordinates of the target in the image sample;
S303: determine the model loss function from the first loss function and the second loss function.
Step S301 compares the first foreground coordinates obtained by the segmentation model with the actual coordinates of the target in the image sample to determine the first loss function. During model training, the actual coordinates of the target in the image sample can be determined in advance and annotated, allowing the first loss function to be determined.
In step S302, the second loss function is determined by computing the difference between the second foreground coordinates, i.e., the predicted value, and the actual coordinates of the target in the image sample, i.e., the true value. In concrete execution, steps S301 and S302 can be performed simultaneously or separately.
In step S303, the final model loss function is computed from the first loss function and the second loss function obtained in the preceding steps. The first and second loss functions here are generated continuously as the model loss function optimizes the convolution model, the segmentation model, and the regression model: different first and second foreground coordinates are generated, each is compared with the actual coordinates, and corresponding first and second loss functions are produced. Through the continual generation of first and second loss functions and the computation over them, the final model loss function is determined.
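The patent specifies neither the distance measure nor the rule for combining the two losses; a minimal sketch, assuming an L1 distance per branch and an unweighted sum, is:

```python
import torch.nn.functional as F

def compute_model_loss(first_coords, second_coords, actual_coords):
    first_loss = F.l1_loss(first_coords, actual_coords)    # S301: segmentation branch
    second_loss = F.l1_loss(second_coords, actual_coords)  # S302: regression branch
    return first_loss + second_loss                        # S303: model loss function
```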
Further, as shown in Fig. 4, step S105 can be implemented through the following steps and covers two cases. The first case is as follows:
S401: judge whether the model loss function meets the preset output requirement;
S402: if the model loss function does not meet the preset output requirement, train the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function, and re-execute the step of inputting the image sample into the convolution model to extract the first image feature of the image sample.
After the model loss function is determined from the first loss function and the second loss function, the generated model loss function is judged. When the final model loss function does not meet the preset output requirement, the following steps are re-executed: input the image sample into the convolution model to extract the first image feature of the image sample; input the first image feature into the segmentation model to generate the first foreground coordinates of the image sample; input the first image feature into the regression model to generate the second foreground coordinates of the image sample; compute the model loss function from the first and second foreground coordinates; and train the convolution model, the segmentation model, and the regression model simultaneously according to the newly generated model loss function. The preset output requirement means that when the error between the respective outputs of the convolution model, the segmentation model, and the regression model and the true results is minimal, the corresponding model loss function determined from the first and second loss functions is taken as the final model loss function.
The second case is as follows:
S403: judge whether the model loss function meets the preset output requirement;
S404: if the model loss function meets the preset output requirement, generate the target positioning model composed of the convolution model and the regression model.
The convolution model and regression model trained so that the model loss function meets the preset output requirement constitute the finally determined target positioning model. When the model loss function is judged to meet the preset output requirement, that is, when it is determined to be the optimal model loss function, the gradient of the model loss function is computed with the gradient descent algorithm, the corresponding optimal parameters are calculated, and the convolution model, the segmentation model, and the regression model are trained with these model parameters, finally generating a target positioning model composed of the convolution model and the regression model. Training the convolution model with the segmentation model enables the convolution model to obtain higher-precision image features in practical applications, while the trained regression model offers better positioning speed.
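Pulling the pieces together, the S401-S404 loop might be sketched as below, reusing the modules and loss from the earlier sketches. The optimizer, learning rate, the threshold standing in for the preset output requirement, the data loader, and the coords_from_mask helper (a hypothetical differentiable reduction of the segmentation mask to box coordinates, e.g. a soft argmax) are all assumptions.

```python
import torch

backbone, seg_head, reg_head = ConvBackbone(), SegmentationHead(), RegressionHead()
params = (list(backbone.parameters()) + list(seg_head.parameters())
          + list(reg_head.parameters()))
optimizer = torch.optim.SGD(params, lr=1e-3)  # gradient descent, as described above
LOSS_THRESHOLD = 0.01                         # stand-in for the preset output requirement

for image_samples, actual_coords in loader:   # loader of labelled image samples (assumed)
    features = backbone(image_samples)                   # S101: first image feature
    first_coords = coords_from_mask(seg_head(features))  # S102: first foreground coordinates
    second_coords = reg_head(features)                   # S103: second foreground coordinates
    loss = compute_model_loss(first_coords, second_coords, actual_coords)  # S104
    if loss.item() < LOSS_THRESHOLD:  # S403/S404: preset output requirement met, stop
        break
    optimizer.zero_grad()             # S401/S402: requirement not met, keep training
    loss.backward()
    optimizer.step()
```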
In conclusion a kind of training method of provided target location model, by the way that convolution model, segmentation is respectively trained Model and regression model, so that the convolution model in actual prediction stage obtains better image feature, meanwhile, training regression model Promote framing speed.
Corresponding to the above training method for a target positioning model, this application also provides a target positioning method, as shown in Fig. 5:
S501: input a target image into the convolution model in the target positioning model to extract a second image feature of the target image;
S502: input the second image feature into the regression model in the target positioning model to generate the foreground coordinates of the target image.
Steps S501 and S502 use the target positioning model determined by the above training steps to carry out actual prediction. The target image is input into the convolution model in the target positioning model to extract the second image feature of the target image; the target positioning model here is the one determined by training according to steps S101 to S105. Because the convolution model in the target positioning model has been optimized with the parameters of the segmentation model, it can extract the second image feature of the target image more accurately. The second image feature is then input into the regression model in the target positioning model to generate the finally required foreground coordinates of the target image. The target positioning model used for actual prediction thus both positions the target image with higher precision and positions it faster.
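At prediction time the segmentation head is simply left out, so a sketch of S501-S502, continuing the names from the training sketches above, reduces to:

```python
@torch.no_grad()
def locate_target(target_image):
    # S501: extract the second image feature with the trained convolution model
    features = backbone(to_model_input(target_image).unsqueeze(0))
    # S502: the regression model alone yields the foreground coordinates
    return reg_head(features)
```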
In the practical application of image positioning, the segmentation model is discarded, and the target positioning model composed of the convolution model and the regression model remedies the slow positioning of the segmentation model, obtaining better image positioning results faster.
The method described in Fig. 5 is carried out with the target positioning model; as shown in Fig. 6, the actually generated target positioning model is composed of a convolution model 601 and a regression model 602. Using the convolution model and the regression model in the target positioning model not only improves the precision of target positioning but also increases the positioning speed.
To sum up, the specific steps of the embodiments of the foregoing methods of this application are as follows:
Step 1: input an image sample into a convolution model to extract a first image feature of the image sample;
Step 2: input the first image feature into a segmentation model to generate first foreground coordinates of the image sample;
Step 3: determine a first loss function from the difference between the first foreground coordinates and the actual coordinates of the target in the image sample;
Step 4: input the first image feature into a regression model to generate second foreground coordinates of the image sample;
Step 5: determine a second loss function from the difference between the second foreground coordinates and the actual coordinates of the target in the image sample;
Step 6: determine a model loss function from the first loss function and the second loss function;
Step 7: judge whether the model loss function meets a preset output requirement;
Step 8: if the model loss function does not meet the preset output requirement, train the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function, and re-execute the step of inputting the image sample into the convolution model to extract the first image feature of the image sample;
Step 9: if the model loss function meets the preset output requirement, generate a target positioning model composed of the convolution model and the regression model;
Step 10: input a target image into the convolution model in the target positioning model to extract a second image feature of the target image;
Step 11: input the second image feature into the regression model in the target positioning model to generate the foreground coordinates of the target image.
Steps 1-9 above implement the training method for the target positioning model, and steps 10-11 implement the target positioning method.
In the embodiments of this application, a model structure composed of a convolution model, a regression model, and a segmentation model is trained, and during training the model loss function constructed from the regression model and the segmentation model is used to update the parameters of the shared convolutional layers, the regression layers, and the segmentation layers. In the practical stage, the segmentation model is discarded and the output of the regression model is taken as the final result. In summary: when the regression model positions directly after the convolution model performs feature extraction, model training is fast but positioning precision is low; when segmentation is used for positioning, very high positioning precision can be obtained, but the model is complex and the training time is long. By having the regression model and the segmentation model share the convolutional layers, better model features are obtained from the segmentation model in the training stage and the regression model is used in the prediction stage, which speeds up prediction while also improving prediction precision.
In addition, as shown in Fig. 7, an embodiment of this application further provides a training device for a target positioning model, comprising a first extraction module 701, configured to input an image sample into a convolution model to extract a first image feature of the image sample;
a first processing module 702, configured to input the first image feature into a segmentation model to generate first foreground coordinates of the image sample;
a second processing module 703, configured to input the first image feature into a regression model to generate second foreground coordinates of the image sample;
a first analysis module 704, configured to compute a model loss function from the first foreground coordinates and the second foreground coordinates;
and a first generation module 705, configured to train the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function, to generate a target positioning model composed of the convolution model and the regression model.
The first analysis module 704 comprises a first analysis unit, a second analysis unit, and a first determination unit;
the first analysis unit is configured to determine a first loss function from the difference between the first foreground coordinates and the actual coordinates of the target in the image sample;
the second analysis unit is configured to determine a second loss function from the difference between the second foreground coordinates and the actual coordinates of the target in the image sample;
and the first determination unit is configured to compute over the first loss function and the second loss function to determine the model loss function.
The first generation module 705 comprises a first judging unit and a first processing unit;
the first judging unit is configured to judge whether the model loss function meets the preset output requirement;
and the first processing unit is configured, when the model loss function does not meet the preset output requirement, to train the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function, and to drive the first extraction module to work again.
The first generation module 705 further comprises a second judging unit and a second generation unit;
the second judging unit is configured to judge whether the model loss function meets the preset output requirement;
and the second generation unit is configured, if the model loss function meets the preset output requirement, to generate the target positioning model composed of the convolution model and the regression model.
An embodiment of this application further includes a target positioning device, comprising a second extraction module and a second analysis module;
the second extraction module is configured to input a target image into the convolution model in the target positioning model to extract a second image feature of the target image;
and the second analysis module is configured to input the second image feature into the regression model in the target positioning model to generate the foreground coordinates of the target image.
Fig. 8 is a schematic diagram of a computing device provided by an embodiment of this application. The computing device includes a processor 81, a memory 82, and a bus 83; the memory 82 stores execution instructions, and when the computing device runs, the processor 81 and the memory 82 communicate via the bus 83, with the processor 81 executing the steps of the training method for a target positioning model and the target positioning method stored in the memory 82.
An embodiment of this application further provides a computer-readable storage medium storing a computer program; when the computer program is run by a processor, it executes the steps of the training method for a target positioning model and the target positioning method of any of the above embodiments.
Specifically, the storage medium can be a general storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the above training method for a target positioning model and target positioning method can be executed, so that, by training the convolution model, the segmentation model, and the regression model, the convolution model in the actual prediction stage obtains better image features while the regression model increases image positioning speed.
The computer program product of the training method for a target positioning model and the target positioning method provided by the embodiments of this application includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the methods in the foregoing method embodiments. For concrete implementation, refer to the method embodiments; details are not repeated here.
It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the embodiments described above are only specific embodiments of this application, intended to illustrate rather than limit its technical solutions, and the protection scope of this application is not limited to them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with this technical field can still, within the technical scope disclosed by this application, modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (10)

1. A training method for a target positioning model, characterized by comprising:
inputting an image sample into a convolution model to extract a first image feature of the image sample;
inputting the first image feature into a segmentation model to generate first foreground coordinates of the image sample;
inputting the first image feature into a regression model to generate second foreground coordinates of the image sample;
computing a model loss function from the first foreground coordinates and the second foreground coordinates;
training the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function, to generate a target positioning model composed of the convolution model and the regression model.
2. The method according to claim 1, characterized in that computing the model loss function from the first foreground coordinates and the second foreground coordinates comprises:
determining a first loss function from the difference between the first foreground coordinates and the actual coordinates of the target in the image sample;
determining a second loss function from the difference between the second foreground coordinates and the actual coordinates of the target in the image sample;
determining the model loss function from the first loss function and the second loss function.
3. The method according to claim 1, characterized in that training the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function comprises:
judging whether the model loss function meets a preset output requirement;
if the model loss function does not meet the preset output requirement, training the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function, and re-executing the step of inputting the image sample into the convolution model to extract the first image feature of the image sample.
4. The method according to claim 3, characterized in that training the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function further comprises:
judging whether the model loss function meets the preset output requirement;
if the model loss function meets the preset output requirement, generating the target positioning model composed of the convolution model and the regression model.
5. A target positioning method, characterized by being based on the method according to any one of claims 1-4 and comprising:
inputting a target image into the convolution model in the target positioning model to extract a second image feature of the target image;
inputting the second image feature into the regression model in the target positioning model to generate foreground coordinates of the target image.
6. A training device for a target positioning model, characterized by comprising:
a first extraction module, configured to input an image sample into a convolution model to extract a first image feature of the image sample;
a first processing module, configured to input the first image feature into a segmentation model to generate first foreground coordinates of the image sample;
a second processing module, configured to input the first image feature into a regression model to generate second foreground coordinates of the image sample;
a first analysis module, configured to compute a model loss function from the first foreground coordinates and the second foreground coordinates;
a first generation module, configured to train the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function, to generate a target positioning model composed of the convolution model and the regression model.
7. The device according to claim 6, characterized in that the first analysis module comprises: a first analysis unit, a second analysis unit, and a first determination unit;
the first analysis unit is configured to determine a first loss function from the difference between the first foreground coordinates and the actual coordinates of the target in the image sample;
the second analysis unit is configured to determine a second loss function from the difference between the second foreground coordinates and the actual coordinates of the target in the image sample;
the first determination unit is configured to determine the model loss function from the first loss function and the second loss function.
8. The device according to claim 6, characterized in that the first generation module comprises: a first judging unit, a first generation unit, and a first processing unit;
the first judging unit is configured to judge whether the model loss function meets a preset output requirement;
the first processing unit is configured, when the model loss function does not meet the preset output requirement, to train the convolution model, the segmentation model, and the regression model simultaneously according to the model loss function, and to drive the first extraction module to work again.
9. The device according to claim 8, characterized in that the first generation module further comprises: a second judging unit and a second generation unit;
the second judging unit is configured to judge whether the model loss function meets the preset output requirement;
the second generation unit is configured, if the model loss function meets the preset output requirement, to generate the target positioning model composed of the convolution model and the regression model.
10. A target positioning device, characterized by comprising: a second extraction module and a second analysis module;
the second extraction module is configured to input a target image into the convolution model in the target positioning model to extract a second image feature of the target image;
the second analysis module is configured to input the second image feature into the regression model in the target positioning model to generate foreground coordinates of the target image.
CN201810992851.7A 2018-08-23 2018-08-23 Training method of target positioning model and target positioning method and device Active CN109165654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810992851.7A CN109165654B (en) 2018-08-23 2018-08-23 Training method of target positioning model and target positioning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810992851.7A CN109165654B (en) 2018-08-23 2018-08-23 Training method of target positioning model and target positioning method and device

Publications (2)

Publication Number Publication Date
CN109165654A true CN109165654A (en) 2019-01-08
CN109165654B CN109165654B (en) 2021-03-30

Family

ID=64893338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810992851.7A Active CN109165654B (en) 2018-08-23 2018-08-23 Training method of target positioning model and target positioning method and device

Country Status (1)

Country Link
CN (1) CN109165654B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550746A (en) * 2015-12-08 2016-05-04 北京旷视科技有限公司 Training method and training device of machine learning model
CN107730514A (en) * 2017-09-29 2018-02-23 北京奇虎科技有限公司 Scene cut network training method, device, computing device and storage medium
CN108133186A (en) * 2017-12-21 2018-06-08 东北林业大学 A kind of plant leaf identification method based on deep learning
CN108416412A (en) * 2018-01-23 2018-08-17 浙江瀚镪自动化设备股份有限公司 A kind of logistics compound key recognition methods based on multitask deep learning
CN108416378A (en) * 2018-02-28 2018-08-17 电子科技大学 A kind of large scene SAR target identification methods based on deep neural network

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
JOSEPH REDMON et al.: "You Only Look Once: Unified, Real-Time Object Detection", 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
KAIMING HE et al.: "Mask R-CNN", 2017 IEEE International Conference on Computer Vision (ICCV) *
SHAOQING REN et al.: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", IEEE Transactions on Pattern Analysis and Machine Intelligence *
SHELLCOLLECTOR: "Deep learning pruning" (深度学习剪枝), https://blog.csdn.net/jacke121/article/details/79450321 *
STEFAN P NICULESCU: "Artificial neural networks and genetic algorithms in QSAR", Journal of Molecular Structure: THEOCHEM *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110675453A (en) * 2019-10-16 2020-01-10 北京天睿空间科技股份有限公司 Self-positioning method for moving target in known scene
CN110675453B (en) * 2019-10-16 2021-04-13 北京天睿空间科技股份有限公司 Self-positioning method for moving target in known scene
CN111080694A (en) * 2019-12-20 2020-04-28 上海眼控科技股份有限公司 Training and positioning method, device, equipment and storage medium of positioning model
CN111179628A (en) * 2020-01-09 2020-05-19 北京三快在线科技有限公司 Positioning method and device for automatic driving vehicle, electronic equipment and storage medium
CN111179628B (en) * 2020-01-09 2021-09-28 北京三快在线科技有限公司 Positioning method and device for automatic driving vehicle, electronic equipment and storage medium
CN113469172A (en) * 2020-03-30 2021-10-01 阿里巴巴集团控股有限公司 Target positioning method, model training method, interface interaction method and equipment
CN113469172B (en) * 2020-03-30 2022-07-01 阿里巴巴集团控股有限公司 Target positioning method, model training method, interface interaction method and equipment

Also Published As

Publication number Publication date
CN109165654B (en) 2021-03-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant