CN110110715A

CN110110715A - Text detection model training method, text region determination method, text content determination method, and apparatus

Info

Publication number: CN110110715A
Application number: CN201910367675.2A
Authority: CN
Inventors: 苏驰; 李凯; 刘弘也; 赵志明
Original assignee: Beijing Kingsoft Cloud Network Technology Co Ltd; Beijing Kingsoft Cloud Technology Co Ltd
Current assignee: Beijing Kingsoft Cloud Network Technology Co Ltd; Beijing Kingsoft Cloud Technology Co Ltd
Priority date: 2019-04-30
Filing date: 2019-04-30
Publication date: 2019-08-09
Also published as: WO2020221298A1

Abstract

The present invention provides a text detection model training method, a text region determination method, a text content determination method, and corresponding apparatuses. The text detection model training method includes: extracting multiple initial feature maps of a target training image through a first feature extraction network; fusing the multiple initial feature maps through a feature fusion network to obtain a fused feature map; inputting the fused feature map into a first output network, which outputs candidate regions of the text regions in the target training image and a probability value for each candidate region; determining a first loss value through a preset detection loss function; and training a first initial model according to the first loss value until the parameters of the first initial model converge, yielding the text detection model. The invention can detect text of all kinds in an image quickly, comprehensively, and accurately in scenes with multiple font sizes, multiple fonts, varied shapes, and multiple orientations, which in turn improves the accuracy and effect of subsequent text recognition.

Description

Text detection model training method, text region determination method, text content determination method, and apparatus

Technical field

The present invention relates to the technical field of image processing, and more particularly to a text detection model training method, a text region determination method, a text content determination method, and corresponding apparatuses.

Background art

In the related art, text detection and recognition can be implemented by character segmentation or by deep learning. However, these approaches are generally suitable only for simple scenes with a single font size, a simple background, and a single text orientation. In complex scenes, for example scenes with multiple font sizes, multiple fonts, varied shapes, multiple orientations, and changing backgrounds, these text detection and recognition approaches perform poorly.

Summary of the invention

In view of this, the purpose of the present invention is to provide a text detection model training method, a text region determination method, a text content determination method, and corresponding apparatuses, so that text of all kinds in an image can be detected quickly, comprehensively, and accurately in scenes with multiple font sizes, multiple fonts, varied shapes, and multiple orientations, which in turn improves the accuracy and effect of subsequent text recognition.

In a first aspect, an embodiment of the present invention provides a text detection model training method. The method includes: determining a target training image based on a preset training set; inputting the target training image into a first initial model, where the first initial model includes a first feature extraction network, a feature fusion network, and a first output network; extracting multiple initial feature maps of the target training image through the first feature extraction network, where the initial feature maps differ from one another in scale; fusing the multiple initial feature maps through the feature fusion network to obtain a fused feature map; inputting the fused feature map into the first output network, which outputs candidate regions of the text regions in the target training image and a probability value for each candidate region; determining, through a preset detection loss function, a first loss value for the candidate regions and the probability value of each candidate region; and training the first initial model according to the first loss value until the parameters of the first initial model converge, yielding the text detection model.

In some embodiments, the first feature extraction network includes multiple groups of first convolutional networks connected in sequence; each group of first convolutional networks includes a convolutional layer, a batch normalization layer, and an activation function layer connected in sequence.

In some embodiments, the step of fusing the multiple initial feature maps through the feature fusion network to obtain the fused feature map includes: arranging the multiple initial feature maps in order of scale, where the initial feature map at the top level has the smallest scale and the initial feature map at the bottom level has the largest scale; taking the initial feature map of the top level as the fused feature map of the top level; for every level other than the top level, fusing the initial feature map of the current level with the fused feature map of the level above it to obtain the fused feature map of the current level; and taking the fused feature map of the bottom level as the final fused feature map.
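As an illustration only, the top-down fusion described above can be sketched as follows. The sketch assumes that each level is exactly twice the size of the level above it, that the upper fused map is enlarged by nearest-neighbour upsampling, and that fusion is an element-wise sum; this passage does not fix any of these choices, and the helper names `upsample2x` and `fuse_top_down` are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fuse_top_down(initial_maps):
    """Fuse maps ordered from the top level (smallest) to the bottom level (largest).

    Assumption: fusion = upsample the fused map of the level above, then add.
    """
    fused = initial_maps[0]  # top level: the fused map is the initial map itself
    for current in initial_maps[1:]:
        up = upsample2x(fused)
        fused = [[a + b for a, b in zip(r_cur, r_up)]
                 for r_cur, r_up in zip(current, up)]
    return fused  # fused map of the bottom level = final fused feature map


maps = [
    [[1.0]],                   # top level, 1x1
    [[1.0, 2.0], [3.0, 4.0]],  # bottom level, 2x2
]
fused = fuse_top_down(maps)
```

In a real network each map would also carry a channel dimension, and the fusion step could equally be a concatenation followed by a convolution.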

In some embodiments, the first output network includes a first convolutional layer and a second convolutional layer. The step of inputting the fused feature map into the first output network and outputting the candidate regions of the text regions in the target training image and the probability value of each candidate region includes: inputting the fused feature map into the first convolutional layer and the second convolutional layer respectively; performing a first convolution operation on the fused feature map through the first convolutional layer to output a coordinate matrix, where the coordinate matrix contains the vertex coordinates of the candidate regions of the text regions in the target training image; and performing a second convolution operation on the fused feature map through the second convolutional layer to output a probability matrix, where the probability matrix contains the probability value of each candidate region.

In some embodiments, the detection loss function includes a first function and a second function. The first function is $L_1 = |G^* - G|$, where $G^*$ is the pre-annotated coordinate matrix of the text regions in the target training image and $G$ is the coordinate matrix of the candidate regions of the text regions in the target training image output by the first output network. The second function is $L_2 = -Y^* \log Y - (1 - Y^*) \log(1 - Y)$, where $Y^*$ is the pre-annotated probability matrix of the text regions in the target training image, $Y$ is the probability matrix of the candidate regions of the text regions in the target training image output by the first output network, and $\log$ denotes the logarithm. The first loss value for the candidate regions and the probability value of each candidate region is $L = L_1 + L_2$.
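A minimal sketch of this detection loss is given below. It assumes the matrices are flattened into plain lists and that each term is summed over all elements; the text does not say whether a sum or a mean is intended. The function names and the small clamping constant `eps`, which only guards against `log(0)`, are hypothetical additions.

```python
import math


def l1_loss(g_true, g_pred):
    """First function: L1 = |G* - G|, summed over all coordinates (assumed)."""
    return sum(abs(a - b) for a, b in zip(g_true, g_pred))


def bce_loss(y_true, y_pred, eps=1e-7):
    """Second function: L2 = -Y* log Y - (1 - Y*) log(1 - Y), summed (assumed)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0); not in the source
        total += -y * math.log(p) - (1.0 - y) * math.log(1.0 - p)
    return total


def detection_loss(g_true, g_pred, y_true, y_pred):
    """First loss value: L = L1 + L2."""
    return l1_loss(g_true, g_pred) + bce_loss(y_true, y_pred)


# One annotated region versus one predicted candidate region.
g_star, g = [0.0, 0.0, 10.0, 5.0], [1.0, 0.0, 9.0, 5.0]
y_star, y = [1.0], [0.5]
loss = detection_loss(g_star, g, y_star, y)
```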

In some embodiments, the step of training the first initial model according to the first loss value until the parameters of the first initial model converge, yielding the text detection model, includes: updating the parameters of the first initial model according to the first loss value; judging whether the updated parameters have converged; if the updated parameters have converged, determining the first initial model with the updated parameters as the text detection model; and if the updated parameters have not converged, returning to the step of determining a target training image based on the preset training set, until the updated parameters converge.

In some embodiments, the step of updating the parameters of the first initial model according to the first loss value includes: determining a parameter to be updated from the first initial model according to a preset rule; calculating the derivative $\frac{\partial L}{\partial W}$ of the first loss value with respect to the parameter to be updated in the first initial model, where $L$ is the first loss value and $W$ is the parameter to be updated; and updating the parameter to be updated to obtain the updated parameter $W \leftarrow W - \alpha \frac{\partial L}{\partial W}$, where $\alpha$ is a preset coefficient.
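The update-and-check loop can be illustrated on a toy problem. The sketch below assumes the usual gradient-descent form of the update (the parameter minus a preset coefficient times the derivative of the loss) and, as a purely hypothetical convergence test, stops when a step changes the parameter by less than `tol`. The names `update` and `train` and the toy loss are illustrative, not taken from the source.

```python
def update(w, grad, alpha):
    """One update step: W <- W - alpha * dL/dW."""
    return w - alpha * grad


def train(w, grad_fn, alpha=0.1, tol=1e-9, max_steps=10_000):
    """Repeat the update until the parameter stops changing (assumed criterion)."""
    for _ in range(max_steps):
        new_w = update(w, grad_fn(w), alpha)
        if abs(new_w - w) < tol:  # treated here as "the parameter has converged"
            return new_w
        w = new_w                 # not converged: continue with the next step
    return w


# Toy loss L(W) = (W - 3)^2, whose derivative is 2 * (W - 3).
w_final = train(0.0, lambda w: 2.0 * (w - 3.0))
```

In the method itself each iteration would draw a new target training image and obtain the derivative by backpropagation through the first initial model.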

In a second aspect, an embodiment of the present invention provides a text region determination method. The method includes: obtaining an image to be detected; inputting the image to be detected into a pre-trained text detection model, which outputs multiple candidate regions of the text regions in the image to be detected and a probability value for each candidate region, where the text detection model is trained by the text detection model training method described above; and determining the text regions in the image to be detected from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap between the candidate regions.

In some embodiments, the step of determining the text regions in the image to be detected from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap between the candidate regions includes: arranging the multiple candidate regions in order of probability value, where the first candidate region has the largest probability value and the last candidate region has the smallest; taking the first candidate region as the current candidate region, and calculating one by one the degree of overlap between the current candidate region and each of the other candidate regions; rejecting those candidate regions, other than the current candidate region, whose degree of overlap exceeds a preset overlap threshold; taking the candidate region following the current candidate region as the new current candidate region and repeating the step of calculating one by one the degree of overlap between the current candidate region and each of the other candidate regions, until the last candidate region is reached; and determining the candidate regions remaining after rejection as the text regions in the image to be detected.
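The selection procedure above is a form of non-maximum suppression and can be sketched as follows. For brevity the sketch assumes axis-aligned boxes `(x1, y1, x2, y2)` and uses intersection over union as the degree of overlap; the candidate regions in the method are given by vertex coordinates and the overlap measure is not specified here, so both are assumptions, as are the function names and the default threshold.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_text_regions(boxes, probs, overlap_threshold=0.5):
    """Return the indices of the candidate regions kept as text regions."""
    # Arrange candidates from highest to lowest probability value.
    order = sorted(range(len(boxes)), key=lambda i: probs[i], reverse=True)
    kept = []
    while order:
        current = order.pop(0)  # current candidate region
        kept.append(current)
        # Reject remaining candidates that overlap the current one too much.
        order = [i for i in order
                 if iou(boxes[current], boxes[i]) <= overlap_threshold]
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
probs = [0.9, 0.8, 0.7]
kept = select_text_regions(boxes, probs)
```

Here the second box overlaps the first with an IoU of 81/119, about 0.68, so it is rejected, while the third box does not overlap and is kept.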

In some embodiments, before the step of arranging the multiple candidate regions in order of probability value, the method further includes: rejecting, from the multiple candidate regions, those candidate regions whose probability value is lower than a preset probability threshold, to obtain the final multiple candidate regions.
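This pre-filtering step amounts to a simple threshold test, sketched below. The function name, the placeholder region labels, and the threshold value of 0.5 are hypothetical.

```python
def filter_by_probability(regions, probs, prob_threshold=0.5):
    """Drop candidate regions whose probability value is below the threshold."""
    kept = [(r, p) for r, p in zip(regions, probs) if p >= prob_threshold]
    return [r for r, _ in kept], [p for _, p in kept]


regions = ["A", "B", "C"]  # stand-ins for candidate regions
probs = [0.9, 0.2, 0.6]
kept_regions, kept_probs = filter_by_probability(regions, probs, 0.5)
```

Filtering first reduces the number of pairwise overlap computations needed in the subsequent selection step.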

In a third aspect, an embodiment of the present invention provides a text content determination method. The method includes: obtaining the text regions in an image through the text region determination method described above; inputting a text region into a pre-trained text recognition model, which outputs a recognition result for the text region; and determining the text content in the text region according to the recognition result.

In some embodiments, before the step of inputting the text region into the pre-trained text recognition model, the method further includes: normalizing the text region according to a preset size.
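One simple way to normalize a cropped text region to a preset size is nearest-neighbour resampling, sketched below on a 2-D grid of pixel values. The interpolation method, the target size, and the function name `resize_nearest` are assumptions; this passage states only that the region is normalized to a preset size.

```python
def resize_nearest(img, out_h, out_w):
    """Resize a 2-D image (list of rows) to out_h x out_w by nearest neighbour."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


region = [[1, 2],
          [3, 4]]
normalized = resize_nearest(region, 4, 4)
```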

In some embodiments, the text recognition model is trained as follows: determining a target training text image based on a preset training set; inputting the target training text image into a second initial model, where the second initial model includes a second feature extraction network, a feature splitting network, a second output network, and a classification function; extracting a feature map of the target training text image through the second feature extraction network; splitting the feature map into at least one sub-feature map through the feature splitting network; inputting the sub-feature maps into the second output network respectively, which outputs an output matrix corresponding to each sub-feature map; inputting the output matrix corresponding to each sub-feature map into the classification function, which outputs a probability matrix corresponding to each sub-feature map; determining a second loss value of the probability matrices through a preset recognition loss function; and training the second initial model according to the second loss value until the parameters of the second initial model converge, yielding the text recognition model.

In some embodiments, the second feature extraction network includes multiple groups of second convolutional networks connected in sequence; each group of second convolutional networks includes a convolutional layer, a pooling layer, and an activation function layer connected in sequence.

In some embodiments, the step of splitting the feature map into at least one sub-feature map through the feature splitting network includes: splitting the feature map into at least one sub-feature map along the column direction of the feature map, where the column direction of the feature map is perpendicular to the direction of the text line.
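The column-wise split can be pictured with the short sketch below, which cuts a 2-D feature map into vertical slices. The slice width of one column and the function name `split_columns` are assumptions; the text requires only that the split follow the column direction.

```python
def split_columns(fmap, width=1):
    """Split a 2-D feature map (list of rows) into vertical slices.

    Each slice spans every row and `width` columns, so consecutive slices
    follow the reading direction of a horizontal text line.
    """
    n_cols = len(fmap[0])
    return [[row[c:c + width] for row in fmap] for c in range(0, n_cols, width)]


feature_map = [[1, 2, 3],
               [4, 5, 6]]
sub_maps = split_columns(feature_map)
```

Each resulting slice would then be passed to its own fully connected layer in the second output network.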

In some embodiments, the second output network includes multiple fully connected layers, and the number of fully connected layers corresponds to the number of sub-feature maps. The step of inputting the sub-feature maps into the second output network respectively and outputting the output matrix corresponding to each sub-feature map includes: inputting each sub-feature map into its corresponding fully connected layer, so that each fully connected layer outputs the output matrix corresponding to its sub-feature map.

In some embodiments, the classification function includes a Softmax function. The Softmax function is $p_i^t = \frac{e^{o_i^t}}{\sum_{m=1}^{K+1} e^{o_m^t}}$, where $e$ denotes the natural constant; $t$ denotes the $t$-th probability matrix; $K$ denotes the number of distinct characters contained in the target training text images of the training set; $m$ ranges from 1 to $K+1$; $\sum$ denotes summation; $o_i^t$ is the $i$-th element of the output matrix; and $p_i^t$ is the $i$-th element of the probability matrix $p^t$.
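For reference, a Softmax over one output matrix can be sketched as below, assuming the standard definition in which each element is exponentiated and divided by the sum of the exponentials over all K+1 positions. Subtracting the maximum before exponentiating is a common numerical-stability measure added here, not something stated in the text, and the function name is hypothetical.

```python
import math


def softmax(scores):
    """Turn one output matrix (a list of K+1 scores) into a probability matrix."""
    m = max(scores)                          # stability shift; result is unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


p_t = softmax([2.0, 1.0, 0.1])
```

The resulting values are positive and sum to 1, so each can be read as the probability of the character (or blank) at that position.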

In some embodiments, the recognition loss function includes $L = -\log p(y \mid \{p^t\}_{t=1 \ldots T})$, where $y$ is the pre-annotated probability matrix of the target training text image; $t$ denotes the $t$-th probability matrix; $p^t$ is the probability matrix corresponding to each sub-feature map output by the classification function; $T$ is the total number of probability matrices; $p$ denotes computing a probability; and $\log$ denotes the logarithm.

In some embodiments, the step of training the second initial model according to the second loss value until the parameters of the second initial model converge, yielding the text recognition model, includes: updating the parameters of the second initial model according to the second loss value; judging whether the updated parameters have converged; if the updated parameters have converged, determining the second initial model with the updated parameters as the text recognition model; and if the updated parameters have not converged, returning to the step of determining a target training text image based on the preset training set, until the updated parameters converge.

In some embodiments, the step of updating the parameters of the second initial model according to the second loss value includes: determining a parameter to be updated from the second initial model according to a preset rule; calculating the derivative $\frac{\partial L'}{\partial W'}$ of the second loss value with respect to the parameter to be updated, where $L'$ is the loss value of the probability matrices and $W'$ is the parameter to be updated; and updating the parameter to be updated to obtain the updated parameter $W' \leftarrow W' - \alpha' \frac{\partial L'}{\partial W'}$, where $\alpha'$ is a preset coefficient.

In some embodiments, the recognition result of the text region includes multiple probability matrices corresponding to the text region. The step of determining the text content in the text region according to the recognition result includes: determining the position of the maximum probability value in each probability matrix; obtaining, from a preset correspondence between positions in the probability matrix and characters, the character corresponding to the position of the maximum probability value; arranging the obtained characters according to the order of the multiple probability matrices; and determining the text content in the text region according to the arranged characters.

In some embodiments, the step of determining the text content in the text region according to the arranged characters includes: deleting repeated characters and blank characters from the arranged characters according to a preset rule, to obtain the text content in the text region.
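The decoding steps above can be sketched as a greedy, CTC-style decoder. The sketch assumes that the preset rule is to collapse consecutive repeated characters first and then drop blanks, so that a blank between two identical characters preserves a genuine double letter; the text does not spell out the rule. The blank symbol `"-"`, the character table, and the function name are hypothetical.

```python
def decode(prob_matrices, charset, blank="-"):
    """Greedy decoding: argmax per probability matrix, then collapse and clean.

    charset[i] is the character assigned to position i of a probability matrix.
    """
    chars = []
    for p in prob_matrices:
        best = max(range(len(p)), key=lambda i: p[i])  # position of max probability
        chars.append(charset[best])
    out, prev = [], None
    for c in chars:
        if c != prev and c != blank:  # skip repeats of the previous symbol and blanks
            out.append(c)
        prev = c
    return "".join(out)


charset = ["-", "a", "b"]
A, B, BLANK = [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.9, 0.05, 0.05]
text = decode([A, A, BLANK, A, B, B], charset)
```

The per-position characters are `a a - a b b`; collapsing repeats and removing the blank leaves `aab`.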

In some embodiments, after the step of determining the text content in the text region according to the recognition result, the method further includes: if the image contains multiple text regions, obtaining the text content in each text region; and determining, through a pre-established sensitive lexicon, whether the text content corresponding to the image contains sensitive information.

In some embodiments, the step of determining, through the pre-established sensitive lexicon, whether the text content corresponding to the image contains sensitive information includes: performing word segmentation on the obtained text content; matching the tokens obtained from word segmentation one by one against the pre-established sensitive lexicon; and if at least one token is matched successfully, determining that the text content corresponding to the image contains sensitive information.
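The matching step can be sketched as a set lookup over tokens. Whitespace splitting stands in here for a real word segmenter, which would be needed for Chinese text; the function name, the sample lexicon, and the sample sentence are all hypothetical.

```python
def find_sensitive_tokens(text, sensitive_lexicon, tokenize=str.split):
    """Segment the text and return the tokens found in the sensitive lexicon."""
    tokens = tokenize(text)  # assumption: whitespace split as a stand-in segmenter
    return [tok for tok in tokens if tok in sensitive_lexicon]


lexicon = {"gambling", "lottery"}
hits = find_sensitive_tokens("win big at online gambling tonight", lexicon)
contains_sensitive = len(hits) > 0  # at least one matched token
```

Keeping the matched tokens, rather than only a yes/no flag, makes it possible to trace each match back to the text region it came from.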

In some embodiments, after determining that the text content corresponding to the image contains sensitive information, the method further includes: obtaining the text region to which the successfully matched token belongs, and marking in the image the obtained text region or the successfully matched token.

In a fourth aspect, an embodiment of the present invention provides a text detection model training apparatus. The apparatus includes: a training image determining module, configured to determine a target training image based on a preset training set; a training image input module, configured to input the target training image into a first initial model, where the first initial model includes a first feature extraction network, a feature fusion network, and a first output network; a feature extraction module, configured to extract multiple initial feature maps of the target training image through the first feature extraction network, where the initial feature maps differ from one another in scale; a feature fusion module, configured to fuse the multiple initial feature maps through the feature fusion network to obtain a fused feature map; an output module, configured to input the fused feature map into the first output network, which outputs candidate regions of the text regions in the target training image and a probability value for each candidate region; and a loss value determining and training module, configured to determine, through a preset detection loss function, a first loss value for the candidate regions and the probability value of each candidate region, and to train the first initial model according to the first loss value until the parameters of the first initial model converge, yielding the text detection model.

In some embodiments, the feature fusion module is further configured to: arrange the multiple initial feature maps in order of scale, where the initial feature map at the top level has the smallest scale and the initial feature map at the bottom level has the largest scale; take the initial feature map of the top level as the fused feature map of the top level; for every level other than the top level, fuse the initial feature map of the current level with the fused feature map of the level above it to obtain the fused feature map of the current level; and take the fused feature map of the bottom level as the final fused feature map.

In some embodiments, the first output network includes a first convolutional layer and a second convolutional layer. The output module is further configured to: input the fused feature map into the first convolutional layer and the second convolutional layer respectively; perform a first convolution operation on the fused feature map through the first convolutional layer to output a coordinate matrix, where the coordinate matrix contains the vertex coordinates of the candidate regions of the text regions in the target training image; and perform a second convolution operation on the fused feature map through the second convolutional layer to output a probability matrix, where the probability matrix contains the probability value of each candidate region.

In some embodiments, the detection loss function includes a first function and a second function. The first function is $L_1 = |G^* - G|$, where $G^*$ is the pre-annotated coordinate matrix of the text regions in the target training image and $G$ is the coordinate matrix of the candidate regions of the text regions in the target training image output by the first output network. The second function is $L_2 = -Y^* \log Y - (1 - Y^*) \log(1 - Y)$, where $Y^*$ is the pre-annotated probability matrix of the text regions in the target training image, $Y$ is the probability matrix of the candidate regions of the text regions in the target training image output by the first output network, and $\log$ denotes the logarithm. The first loss value for the candidate regions and the probability value of each candidate region is $L = L_1 + L_2$.

In some embodiments, the loss value determining and training module is further configured to: update the parameters of the first initial model according to the first loss value; judge whether the updated parameters have converged; if the updated parameters have converged, determine the first initial model with the updated parameters as the text detection model; and if the updated parameters have not converged, return to the step of determining a target training image based on the preset training set, until the updated parameters converge.

In some embodiments, the loss value determining and training module is further configured to: determine a parameter to be updated from the first initial model according to a preset rule; calculate the derivative $\frac{\partial L}{\partial W}$ of the first loss value with respect to the parameter to be updated in the first initial model, where $L$ is the first loss value and $W$ is the parameter to be updated; and update the parameter to be updated to obtain the updated parameter $W \leftarrow W - \alpha \frac{\partial L}{\partial W}$, where $\alpha$ is a preset coefficient.

In a fifth aspect, an embodiment of the present invention provides a text region determination apparatus. The apparatus includes: an image obtaining module, configured to obtain an image to be detected; a detection module, configured to input the image to be detected into a pre-trained text detection model, which outputs multiple candidate regions of the text regions in the image to be detected and a probability value for each candidate region, where the text detection model is trained by the text detection model training method described above; and a text region determining module, configured to determine the text regions in the image to be detected from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap between the candidate regions.

In some embodiments, the text region determining module is further configured to: arrange the multiple candidate regions in order of probability value, where the first candidate region has the largest probability value and the last candidate region has the smallest; take the first candidate region as the current candidate region, and calculate one by one the degree of overlap between the current candidate region and each of the other candidate regions; reject those candidate regions, other than the current candidate region, whose degree of overlap exceeds a preset overlap threshold; take the candidate region following the current candidate region as the new current candidate region and repeat the step of calculating one by one the degree of overlap between the current candidate region and each of the other candidate regions, until the last candidate region is reached; and determine the candidate regions remaining after rejection as the text regions in the image to be detected.

In some embodiments, the apparatus further includes: a region rejecting module, configured to reject, from the multiple candidate regions, those candidate regions whose probability value is lower than a preset probability threshold, to obtain the final multiple candidate regions.

In a sixth aspect, an embodiment of the present invention provides a text content determination apparatus. The apparatus includes: a region obtaining module, configured to obtain the text regions in an image through the text region determination method described above; a recognition module, configured to input a text region into a pre-trained text recognition model, which outputs a recognition result for the text region; and a text content determining module, configured to determine the text content in the text region according to the recognition result.

In some embodiments, the apparatus further includes: a normalization module, configured to normalize the text region according to a preset size.

In some embodiments, the apparatus further includes a text recognition model training module, configured to train the text recognition model as follows: determining a target training text image based on a preset training set; inputting the target training text image into a second initial model, where the second initial model includes a second feature extraction network, a second output network, and a classification function; extracting a feature map of the target training text image through the second feature extraction network; splitting the feature map into at least one sub-feature map through the second initial model; inputting the sub-feature maps into the second output network respectively, which outputs an output matrix corresponding to each sub-feature map; inputting the output matrix corresponding to each sub-feature map into the classification function, which outputs a probability matrix corresponding to each sub-feature map; determining a second loss value of the probability matrices through a preset recognition loss function; and training the second initial model according to the second loss value until the parameters of the second initial model converge, yielding the text recognition model.

In some embodiments, the recognition model training module is further configured to: split the feature map into at least one sub-feature map along the column direction of the feature map, where the column direction of the feature map is perpendicular to the direction of the text line.

In some embodiments, the second output network includes multiple fully connected layers, and the number of fully connected layers corresponds to the number of sub-feature maps. The recognition model training module is further configured to: input each sub-feature map into its corresponding fully connected layer, so that each fully connected layer outputs the output matrix corresponding to its sub-feature map.

In some embodiments, the classification function includes a Softmax function. The Softmax function is $p_i^t = \frac{e^{o_i^t}}{\sum_{m=1}^{K+1} e^{o_m^t}}$, where $e$ denotes the natural constant; $t$ denotes the $t$-th probability matrix; $K$ denotes the number of distinct characters contained in the target training text images of the training set; $m$ ranges from 1 to $K+1$; $\sum$ denotes summation; $o_i^t$ is the $i$-th element of the output matrix; and $p_i^t$ is the $i$-th element of the probability matrix $p^t$.

In some embodiments, the recognition model training module is further configured to: update the parameters of the second initial model according to the second loss value; judge whether the updated parameters have converged; if the updated parameters have converged, determine the second initial model with the updated parameters as the text recognition model; and if the updated parameters have not converged, return to the step of determining a target training text image based on the preset training set, until the updated parameters converge.

In some embodiments, the recognition model training module is further configured to: determine a parameter to be updated from the second initial model according to a preset rule; calculate the derivative $\frac{\partial L'}{\partial W'}$ of the second loss value with respect to the parameter to be updated, where $L'$ is the loss value of the probability matrices and $W'$ is the parameter to be updated; and update the parameter to be updated to obtain the updated parameter $W' \leftarrow W' - \alpha' \frac{\partial L'}{\partial W'}$, where $\alpha'$ is a preset coefficient.

In some embodiments, the recognition result of the text region includes multiple probability matrices corresponding to the text region. The text content determining module is further configured to: determine the position of the maximum probability value in each probability matrix; obtain, from a preset correspondence between positions in the probability matrix and characters, the character corresponding to the position of the maximum probability value; arrange the obtained characters according to the order of the multiple probability matrices; and determine the text content in the text region according to the arranged characters.

In some embodiments, the text content determining module is further configured to: delete repeated characters and blank characters from the arranged characters according to a preset rule, to obtain the text content in the text region.

In some embodiments, the apparatus further includes: an information obtaining module, configured to obtain the text content in each text region if the image contains multiple text regions; and a sensitive information determining module, configured to determine, through a pre-established sensitive lexicon, whether the text content corresponding to the image contains sensitive information.

In some embodiments, the sensitive information determining module is further configured to: perform word segmentation on the obtained text content; match the tokens obtained from word segmentation one by one against the pre-established sensitive lexicon; and if at least one token is matched successfully, determine that the text content corresponding to the image contains sensitive information.

In some embodiments, the apparatus further includes: a region marking module, configured to obtain the text region to which the successfully matched token belongs and to mark the obtained text region in the image.

In a seventh aspect, an embodiment of the present invention provides an electronic device including a processor and a memory. The memory stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the steps of the text detection model training method, the text region determination method, or the text content determination method described above.

In an eighth aspect, an embodiment of the present invention provides a machine-readable storage medium storing machine-executable instructions. When called and executed by a processor, the machine-executable instructions cause the processor to implement the steps of the text detection model training method, the text region determination method, or the text content determination method described above.

The embodiments of the present invention bring the following beneficial effects:

In the text detection model training method provided by the embodiments of the present invention, multiple initial feature maps of mutually different scales are first extracted from a target training image; fusion processing is then performed on the multiple initial feature maps to obtain a fused feature map; the fused feature map is input to a first output network, which outputs candidate regions of the text regions in the target training image and a probability value for each candidate region; after a first loss value is determined by a preset detection loss function, the first initial model is trained according to the first loss value to obtain the detection model. In this way, the feature extraction network automatically extracts features of different scales, so the text detection model needs only a single input image to obtain candidate regions for text regions of various scales in that image, with no need to rescale the image manually, which makes operation convenient. In particular, in scenes with multiple font sizes, fonts, shapes, and orientations, the various kinds of text in an image can be detected quickly, comprehensively, and accurately, which in turn benefits the accuracy of subsequent text recognition and improves the text recognition result.

Other features and advantages of the present invention will be set forth in the following description; alternatively, some features and advantages may be inferred or unambiguously determined from the specification, or learned by implementing the above techniques of the present invention.

To make the above objects, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief description of the drawings

To describe the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a text detection model training method provided by an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a first feature extraction network provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of performing fusion processing on multiple initial feature maps provided by an embodiment of the present invention;

Fig. 4 is a flowchart of a text region determining method provided by an embodiment of the present invention;

Fig. 5 is a flowchart of another text region determining method provided by an embodiment of the present invention;

Fig. 6 is a flowchart of a text content determining method provided by an embodiment of the present invention;

Fig. 7 is a flowchart of a training method for a text recognition model provided by an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of a second feature extraction network provided by an embodiment of the present invention;

Fig. 9 is a flowchart of another text content determining method provided by an embodiment of the present invention;

Fig. 10 is a flowchart of another text content determining method provided by an embodiment of the present invention;

Fig. 11 is a schematic structural diagram of a text detection model training apparatus provided by an embodiment of the present invention;

Fig. 12 is a schematic structural diagram of a text region determining apparatus provided by an embodiment of the present invention;

Fig. 13 is a schematic structural diagram of a text content determining apparatus provided by an embodiment of the present invention;

Fig. 14 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

In traditional text recognition technology, text regions that may contain text are detected from a picture by manually set rules; character segmentation is then performed on the detected text regions to obtain an image block for each character, and each image block is recognized by a pre-trained classifier to obtain the final text recognition result. In this approach, because the number of manually set rules is limited, the detected text regions are mostly regularly shaped, so the scope of application is limited and the approach is difficult to apply to text detection and recognition in complex scenes, such as scenes with multiple font sizes, fonts, shapes, and orientations and with changing backgrounds. Moreover, this approach recognizes single characters and does not consider the relationships between characters, so detection and recognition perform poorly in complex scenes.

In addition, text recognition can also be implemented by deep learning. A recognition model first needs to be trained using a recurrent neural network; the picture to be detected is then transformed to multiple scales, and the transformed pictures are input to the recognition model one by one to detect text regions and recognize the text. In this approach, the image must be rescaled manually and the images of multiple scales input to the recognition model separately so that the model can recognize text of different sizes; the operation is cumbersome and can hardly meet the demands of real-time recognition. Furthermore, because a recurrent neural network must perform recursive operations that follow the time sequence, parallel processing is difficult and computation is slow. This recognition model also usually detects text regions with rectangular detection boxes, so it can only detect and recognize horizontal text and performs poorly on text at arbitrary angles, which makes it difficult to apply to text detection and recognition in complex scenes.

In summary, text detection and recognition methods in the related art perform poorly in complex scenes. On this basis, the embodiments of the present invention provide a text detection model training method, a text region determining method, a text content determining method, and corresponding apparatuses. The technology can be widely applied to text detection and text recognition in various scenes, and in particular in complex scenes such as webcasts, cable television live broadcasts, games, and videos.

To facilitate understanding of this embodiment, a text detection model training method disclosed in an embodiment of the present invention is first described in detail. The text detection model can be used for text detection, which can be understood as locating, in an image, the image regions that contain text. As shown in Fig. 1, the method includes the following steps:

Step S102, a target training image is determined based on a preset training set.

The training set may contain multiple images. To improve the broad applicability of the detection model, the images in the training set may include images from various scenes, for example, live broadcast scene images, game scene images, outdoor scene images, and indoor scene images; the images in the training set may also contain text lines of multiple font sizes, shapes, fonts, and languages, so that the trained detection model can detect all kinds of text lines.

Each image includes manually annotated text regions of the text lines. A text region may be marked with a quadrilateral box such as a rectangle, or annotated with another polygonal box; an annotated text region usually covers the entire text line completely and fits the text line closely. In addition, the multiple images in the training set may be divided into a training subset and a test subset according to a preset ratio. During training, the target training image may be obtained from the training subset. After training is completed, a target test image may be obtained from the test subset to test the performance of the detection model.
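The division of the training set by a preset ratio can be sketched as follows. This is a minimal Python illustration; the 0.8 ratio, the fixed random seed, and the function name are assumptions and are not values given in this embodiment.

```python
import random


def split_training_set(images, train_ratio=0.8, seed=0):
    """Shuffle a copy of `images` and split it into (training, test) subsets."""
    shuffled = list(images)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Target training images would then be drawn from the first subset and target test images from the second.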

Step S104, the target training image is input to a first initial model; the first initial model includes a first feature extraction network, a feature fusion network, and a first output network.

Before being input to the first initial model, the target training image may be resized to a preset size, such as 512*512.

Step S106, multiple initial feature maps of the target training image are extracted by the first feature extraction network; the scales of the multiple initial feature maps differ from one another.

The first feature extraction network may be implemented by multiple convolutional layers. In general, the convolutional layers are connected in sequence, and each convolutional layer is configured with a different convolution kernel so as to extract feature maps of different scales. Among the multiple initial feature maps of the target training image, each initial feature map may be obtained through convolution by the corresponding convolutional layer. Taking four convolutional layers as an example, each convolutional layer may output one initial feature map; convolution kernels of different sizes may be set for the layers so that the initial feature maps they output differ in scale. In actual implementation, the convolutional layer that receives the target training image may be set to output the initial feature map of the largest scale, with the scale of the initial feature map output by each subsequent convolutional layer decreasing gradually.
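How the layer settings determine the scale of each initial feature map can be illustrated with the standard convolution output-size formula. The following Python sketch is only an illustration: the kernel size, stride, and padding values in the usage note are hypothetical and are not specified by this embodiment.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after one convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pyramid_sizes(input_size, layers):
    """Feature-map size after each sequentially connected convolutional layer.

    layers: list of (kernel, stride, padding) tuples, one per layer.
    """
    sizes = []
    size = input_size
    for kernel, stride, padding in layers:
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes
```

For a 512*512 input and four assumed layers of kernel 3, stride 2, and padding 1, `pyramid_sizes(512, [(3, 2, 1)] * 4)` gives sizes that halve at each layer, so the first layer outputs the largest map and each later layer a smaller one.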

Step S108, fusion processing is performed on the multiple initial feature maps by the feature fusion network to obtain a fused feature map.

In general, a smaller convolution kernel can perceive high-frequency features in the image, so the initial feature map output by a convolutional layer with a smaller kernel carries small-scale text line features; a larger convolution kernel can perceive low-frequency features in the image, so the initial feature map output by a convolutional layer with a larger kernel carries large-scale text line features. On this basis, the initial feature maps of different scales together carry text line features of various scales, and the fused feature map obtained by fusing them also carries text line features of various scales. In this way, the detection model can detect text lines of various scales without the image being rescaled manually before detection.

In actual implementation, because the multiple initial feature maps differ in scale, an interpolation operation may be performed on an initial feature map of smaller scale before fusion, expanding it to match the initial feature map of larger scale. During fusion, the feature points at the same position in different initial feature maps may be multiplied or added to obtain the final fused feature map.

Step S110, the fused feature map is input to the first output network, which outputs candidate regions of the text regions in the target training image and a probability value for each candidate region.

The first output network is used to extract the required features from the fused feature map and obtain the output result. If the detection model outputs a single kind of result, the first output network generally includes one group of networks; if the detection model outputs multiple kinds of results, the first output network generally includes multiple groups of networks arranged in parallel, with each group outputting one kind of result. The first output network may be composed of convolutional layers or fully connected layers. In the above step, the first output network needs to output two kinds of results, the candidate regions and the probability values of the candidate regions, so it may include two groups of networks, each of which may be a convolutional network or a fully connected network.

Step S112, a first loss value of the candidate regions and of the probability value of each candidate region is determined by a preset detection loss function; the first initial model is trained according to the first loss value until the parameters in the first initial model converge, to obtain the text detection model.

Standard text regions are annotated in the target training image in advance. Based on the positions of the annotated text regions, a coordinate matrix of the text regions and a probability matrix of the text regions can be generated; the coordinate matrix contains the vertex coordinates of the standard text regions, and the probability matrix contains the probability values of the text regions, which are usually 1.

The detection loss function can compare the difference between the coordinate matrix of the candidate regions and the coordinate matrix of the standard text regions, as well as the difference between the probability values of the candidate regions and the probability values of the standard text regions; usually, the larger the difference, the larger the first loss value. The parameters of each part of the first initial model can be adjusted based on the first loss value to achieve the purpose of training. When the parameters in the model converge, training ends and the detection model is obtained.

An embodiment of the present invention also provides another text detection model training method, which is implemented on the basis of the method described in the above embodiment; this method focuses on the specific implementation of each step of the above training method. The method includes the following steps:

Step 202, a target training image is determined based on a preset training set.

Step 204, the target training image is input to a first initial model; the first initial model includes a first feature extraction network, a feature fusion network, and a first output network.

Step 206, multiple initial feature maps of the target training image are extracted by the first feature extraction network; the scales of the multiple initial feature maps differ from one another.

In actual implementation, to improve the performance of the first feature extraction network, the first feature extraction network may include multiple groups of first convolutional networks connected in sequence; each group of first convolutional networks includes a convolutional layer, a batch normalization layer, and an activation function layer connected in sequence. Fig. 2 shows a schematic structural diagram of a first feature extraction network, illustrated with four groups of first convolutional networks as an example; the convolutional layer of each subsequent group is connected to the activation function layer of the preceding group. The first feature extraction network may also include more or fewer groups of first convolutional networks.

The batch normalization layer in a first convolutional network normalizes the feature map output by the convolutional layer. This accelerates the convergence of the first feature extraction network and the detection model, and alleviates the vanishing gradient problem in multi-layer convolutional networks, making the first feature extraction network more stable. The activation function layer in a first convolutional network applies a functional transformation to the normalized feature map; this transformation breaks the linear combination of the convolutional layer's input and improves the feature expression capability of the first convolutional network. The activation function may be, for example, a Sigmoid function, a tanh function, or a ReLU function.
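The effect of the normalization and activation stages on a set of feature values can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the values are a flat list rather than a multi-channel feature map, the learnable scale and shift parameters that a batch normalization layer normally has are omitted, and ReLU is chosen as the activation function.

```python
import math


def batch_normalize(values, eps=1e-5):
    """Normalize values to approximately zero mean and unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # eps guards against division by zero when all values are equal.
    return [(v - mean) / math.sqrt(variance + eps) for v in values]


def relu(values):
    """ReLU activation: negative values become zero, positive values pass."""
    return [max(0.0, v) for v in values]
```

Applying `relu(batch_normalize(values))` mirrors the order of the batch normalization layer and the activation function layer within one group of first convolutional networks.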

Step 208, fusion processing is performed on the multiple initial feature maps by the above feature fusion network to obtain a fused feature map.

The following steps 02-08 provide a specific implementation of step 208. This implementation is illustrated with pyramid features as an example, that is, the scales of the initial feature maps output by the convolutional layers decrease successively:

Step 02, the multiple initial feature maps are arranged in order according to their scales; the initial feature map of the top level has the smallest scale, and the initial feature map of the bottom level has the largest scale;

Step 04, the initial feature map of the top level is determined as the fused feature map of the top level;

Step 06, for each level other than the top level, the initial feature map of the current level is fused with the fused feature map of the level above the current level, to obtain the fused feature map of the current level;

Because the scale of the fused feature map of the level above the current level is smaller than that of the initial feature map of the current level, before the two are fused, the fused feature map of the level above may be expanded by an interpolation operation to the same scale as the initial feature map of the current level; the two are then fused by point-by-point addition or point-by-point multiplication to obtain the fused feature map of the current level.

Step 08, the fused feature map of the lowest level is determined as the final fused feature map.

Fig. 3 shows a schematic diagram of performing fusion processing on multiple initial feature maps. The target training image is convolved by the first feature extraction network to obtain initial feature maps at four levels. The initial feature map of the top level serves as the fused feature map of the top level; the fused feature map of the top level is fused with the initial feature map of the second level to obtain the fused feature map of the second level; the fused feature map of the second level is fused with the initial feature map of the third level to obtain the fused feature map of the third level; the fused feature map of the third level is fused with the initial feature map of the fourth level to obtain the fused feature map of the fourth level, which is the final fused feature map.
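Steps 02-08 can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not specified here: each feature map is a single-channel square grid of nested lists, adjacent levels differ by an integer scale factor, nearest-neighbor repetition serves as the interpolation operation, and point-by-point addition is chosen for fusion.

```python
def upsample_nearest(feature_map, factor):
    """Expand a 2-D map by repeating each value `factor` times on both axes."""
    rows = []
    for row in feature_map:
        wide = [value for value in row for _ in range(factor)]
        rows.extend(list(wide) for _ in range(factor))
    return rows


def fuse_top_down(initial_maps):
    """Fuse maps ordered from the top level (smallest) to the bottom (largest)."""
    fused = initial_maps[0]  # step 04: the top level is used as-is
    for current in initial_maps[1:]:  # step 06: walk down the levels
        factor = len(current) // len(fused)
        expanded = upsample_nearest(fused, factor)
        # Point-by-point addition of the current map and the expanded map.
        fused = [[a + b for a, b in zip(row_c, row_e)]
                 for row_c, row_e in zip(current, expanded)]
    return fused  # step 08: the bottom-level result is the final fused map
```

For example, fusing a 1*1 top-level map `[[10]]` with a 2*2 map `[[1, 2], [3, 4]]` expands the former to 2*2 and adds it point by point, giving `[[11, 12], [13, 14]]`.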

Step 210, the fused feature map is input to the first output network, which outputs candidate regions of the text regions in the target training image and a probability value for each candidate region.

Taking a convolutional network as an example, the first output network includes a first convolutional layer and a second convolutional layer arranged in parallel, which are used to output the vertex coordinates of the candidate regions and the probability values of the candidate regions, respectively. Step 210 may be implemented by the following steps 12-16:

Step 12, the fused feature map is input to the first convolutional layer and to the second convolutional layer;

Step 14, a first convolution operation is performed on the fused feature map by the first convolutional layer to output a coordinate matrix; the coordinate matrix contains the vertex coordinates of the candidate regions of the text regions in the target training image;

For example, the coordinate matrix may be expressed as n*H*W, where H and W are the height and width of the coordinate matrix, respectively, and n is the dimension of the coordinate matrix. For example, when a candidate region is a quadrilateral, it is determined by four vertex coordinates, so n is 8; when a candidate region is another polygon, n is usually twice the number of sides of the candidate region.

Step 16, a second convolution operation is performed on the fused feature map by the second convolutional layer to output a probability matrix; the probability matrix contains the probability value of each candidate region.

The probability value of a candidate region may also be called the score of the candidate region; it can be used to characterize the probability that the candidate region completely contains a text line.
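One way to read candidate regions out of the two matrices is sketched below in Python. The layout is an assumption made for illustration only: each of the H*W positions is taken to describe one candidate region, and the n channels of the coordinate matrix are taken to be ordered as x1, y1, x2, y2, and so on. The function name is hypothetical.

```python
def decode_candidates(coordinate_matrix, probability_matrix):
    """Pair each position's vertex coordinates with its probability value.

    coordinate_matrix: nested lists of shape n x H x W (n = 2 * vertices).
    probability_matrix: nested lists of shape H x W.
    Returns a list of (vertices, probability) tuples.
    """
    n = len(coordinate_matrix)
    height = len(probability_matrix)
    width = len(probability_matrix[0])
    candidates = []
    for row in range(height):
        for col in range(width):
            flat = [coordinate_matrix[channel][row][col] for channel in range(n)]
            # Even channels are taken as x values, odd channels as y values.
            vertices = list(zip(flat[0::2], flat[1::2]))
            candidates.append((vertices, probability_matrix[row][col]))
    return candidates
```

With n equal to 8, each decoded candidate region has four vertices, matching the quadrilateral case described above.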

Step 212, a first loss value of the candidate regions and of the probability value of each candidate region is determined by a preset detection loss function; the first initial model is trained according to the first loss value until the parameters in the first initial model converge, to obtain the text detection model.

In actual implementation, the detection loss function includes a first function and a second function, which are used to calculate the loss of the vertex coordinates of the candidate regions and the loss of the probability values of the candidate regions, respectively. The first function is $L_1 = |G^* - G|$, where $G^*$ is the pre-annotated coordinate matrix of the text regions in the target training image, and $G$ is the coordinate matrix, output by the first output network, of the candidate regions of the text regions in the target training image. The second function is $L_2 = -Y^* \log Y - (1 - Y^*) \log(1 - Y)$, where $Y^*$ is the pre-annotated probability matrix of the text regions in the target training image, $Y$ is the probability matrix, output by the first output network, of the candidate regions of the text regions in the target training image, and $\log$ denotes the logarithm operation. The first loss value of the vertex coordinates of the candidate regions and of the probability value of each candidate region is the sum of the first function and the second function, that is, $L = L_1 + L_2$.
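The two functions and their sum can be written out as a short Python sketch. It makes assumptions that the text does not state: the matrices are flattened into plain lists, the per-entry losses are summed rather than averaged, and predicted probabilities are clamped by a small `eps` so that the logarithm stays finite.

```python
import math


def coordinate_loss(g_star, g):
    """L1 = |G* - G|, summed over all matrix entries."""
    return sum(abs(target - predicted) for target, predicted in zip(g_star, g))


def probability_loss(y_star, y, eps=1e-7):
    """L2 = -Y* log Y - (1 - Y*) log(1 - Y), summed over all matrix entries."""
    total = 0.0
    for target, predicted in zip(y_star, y):
        predicted = min(max(predicted, eps), 1.0 - eps)  # keep log() finite
        total += (-target * math.log(predicted)
                  - (1.0 - target) * math.log(1.0 - predicted))
    return total


def detection_loss(g_star, g, y_star, y):
    """First loss value L = L1 + L2."""
    return coordinate_loss(g_star, g) + probability_loss(y_star, y)
```

Both terms shrink toward zero as the predicted coordinates and probabilities approach the annotated values, which is the behavior the training procedure relies on.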

Based on the above description of the first loss value, the process of training the first initial model according to the first loss value in the above step may be implemented by the following steps 22-28:

Step 22, the parameters in the first initial model are updated according to the first loss value;

In actual implementation, a function mapping relationship may be preset; the original parameters and the first loss value are input to the function mapping relationship, and the updated parameters are calculated. The function mapping relationships of different parameters may be the same or different.

Specifically, the parameters to be updated may first be determined according to preset rules; the parameters to be updated may be all parameters in the first initial model, or some parameters randomly selected from the first initial model. The derivative $\partial L / \partial W$ of the first loss value with respect to each parameter to be updated in the first initial model is then calculated, where $L$ is the first loss value and $W$ is the parameter to be updated; a parameter to be updated may also be called the weight of a neuron. This process may be called the back-propagation algorithm. If the first loss value is large, the output of the current first initial model does not match the expected output; the derivative of the first loss value with respect to each parameter to be updated is then computed, and this derivative serves as the basis for adjusting that parameter.

After the derivative of each parameter to be updated is obtained, each parameter is updated to obtain the updated parameter $W' = W - \alpha \, \partial L / \partial W$, where $\alpha$ is a preset coefficient. This process may be called the stochastic gradient descent algorithm. The derivative of each parameter to be updated indicates, relative to the current parameter, the direction in which the first loss value changes fastest; adjusting the parameter against this direction makes the first loss value decrease quickly and the parameter converge. In addition, after the first initial model has been trained once and a first loss value has been obtained, one or more parameters may be randomly selected from the parameters in the first initial model for the above update process, in which case the model training time is shorter and the algorithm is faster; alternatively, the update process may be performed on all parameters in the first initial model, in which case the model training is more accurate.

Step 24, it is judged whether the updated parameters have all converged; if the updated parameters have all converged, step 26 is executed; if not, step 28 is executed;

Step 26, the first initial model with the updated parameters is determined as the detection model; the process ends.

Step 28, the step of determining a target training image based on the preset training set continues to be executed, until the updated parameters have all converged.

Specifically, a new image may be obtained from the training set as the target training image, or training may continue with the current target training image as the target training image.
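Steps 22-28 can be illustrated with a toy Python sketch that updates a single parameter by gradient descent. Everything concrete here is an assumption for illustration: the model has one scalar parameter, the caller supplies the derivative of the loss, and convergence is declared when an update changes the parameter by less than a hypothetical `tolerance`, a criterion that the text does not define.

```python
def train_until_converged(w, gradient, learning_rate=0.1,
                          tolerance=1e-6, max_steps=10_000):
    """Repeat the gradient descent update until the parameter stops changing.

    gradient: function returning dL/dW at the current parameter value.
    learning_rate: plays the role of the preset coefficient.
    Returns (final parameter, number of update steps performed).
    """
    for step in range(max_steps):
        updated = w - learning_rate * gradient(w)   # step 22: update
        converged = abs(updated - w) < tolerance    # step 24: convergence check
        w = updated
        if converged:
            return w, step + 1                      # step 26: training ends
        # step 28: otherwise continue with the next target training image
    return w, max_steps
```

For the toy loss $(W - 3)^2$, whose derivative is $2(W - 3)$, calling `train_until_converged(0.0, lambda w: 2 * (w - 3))` drives the parameter toward 3.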

In the above approach, the feature extraction network automatically extracts feature maps of different scales, the feature maps of different scales are then fused, and candidate regions for text regions of various scales in the image are obtained based on the resulting fused feature map. The detection model needs only a single input image to obtain candidate regions for text regions of various scales in that image, with no need to rescale the image manually, which makes operation convenient. In particular, in scenes with multiple font sizes, fonts, shapes, and orientations, the various kinds of text in an image can be detected quickly, comprehensively, and accurately, which in turn benefits the accuracy of subsequent text recognition and improves the text recognition result.

Based on the text detection model training method provided by the above embodiments, an embodiment of the present invention also provides a text region determining method, which is implemented on the basis of the text detection model training method described in the above embodiments. As shown in Fig. 4, the method includes the following steps:

Step S402, an image to be detected is obtained; the image to be detected may be a picture, or a video frame captured from a video file or a live video, etc.

Step S404, the image to be detected is input to a pre-trained text detection model, which outputs multiple candidate regions of the text regions in the image to be detected and a probability value for each candidate region; the text detection model is obtained by training with the above text detection model training method;

Step S406, the text regions in the image to be detected are determined from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap between the multiple candidate regions.

Among the candidate regions output by the text detection model, multiple candidate regions may correspond to the same text line. To find the region that best matches the text line among them, the candidate regions need to be screened. In most cases, candidate regions with a high degree of mutual overlap correspond to the same text line, and the text region corresponding to that text line can then be determined from them according to their probability values; for example, among the candidate regions with a high degree of mutual overlap, the candidate region with the largest probability value is determined as the text region. If there are multiple text lines in the image, multiple text regions are usually determined in the end.
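This screening can be sketched in Python in the style of non-maximum suppression. The sketch simplifies the geometry as an assumption: candidate regions are axis-aligned boxes `(x_min, y_min, x_max, y_max)` rather than arbitrary quadrilaterals, the degree of overlap is measured as intersection area divided by union area, and the overlap threshold of 0.5 is a hypothetical value.

```python
def intersection_over_union(box_a, box_b):
    """Overlap of two (x_min, y_min, x_max, y_max) boxes as intersection/union."""
    inter_w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    inter_h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return intersection / (area_a + area_b - intersection)


def select_text_regions(candidates, overlap_threshold=0.5):
    """Keep the most probable box of each group of heavily overlapping boxes.

    candidates: list of (box, probability) tuples.
    """
    remaining = sorted(candidates, key=lambda item: item[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest remaining probability value
        kept.append(best)
        # Drop boxes that overlap the kept box beyond the threshold.
        remaining = [item for item in remaining
                     if intersection_over_union(best[0], item[0]) <= overlap_threshold]
    return kept
```

Each kept box then stands for one text line, so an image with several text lines yields several kept boxes.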

In the above text region determining method provided by the embodiment of the present invention, the obtained image to be detected is input to the text detection model, which outputs multiple candidate regions of the text regions in the image to be detected and a probability value for each candidate region; the text regions in the image to be detected are then determined from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap between the multiple candidate regions. In this approach, the text detection model automatically extracts features of different scales, so only a single image needs to be input to the model to obtain candidate regions for text regions of various scales in that image, with no need to rescale the image manually, which makes operation convenient. In particular, in scenes with multiple font sizes, fonts, shapes, and orientations, the various kinds of text in an image can be detected quickly, comprehensively, and accurately, which in turn benefits the accuracy of subsequent text recognition and improves the text recognition result.

An embodiment of the present invention also provides another text region determining method, which is implemented on the basis of the method described in the above embodiment. This method focuses on the specific process of determining the text regions in the image to be detected according to the vertex coordinates of the candidate regions output by the detection network and the probability values of the candidate regions. As shown in Fig. 5, the method includes the following steps:

Step S502, an image to be detected is obtained.

Step S504, the image to be detected is input to a pre-trained text detection model, which outputs multiple candidate regions of the text regions in the image to be detected and a probability value for each candidate region;

Step S506, among the multiple candidate regions, the candidate regions whose probability values are lower than a preset probability threshold are rejected, to obtain the final multiple candidate regions.

Step S506 is an optional step. That is, in the following step S508, all candidate regions output by the detection model may be arranged; alternatively, the candidate regions whose probability values are lower than the preset probability threshold may first be rejected from the candidate regions output by the detection model, and the remaining candidate regions then arranged. The probability threshold may be set in advance, for example to 0.2 or 0.1. Rejecting the candidate regions whose probability values are lower than the preset probability threshold reduces the amount of computation needed to subsequently determine the text regions in the image to be detected and increases the computation speed.

Step S508: sort the multiple candidate regions by probability value, so that the first candidate region has the largest probability value and the last candidate region has the smallest.

Step S510: take the first candidate region as the current candidate region, and calculate one by one the degree of overlap between the current candidate region and each candidate region other than the current candidate region.

The candidate regions other than the current candidate region may also be called the other candidate regions. To calculate the degree of overlap between the current candidate region and each other candidate region, the intersection over union (IoU) of the two regions may be computed; the IoU equals the area of the intersection of the two candidate regions divided by the area of their union. The larger the IoU, the greater the overlap between the two candidate regions. Other candidate regions that overlap heavily with the current candidate region usually represent the same text line as the current candidate region; since their probability values are lower than that of the current candidate region, they can be removed, so that the text line is represented by the current candidate region alone.

Step S512: among the candidate regions other than the current candidate region, remove those whose degree of overlap with the current candidate region is greater than a preset overlap threshold. The overlap threshold is set in advance, for example to 0.5 or 0.6.

Step S514: take the candidate region following the current candidate region as the new current candidate region, and continue with the step of calculating one by one the degree of overlap between the current candidate region and each candidate region other than the current candidate region, until the last candidate region is reached.

Steps S510 to S514 form a loop. Each iteration may remove some candidate regions; the loop ends when the last candidate region has been traversed, and the candidate regions that remain are determined to be the text regions in the image to be detected. If multiple candidate regions remain, the image to be detected contains multiple text regions.

Step S516: determine the candidate regions remaining after removal to be the text regions in the image to be detected.
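As a non-limiting illustration, steps S506 to S516 may be sketched as follows. The sketch assumes, purely for simplicity, that each candidate region is an axis-aligned rectangle given as `(x1, y1, x2, y2)`; the embodiment itself describes candidate regions by vertex coordinates and is not limited to rectangles. The names `iou` and `select_text_regions` and the default thresholds are hypothetical choices taken from the example values mentioned above.

```python
from typing import List, Tuple

# Assumption: an axis-aligned rectangle (x1, y1, x2, y2) stands in for a candidate region.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two rectangles."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_text_regions(boxes: List[Box], probs: List[float],
                        prob_threshold: float = 0.2,
                        overlap_threshold: float = 0.5) -> List[Box]:
    # S506 (optional): drop candidates below the probability threshold.
    candidates = [(p, b) for p, b in zip(probs, boxes) if p >= prob_threshold]
    # S508: sort so that the highest probability comes first.
    candidates.sort(key=lambda pb: pb[0], reverse=True)
    kept = []
    # S510-S514: take the best remaining candidate as current, then remove
    # every other candidate that overlaps it by more than the threshold.
    while candidates:
        current = candidates.pop(0)
        kept.append(current)
        candidates = [c for c in candidates
                      if iou(current[1], c[1]) <= overlap_threshold]
    # S516: whatever remains is taken as the text regions.
    return [box for _, box in kept]
```

For example, two heavily overlapping rectangles with probability values 0.9 and 0.8 would be reduced to the one with probability 0.9, while a distant rectangle with probability 0.7 would be kept as a separate text region.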

In the above manner, multiple candidate regions and the probability value of each candidate region are obtained from the text detection model, and the text regions are then determined from the candidate regions by non-maximum suppression. Because the text detection model automatically extracts features at different scales, a single image input to the model yields candidate regions for text regions of all scales, with no need to rescale the image manually, so the operation is convenient. In particular, in scenes with multiple font sizes, fonts, shapes, and orientations, all kinds of text in the image can be detected quickly, comprehensively, and accurately, which in turn benefits the accuracy of subsequent text recognition and improves the recognition result.

Based on the text region determination method provided by the above embodiment, an embodiment of the present invention also provides a text content determination method, implemented on the basis of the text region determination method described in the above embodiment. As shown in Fig. 6, the method includes the following steps:

Step S602: obtain the text regions in an image by the above text region determination method.

Step S604: input each text region into the pre-trained text recognition model, and output the recognition result of the text region.

Step S606: determine the text content in the text region according to the recognition result.

The text recognition model may be obtained by training in a variety of ways, for example using a recurrent neural network or a convolutional neural network; the recognition result of a text region may of course also be obtained by optical character recognition. The recognition result output by the text recognition model may be taken directly as the text content in the text region; alternatively, the recognition result may first be optimized, for example by deleting repeated characters and null characters, and the processed recognition result then taken as the text content in the text region.

In the text content determination method provided by this embodiment of the present invention, the text regions in the image are first obtained by the above text region determination method; each text region is then input into the pre-trained text recognition model, which outputs the recognition result of the text region; finally, the text content in the text region is determined according to the recognition result. Because the text region determination method obtains text regions of all scales through the text detection model, all kinds of text in the image can be detected quickly, comprehensively, and accurately in scenes with multiple font sizes, fonts, shapes, and orientations, which in turn benefits the accuracy of text recognition and improves the recognition result.

An embodiment of the present invention also provides another text content determination method, implemented on the basis of the method described in the above embodiment. This method focuses on the training of the text recognition model. The text recognition model is used for text recognition, which can be understood as follows: text is first detected in the image so as to locate the image regions that contain text, and the specific content of the text in those regions is then recognized. As shown in Fig. 7, the text recognition model is trained as follows:

Step S702: determine a target training text image based on a preset training set.

The target training text image may be a standalone image or an image region marked on an image. The training set may contain multiple images. To improve the generalization ability of the text recognition model, the images in the training set may cover various scenes, for example live-streaming scenes, game scenes, outdoor scenes, and indoor scenes; they may also contain text lines of various font sizes, shapes, fonts, and languages, so that the trained text recognition model can recognize all kinds of text lines. Each target training text image is manually annotated with the text content of its text line, such as "hello" or "excellent".

After annotation is complete, a character library may also be built from the text content of all text lines corresponding to all images in the training set. Specifically, the text content of all text lines for all images in the training set is obtained, the distinct characters are extracted from it, and these mutually different characters form the character library. In addition, the images in the training set may be divided into a training subset and a test subset according to a preset ratio. During training, target training text images are obtained from the training subset. After training is complete, target test images may be obtained from the test subset to test the performance of the text recognition model.
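By way of illustration only, building the character library and dividing the training set may be sketched as follows. The function names, the 0.8 ratio, and the fixed random seed are assumptions made for the example and are not specified by the embodiment.

```python
import random
from typing import Dict, List, Sequence, Tuple


def build_character_library(labels: Sequence[str]) -> Dict[str, int]:
    """Collect the distinct characters of all annotated text lines.

    Returns a mapping from character to index; sorting is only used
    to make the indices reproducible.
    """
    distinct = sorted({ch for text in labels for ch in text})
    return {ch: index for index, ch in enumerate(distinct)}


def split_training_set(samples: Sequence, train_ratio: float = 0.8,
                       seed: int = 0) -> Tuple[List, List]:
    """Divide the samples into a training subset and a test subset."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


library = build_character_library(["hello", "help"])
train_subset, test_subset = split_training_set(range(10))
```

Here `library` contains the five distinct characters of the two example labels, and the ten samples are divided into eight training samples and two test samples.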

Step S704: input the target training text image into the second initial model. The second initial model includes a second feature extraction network, a feature splitting network, a second output network, and a classification function.

Step S706: extract the feature map of the target training text image through the second feature extraction network.

The second feature extraction network may be implemented by multiple convolutional layers. In general, the convolutional layers are connected in sequence, and each convolutional layer performs a convolution on its input data using its configured convolution kernels. The data output by the last convolutional layer may be used as the feature map of the target training text image.

Step S708: split the feature map into at least one sub-feature map through the feature splitting network.

Since the purpose is to recognize text content, the text recognition model needs to split the feature map corresponding to the text line so that each sub-feature map contains one or a small number of characters or symbols, which facilitates recognition of the text content. During splitting, the scale of the sub-feature maps may be preset and the feature map split according to that scale; alternatively, the number of sub-feature maps may be preset and the feature map split according to that number. Of course, if the text line is very short, for example a single character, the feature map may yield only one sub-feature map.

Step S710: input each sub-feature map into the second output network, and output the output matrix corresponding to each sub-feature map.

The second output network performs further computation on the sub-feature maps. In the output matrix corresponding to each sub-feature map, each position corresponds to a preset character, and the value at that position characterizes the degree of matching between the sub-feature map and the character corresponding to the position. The second output network may be a convolutional network or a fully connected network.

Step S712: input the output matrix corresponding to each sub-feature map into the classification function, and output the probability matrix corresponding to each sub-feature map.

The classification function maps each value in the output matrix to a probability value, giving the probability matrix. The probability value at each position of the probability matrix characterizes the probability that the sub-feature map matches the character corresponding to that position.

Step S714: determine a second loss value of the probability matrices through a preset recognition loss function, and train the second initial model according to the second loss value until the parameters in the second initial model converge, to obtain the text recognition model.

Standard text content is annotated in advance for the target training text image; this text content consists of one or more standard characters. A probability matrix can be generated from this text content, in which the probability value at the position of the standard character corresponding to a sub-feature map is 1 and the probability values at all other positions are 0. The recognition loss function compares the probability matrix output by the classification function with the probability matrix of the standard text content; in general, the larger the difference, the larger the second loss value. The parameters of each part of the second initial model can be adjusted on the basis of the second loss value, thereby achieving the purpose of training. When all parameters in the model have converged, training ends and the text recognition model is obtained.

In the above training method for the text recognition model, the feature map of the target training text image is first extracted; the feature map is then split into at least one sub-feature map; each sub-feature map is input into the second output network, which outputs the corresponding output matrix; and the probability matrix corresponding to each sub-feature map is obtained through the classification function. After the second loss value of the probability matrices is determined through the preset recognition loss function, the second initial model is trained according to the second loss value to obtain the text recognition model. Because the model automatically splits the feature map of the image, the text recognition model only needs an image containing a text line as input in order to obtain the text content in that image. The text line does not need to be segmented beforehand, so the operation is convenient and fast, and the recognition accuracy is high.

An embodiment of the present invention also provides another training method for the text recognition model, implemented on the basis of the method described in the above embodiment. This method focuses on the specific implementation of each step of the above training method, and includes the following steps:

Step 802: determine a target training text image based on a preset training set.

Step 804: input the target training text image into the second initial model. The second initial model includes a second feature extraction network, a feature splitting network, a second output network, and a classification function.

Step 806: extract the feature map of the target training text image through the second feature extraction network.

To improve the performance of the second feature extraction network, the network may include multiple groups of second convolutional networks connected in sequence; each group of second convolutional networks includes a convolutional layer, a pooling layer, and an activation function layer connected in sequence. Fig. 8 shows a schematic structural diagram of a second feature extraction network. Fig. 8 takes four groups of second convolutional networks as an example, in which the convolutional layer of each group is connected to the activation function layer of the preceding group. The second feature extraction network may also include more or fewer groups of second convolutional networks.

The convolutional layer in a second convolutional network extracts features and generates a feature map. The pooling layer may be an average pooling layer (average pooling or mean-pooling), a global average pooling layer (global average pooling), a max pooling layer (max-pooling), and so on. The pooling layer compresses the feature map output by the convolutional layer, retaining the main features and discarding the minor ones so as to reduce the dimension of the feature map. Taking an average pooling layer as an example, it averages the feature point values within a neighborhood of preset size around the current feature point and uses the average as the new value of that feature point. The pooling layer also helps the feature map retain certain invariances, such as rotation invariance, translation invariance, and scale invariance. The activation function layer applies a function transformation to the feature map processed by the pooling layer; this transformation breaks the purely linear combination of the convolutional layer inputs and improves the feature expression ability of the second convolutional network. The activation function may specifically be the Sigmoid function, the tanh function, the ReLU function, and so on.

Step 808: split the feature map into at least one sub-feature map through the feature splitting network.

Most text lines are arranged horizontally. So that each sub-feature map after splitting contains the features of one or a small number of characters, the feature map may be split into at least one sub-feature map along its column direction; the column direction of the feature map can be understood as the direction perpendicular to the text line. In actual implementation, the width of the sub-feature maps may be set according to the width of most characters, and the feature map split according to that width. For example, if the feature map is H*W*C and the preset sub-feature map width is k, the feature map is split into W/k sub-feature maps, each of size H*k*C. Alternatively, the number of sub-feature maps may be preset, for example T, in which case each sub-feature map is H*(W/T)*C.
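As a minimal sketch of this splitting step, assume the feature map is held as a nested list indexed as `feature_map[h][w][c]`, and that its width is an exact multiple of the sub-feature map width. Both the data layout and the function name `split_feature_map` are assumptions made for illustration; an actual implementation would typically operate on tensors.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [height][width][channel]


def split_feature_map(feature_map: FeatureMap, sub_width: int) -> List[FeatureMap]:
    """Cut an H*W*C feature map into W/sub_width pieces of size H*sub_width*C.

    The cuts are made along the width, i.e. perpendicular to a horizontal
    text line, so each piece covers a narrow vertical strip of the line.
    """
    width = len(feature_map[0])
    if sub_width <= 0 or width % sub_width != 0:
        raise ValueError("width must be a positive multiple of sub_width")
    return [
        [row[start:start + sub_width] for row in feature_map]
        for start in range(0, width, sub_width)
    ]


# A 2*4*1 feature map split with sub-feature map width 2 gives two 2*2*1 pieces.
example = [[[0.0], [1.0], [2.0], [3.0]],
           [[4.0], [5.0], [6.0], [7.0]]]
pieces = split_feature_map(example, 2)
```

Presetting the number of sub-feature maps T instead corresponds to calling the same function with `sub_width = W // T`.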

Step 810: input each sub-feature map into the second output network, and output the output matrix corresponding to each sub-feature map.

Taking a fully connected network as an example, the second output network includes multiple fully connected layers arranged in parallel. The number of fully connected layers corresponds to the number of sub-feature maps, and each sub-feature map is input into its corresponding fully connected layer, so that each fully connected layer outputs the output matrix of its sub-feature map.

Step 812: input the output matrix corresponding to each sub-feature map into the classification function, and output the probability matrix corresponding to each sub-feature map.

The classification function may be the Softmax function, which can be expressed as $p^t_i = \frac{e^{a^t_i}}{\sum_{m=1}^{K+1} e^{a^t_m}}$, where e denotes the natural constant; t denotes the t-th probability matrix; K denotes the number of distinct characters contained in the target training text images of the training set; m ranges from 1 to K+1; ∑ denotes summation; $a^t_i$ is the i-th element of the output matrix; and $p^t_i$ is the i-th element of the probability matrix $p^t$.

Compared with the element $a^t_i$ of the output matrix itself, its exponential function value $e^{a^t_i}$ widens the differences between elements. For example, if the output matrix is [3, 1, -3], then after the exponential function value of each element is calculated, the corresponding matrix of exponential function values is approximately [20, 2.7, 0.05]. Calculating the probability of each element from its exponential function value increases the gaps between the probabilities, so that the probability of the correct recognition result is higher, which benefits the accuracy of the recognition result.
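The effect of the exponential can be checked numerically with a short sketch. The function name `softmax` is chosen for the example; subtracting the maximum before exponentiating is a common numerical safeguard that leaves the result mathematically unchanged and is not something the embodiment requires.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Map an output matrix (flattened to a list) to probability values."""
    # Shift by the maximum so that math.exp never overflows; the ratio
    # exp(v - m) / sum(exp(v - m)) equals exp(v) / sum(exp(v)).
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


exp_values = [math.exp(v) for v in [3.0, 1.0, -3.0]]  # roughly 20, 2.7, 0.05
probabilities = softmax([3.0, 1.0, -3.0])
```

For the output matrix [3, 1, -3], the largest element receives a probability of about 0.88, the next about 0.12, and the smallest about 0.002, so the gap between elements is considerably wider after the transformation than before it.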

Step 814: determine the second loss value of the probability matrices through the preset recognition loss function, and train the second initial model according to the second loss value until the parameters in the second initial model converge, to obtain the text recognition model.

The recognition loss function includes $L = -\log p(y \mid \{p^t\}_{t=1 \ldots T})$, where y is the pre-annotated probability matrix of the target training text image; t denotes the t-th probability matrix; $p^t$ is the probability matrix corresponding to each sub-feature map output by the classification function; T is the total number of probability matrices; p denotes calculating a probability; and log denotes the logarithm. Based on this recognition loss function, the process of training the second initial model according to the second loss value in the above step may be implemented through the following steps 32 to 38:

Step 32: update the parameters in the second initial model according to the second loss value.

In actual implementation, a function mapping relationship may be set in advance; the original parameter and the second loss value are input into this function mapping relationship to calculate the updated parameter. The function mapping relationships of different parameters may be the same or different.

Specifically, the parameters to be updated may be determined from the second initial model according to a preset rule; they may be all parameters in the second initial model, or a subset of parameters selected at random from the second initial model. The derivative of the second loss value with respect to each parameter to be updated, $\frac{\partial L'}{\partial W'}$, is then calculated, where L' is the loss value of the probability matrices and W' is the parameter to be updated; a parameter to be updated may also be called the weight of a neuron. This process may also be called the back-propagation algorithm. If the second loss value is large, the output of the current second initial model does not match the expected output; the derivative of the second loss value with respect to each parameter to be updated is then computed, and this derivative serves as the basis for adjusting that parameter in the second initial model.

After the derivative with respect to each parameter to be updated has been obtained, the parameter is updated to give the updated parameter $W' \leftarrow W' - \alpha' \frac{\partial L'}{\partial W'}$, where α' is a preset coefficient. This process may also be called the stochastic gradient descent algorithm. The derivative with respect to each parameter to be updated can be understood as indicating, from the current value of the parameter, the direction in which the second loss value decreases fastest; adjusting the parameter along that direction makes the second loss value decrease quickly and drives the parameter to converge. In addition, after the second initial model has been trained once and a second loss value obtained, one or more parameters may be selected at random from the parameters of the second initial model and updated as described above; in this way the model training time is shorter and the algorithm is faster. Alternatively, all parameters in the second initial model may be updated as described above, in which case the model training is more accurate.

Step 34: judge whether the updated parameters have converged. If they have converged, execute step 36; if they have not, execute step 38.

Step 36: determine the second initial model with the updated parameters to be the text recognition model.

Step 38: continue with the step of determining a target training text image based on the preset training set, until all updated parameters have converged.

Specifically, a new image may be obtained from the training set as the target training text image, or training may continue with the current target training text image.
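The loop formed by steps 32 to 38 may be sketched as follows on a toy problem. Everything specific in the sketch is an assumption made for illustration: the quadratic loss stands in for the recognition loss function, the gradient is supplied directly rather than obtained by back-propagation through a network, and "converged" is taken to mean that no parameter changed by more than a small tolerance in one update, a criterion the embodiment does not specify.

```python
from typing import Callable, List


def train_until_converged(params: List[float],
                          gradient: Callable[[List[float]], List[float]],
                          alpha: float = 0.1,
                          tolerance: float = 1e-6,
                          max_steps: int = 10000) -> List[float]:
    """Repeat the update W' <- W' - alpha * dL'/dW' until the parameters settle."""
    for _ in range(max_steps):
        grads = gradient(params)                     # derivative of the loss
        updated = [w - alpha * g for w, g in zip(params, grads)]  # step 32
        change = max(abs(u - w) for u, w in zip(updated, params))
        params = updated
        if change < tolerance:                       # step 34: converged?
            break                                    # step 36: stop training
        # step 38: otherwise continue with the next round
    return params


# Toy loss L'(w) = (w0 - 1)^2 + (w1 + 2)^2, whose minimum is at (1, -2).
def toy_gradient(w: List[float]) -> List[float]:
    return [2.0 * (w[0] - 1.0), 2.0 * (w[1] + 2.0)]


trained = train_until_converged([0.0, 0.0], toy_gradient)
```

With the preset coefficient 0.1, each update moves the parameters a fixed fraction of the remaining distance toward the minimum, so `trained` ends up very close to (1, -2).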

In the above manner, the model automatically splits the feature map of the image, so the text recognition model only needs an image containing a text line as input in order to obtain the text content in that image. The text line does not need to be segmented beforehand, so the operation is convenient and fast, and the recognition accuracy is high.

Based on the text content determination method provided by the above embodiment, an embodiment of the present invention also provides another text content determination method, implemented on the basis of the text content determination method or the training method for the text recognition model described in the above embodiments. This method focuses on the process of obtaining the text content of a text region from the recognition result after the text recognition model has output that result. As shown in Fig. 9, the method includes the following steps:

Step S902: obtain the text regions in an image by the above text region determination method.

Step S904: normalize each text region according to a preset size.

The preset size may include a preset length and width. If a text region does not meet the preset size, the text region may be scaled, or it may be cropped or have blank areas filled in, so that the processed text region meets the preset size.
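As one possible illustration of the scaling option, the sketch below resizes a text region to a preset size by nearest-neighbor sampling. The representation of the region as a nested list of pixel values and the function name `resize_nearest` are assumptions made for the example; the embodiment does not prescribe a particular interpolation method.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values; a stand-in for a text region


def resize_nearest(region: Image, out_height: int, out_width: int) -> Image:
    """Scale a region to out_height * out_width by nearest-neighbor sampling."""
    in_height, in_width = len(region), len(region[0])
    return [
        [region[r * in_height // out_height][c * in_width // out_width]
         for c in range(out_width)]
        for r in range(out_height)
    ]


small = [[1, 2],
         [3, 4]]
enlarged = resize_nearest(small, 4, 4)
```

In this example each source pixel is repeated in a 2*2 block, so the 2*2 region becomes a 4*4 region of the preset size.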

Step S906: input the processed text region into the pre-trained text recognition model, and output the recognition result of the text region. The recognition result of the text region includes multiple probability matrices corresponding to the text region.

During recognition, the text recognition model splits the feature map corresponding to the text region; each sub-feature map obtained by splitting passes through the output network to give a corresponding output matrix, and the probability matrix corresponding to each output matrix is then obtained through the classification function. The recognition result of the text region therefore includes multiple probability matrices, each of which usually corresponds to one or a small number of characters.

Step S908: determine the position of the maximum probability value in each probability matrix.

Step S910: from the preset correspondence between each position in the probability matrix and a character, obtain the character corresponding to the position of the maximum probability value.

As described in the above embodiment, the probability value at each position of a probability matrix characterizes the probability that the sub-feature map matches the character corresponding to that position. The character corresponding to the position of the maximum probability value can therefore be determined to be the recognition result of the corresponding sub-feature map. In most cases the position of the maximum probability value corresponds to a single character, although it may also correspond to multiple characters. The correspondence between positions and characters may be established as follows: characters are first collected, which may include the characters of multiple languages, punctuation marks, mathematical symbols, network emoticons, and so on; they may be collected while the training set is being built, or from dictionaries, character libraries, symbol libraries, and the like.

Step S912: arrange the obtained characters according to the order of the multiple probability matrices.

The order of the multiple probability matrices output by the text recognition model is usually determined by the positions, within the feature map, of the sub-feature maps corresponding to the probability matrices. This order is therefore usually consistent with the order of the characters contained in the corresponding sub-feature maps. Arranging the obtained characters according to the order of the probability matrices makes the arranged characters consistent with the character order of the original text line, so that the text content in the text region can be determined from the arranged characters.

Step S914: determine the text content in the text region according to the arranged characters.

In actual implementation, the arranged characters may be taken directly as the text content in the text region. However, because the characters in a text differ in font size, the text recognition model may not split the feature map so that exactly one character corresponds to each sub-feature map; the arranged characters may therefore contain repeated characters. To further optimize the recognition result, repeated characters and null characters may be deleted from the arranged characters according to preset rules to obtain the text content in the text region.

Specifically, a reduplicated-word dictionary may be established in advance. If the arranged characters contain a repeated character, the reduplicated-word dictionary is searched for that repetition; if it is not found, the repetition is deleted and only one of the repeated characters is kept. Whether repeated characters should appear in the current context may also be judged from the semantics of the other characters. For null characters, the current context may likewise be used to decide whether to delete them; a null character located between two English words need not be deleted and may be retained. For example, if the arranged characters are "--hh-e-l-ll-oo-", where "-" represents a null character, the text content obtained after deleting repeated characters and null characters is "hello".
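Steps S908 to S914 and the example above may be sketched as follows. The sketch is a simplification under stated assumptions: each probability matrix is flattened to a list, `charset` is a hypothetical position-to-character correspondence, and the post-processing rule simply merges immediately adjacent identical characters and then drops null characters. It does not consult a reduplicated-word dictionary and does not keep null characters between English words, both of which the embodiment also allows.

```python
from typing import List, Sequence

NULL_CHAR = "-"  # stand-in for the null character used in the example


def decode_characters(prob_matrices: Sequence[Sequence[float]],
                      charset: Sequence[str]) -> str:
    """S908-S912: pick the most probable character for each matrix, in order."""
    chars: List[str] = []
    for probs in prob_matrices:
        best = max(range(len(probs)), key=lambda i: probs[i])  # position of max
        chars.append(charset[best])                            # position -> char
    return "".join(chars)


def clean_text(arranged: str, null_char: str = NULL_CHAR) -> str:
    """S914: merge adjacent repeats, then remove null characters."""
    result: List[str] = []
    previous = None
    for ch in arranged:
        if ch != previous and ch != null_char:
            result.append(ch)
        previous = ch
    return "".join(result)


raw = decode_characters([[0.1, 0.9, 0.0], [0.2, 0.7, 0.1], [0.8, 0.1, 0.1]],
                        ["-", "h", "e"])
text = clean_text("--hh-e-l-ll-oo-")
```

Because repeats are merged only when they are directly adjacent, the two occurrences of "l" in the example survive: they are separated by a null character in the arranged sequence, so `text` is "hello" rather than "helo".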

In the above manner, the obtained text region is first normalized, and its recognition result is then obtained through the text recognition model; the recognized characters are determined from each probability matrix in the recognition result, and the text content of the text region is obtained from them. Because the text recognition model automatically splits the feature map of the image, it only needs an image containing a text line as input in order to obtain the recognition result and hence the text content. The text line does not need to be segmented beforehand, so the operation is convenient and fast, and the recognition accuracy is high.

Based on the text content determination method provided by the above embodiment, an embodiment of the present invention also provides another text content determination method, implemented on the basis of the above method. This method focuses on the process of judging, after the text content of the text regions has been obtained, whether the image contains sensitive words on the basis of that text content.

A sensitive lexicon usually needs to be established in advance, and whether the text content corresponding to the image contains sensitive information is determined by means of this lexicon. The sensitive lexicon contains sensitive words, such as words relating to pornography, reactionary content, or terrorism. The words in the text content may be matched one by one against the sensitive lexicon; if a match succeeds, the current word is a sensitive word. On this basis, the text content determination method of this embodiment includes the following steps, as shown in Fig. 10:

Step S1002: obtain the text regions in an image by the above text region determination method.

Step S1004: normalize each text region according to a preset size.

Step S1006: input the processed text region into the pre-trained text recognition model, and output the recognition result of the text region. The recognition result of the text region includes multiple probability matrices corresponding to the text region.

Step S1008: determine the position of the maximum probability value in each probability matrix.

Step S1010: from the preset correspondence between each position in the probability matrix and a character, obtain the character corresponding to the position of the maximum probability value.

Step S1012: arrange the obtained characters according to the order of the multiple probability matrices.

Step S1014: determine the text content in the text region according to the arranged characters.

Step S1016: if the image contains multiple text regions, obtain the text content in each text region.

Step S1018: perform a word segmentation operation on the obtained text content.

The word segmentation operation may also be called a word-cutting operation. In actual implementation, a dictionary may be established and the segmentation performed on the basis of that dictionary. Specifically, starting from the first character of the text content, the first character and the second character are taken as a combination and looked up in the dictionary. If no word containing the combination can be found, the first character is split off as an individual word. If a word containing the combination can be found, the third character is added to the combination and the lookup continues, until no word containing the combination can be found; the characters in the combination other than the last character are then split off as one word. The process repeats in this way until the word-cutting operation on the text content is complete.

Step S1020, match the segmented words obtained from the word segmentation operation one by one against the pre-established sensitive-word dictionary;

Step S1022, if at least one segmented word is matched successfully, determine that the text content corresponding to the image contains sensitive information;

Step S1024, obtain the text region to which the successfully matched segmented word belongs, and mark the obtained text region or the successfully matched segmented word in the image.

In actual implementation, the obtained text region or the successfully matched segmented word may be marked with a bounding box. For real-time detection in video playback or live-streaming scenarios, the obtained text region or the successfully matched segmented word may instead be covered with a mosaic or blurred, so as to filter out the sensitive word.

In the above manner, after the text content of the text region is obtained, sensitive words are identified from the text content using the sensitive-word dictionary, thereby achieving speech supervision. This manner can obtain the content and identify sensitive words in real time, which facilitates speech supervision in scenarios such as webcasts and live video, and limits the spread of sensitive words.
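The word segmentation and sensitive-word matching of steps S1018 to S1024 can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names `has_prefix`, `segment`, and `find_sensitive` are hypothetical, and "a word containing the combination" is interpreted here as "a dictionary word that begins with the combination".

```python
def has_prefix(dictionary, combination):
    """Return True if some dictionary word begins with the combination (assumed reading)."""
    return any(word.startswith(combination) for word in dictionary)


def segment(text, dictionary):
    """Greedy forward segmentation following the word-cutting description."""
    words = []
    start = 0
    while start < len(text):
        end = start + 1
        # Keep adding the next character while the dictionary still has a
        # word beginning with the extended combination text[start:end + 1].
        while end < len(text) and has_prefix(dictionary, text[start:end + 1]):
            end += 1
        # The combination minus its last (failing) character becomes one word;
        # if the very first lookup fails, this is the single first character.
        words.append(text[start:end])
        start = end
    return words


def find_sensitive(region_texts, dictionary, sensitive_words):
    """Return (region_id, word) pairs for every segmented word found in the sensitive set."""
    hits = []
    for region_id, text in region_texts.items():
        for word in segment(text, dictionary):
            if word in sensitive_words:
                hits.append((region_id, word))
    return hits
```

The returned region identifiers indicate which text regions would be marked, mosaicked, or blurred in the image; how the marking itself is drawn is outside this sketch.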

It should be noted that the above method embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.

Corresponding to the above method embodiments, Figure 11 shows a schematic structural diagram of a text detection model training apparatus; the apparatus includes:

A training image determining module 110, configured to determine a target training image based on a preset training set;

A training image input module 111, configured to input the target training image into a first initial model; the first initial model includes a first feature extraction network, a feature fusion network, and a first output network;

A feature extraction module 112, configured to extract multiple initial feature maps of the target training image through the first feature extraction network; the multiple initial feature maps differ from one another in scale;

A feature fusion module 113, configured to fuse the multiple initial feature maps through the feature fusion network to obtain a fused feature map;

An output module 114, configured to input the fused feature map into the first output network, and output candidate regions of the text region in the target training image and a probability value of each candidate region;

A loss value determining and training module 115, configured to determine a first loss value of the candidate regions and of the probability value of each candidate region through a preset detection loss function, and to train the first initial model according to the first loss value until the parameters in the first initial model converge, obtaining a text detection model.

The text detection model training apparatus provided by the embodiment of the present invention first extracts multiple initial feature maps of the target training image with mutually different scales; it then fuses the multiple initial feature maps to obtain a fused feature map; the fused feature map is then input into the first output network, which outputs candidate regions of the text region in the target training image and the probability value of each candidate region; after the first loss value is determined through the preset detection loss function, the first initial model is trained according to the first loss value to obtain the detection model. In this manner, the feature extraction network can automatically extract features at different scales, so that with the text detection model only a single image needs to be input to obtain candidate regions for text regions of various scales in that image, without manually transforming the image scale. Operation is convenient; in particular, in scenes with multiple font sizes, fonts, shapes, and orientations, all kinds of text in the image can be detected quickly, comprehensively, and accurately, which in turn benefits the accuracy of subsequent text recognition and improves the effect of text recognition.

In some embodiments, the above detection loss function includes a first function and a second function. The first function is $L_1 = |G^* - G|$, where $G^*$ is the pre-annotated coordinate matrix of the text region in the target training image, and $G$ is the coordinate matrix of the candidate regions of the text region in the target training image output by the first output network. The second function is $L_2 = -Y^* \log Y - (1 - Y^*) \log(1 - Y)$, where $Y^*$ is the pre-annotated probability matrix of the text region in the target training image, and $Y$ is the probability matrix of the candidate regions of the text region in the target training image output by the first output network. The first loss value of the candidate regions and of the probability value of each candidate region is $L = L_1 + L_2$.
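As an illustration of how the two functions combine, the following minimal sketch evaluates the first loss value on small hand-written matrices. Several details are assumptions not stated in the embodiment: the function name `detection_loss` is hypothetical, both terms are reduced by summing over all matrix elements, the probability matrix is flattened into a list, and predicted probabilities are clipped by a small `eps` to avoid taking the logarithm of zero.

```python
import math


def detection_loss(g_true, g_pred, y_true, y_pred, eps=1e-7):
    """Return L = L1 + L2 for one training image (sum reduction is an assumption)."""
    # L1 = |G* - G|: absolute coordinate differences, summed over all entries.
    l1 = sum(
        abs(t - p)
        for row_true, row_pred in zip(g_true, g_pred)
        for t, p in zip(row_true, row_pred)
    )
    # L2 = -Y* log Y - (1 - Y*) log(1 - Y): binary cross-entropy, summed.
    l2 = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log() stays finite
        l2 += -t * math.log(p) - (1.0 - t) * math.log(1.0 - p)
    return l1 + l2
```

For example, with one annotated box `[[0, 0, 10, 10]]`, a predicted box `[[1, 0, 9, 10]]`, an annotated probability of 1.0 and a predicted probability of 0.5, the coordinate term is 2 and the probability term is log 2.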

Figure 12 shows a schematic structural diagram of a text region determining apparatus; the apparatus includes:

An image acquisition module 120, configured to acquire an image to be detected;

A detection module 122, configured to input the image to be detected into a pre-trained text detection model, and output multiple candidate regions of the text region in the image to be detected and a probability value of each candidate region; the text detection model is obtained by training with the above text detection model training method;

A text region determining module 124, configured to determine the text region in the image to be detected from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap between the multiple candidate regions.

The above text region determining apparatus provided by the embodiment of the present invention inputs the acquired image to be detected into the text detection model, and outputs multiple candidate regions of the text region in the image to be detected and the probability value of each candidate region; it then determines the text region in the image to be detected from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap between the multiple candidate regions. In this manner, the text detection model can automatically extract features at different scales, so that only a single image needs to be input into the model to obtain candidate regions for text regions of various scales in that image, without manually transforming the image scale. Operation is convenient; in particular, in scenes with multiple font sizes, fonts, shapes, and orientations, all kinds of text in the image can be detected quickly, comprehensively, and accurately, which in turn benefits the accuracy of subsequent text recognition and improves the effect of text recognition.
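The selection performed by the text region determining module 124 can be sketched as a greedy suppression procedure. The sketch below is illustrative only and rests on assumptions: candidate regions are simplified to axis-aligned boxes `(x1, y1, x2, y2)`, the degree of overlap is measured as intersection over union, and the names `iou` and `select_text_regions` as well as the default thresholds are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_text_regions(boxes, probs, overlap_threshold=0.5, prob_threshold=0.0):
    """Keep high-probability candidates and reject those overlapping a kept one."""
    # Optionally reject candidates whose probability is below the threshold.
    candidates = [(p, b) for p, b in zip(probs, boxes) if p >= prob_threshold]
    # Arrange candidates from the largest probability to the smallest.
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    kept = []
    while candidates:
        current = candidates.pop(0)
        kept.append(current)
        # Reject remaining candidates whose overlap with the current one
        # exceeds the preset overlap threshold.
        candidates = [
            c for c in candidates if iou(current[1], c[1]) <= overlap_threshold
        ]
    return [box for _, box in kept]
```

If the candidate regions are arbitrary quadrilaterals given by vertex coordinates, the `iou` helper would need to be replaced by a polygon overlap computation; the surrounding loop would stay the same.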

Figure 13 shows a schematic structural diagram of a text content determining apparatus; the apparatus includes:

A region obtaining module 130, configured to obtain the text region in the image by the text region determination method of any one of claims 8-10;

A recognition module 132, configured to input the text region into a pre-trained text recognition model, and output the recognition result of the text region;

A text content determining module 134, configured to determine the text content in the text region according to the recognition result.

The text content determining apparatus provided by the embodiment of the present invention first obtains the text region in the image by the above text region determination method; it then inputs the text region into the pre-trained text recognition model and outputs the recognition result of the text region; finally, it determines the text information in the text region according to the recognition result. In this manner, because the above text region determination method can obtain text regions of various scales through the text detection model, all kinds of text in the image can be detected quickly, comprehensively, and accurately in scenes with multiple font sizes, fonts, shapes, and orientations, which in turn benefits the accuracy of text recognition and improves the effect of text recognition.

In some embodiments, the above text recognition model training module is further configured to: split the feature map into at least one sub-feature map along the column direction of the feature map; the column direction of the feature map is the direction perpendicular to the text line direction.
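A minimal sketch of such a split is given below, assuming a horizontal text line so that the column direction is vertical, and assuming a single-channel feature map stored as a list of rows. The function name `split_columns` and the `width` parameter (the number of columns per sub-feature map) are hypothetical; the embodiment only requires at least one sub-feature map.

```python
def split_columns(feature_map, width=1):
    """Cut an H x W feature map into vertical slices of `width` columns each."""
    n_cols = len(feature_map[0])
    return [
        [row[c:c + width] for row in feature_map]  # one H x width slice
        for c in range(0, n_cols, width)
    ]
```

Each slice spans the full height of the feature map, so the sub-feature maps follow the reading order of the text line from one end to the other.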

In some embodiments, the above recognition loss function includes $L = -\log p(y \mid \{p^t\}_{t=1 \ldots T})$, where $y$ is the pre-annotated probability matrix of the target training text image; $t$ denotes the $t$-th probability matrix; $p^t$ is the probability matrix corresponding to each sub-feature map output by the classification function; $T$ is the total number of probability matrices; $p$ denotes computing a probability; and $\log$ denotes the logarithm operation.

In some embodiments, the recognition result of the above text region includes multiple probability matrices corresponding to the text region. The text content determining module is further configured to: determine the position of the maximum probability value in each probability matrix; obtain the character corresponding to the position of the maximum probability value from a preset correspondence between the positions in the probability matrix and characters; arrange the obtained characters according to the order of the multiple probability matrices; and determine the text content in the text region according to the arranged characters.
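The decoding performed by the text content determining module can be sketched as follows. This is a minimal illustration with assumptions labeled: each probability matrix is treated as a flat list with one entry per character class, the null character is assumed to sit at index 0, and the final step merges consecutive repeated characters before dropping null characters, which is one possible "preset rule" for the deletion of repeated and null characters described in the claims. The names `decode`, `charset`, and `blank_index` are hypothetical.

```python
def decode(prob_matrices, charset, blank_index=0):
    """Map each probability matrix to its most likely character and join them."""
    # Position of the maximum probability value in each probability matrix.
    indices = [max(range(len(p)), key=p.__getitem__) for p in prob_matrices]
    chars = []
    previous = None
    for index in indices:  # matrices are visited in their given order
        if index != previous and index != blank_index:
            # Look up the character through the position-to-character table.
            chars.append(charset[index])
        previous = index  # remember the last position to merge repeats
    return "".join(chars)
```

With `charset = ["", "a", "b"]`, a sequence of most likely positions 1, 1, 0, 1, 2 is decoded by merging the first two positions into one "a", dropping the null character, and keeping the following "a" and "b".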

The apparatus provided by the embodiment of the present invention has the same implementation principle and technical effects as the foregoing method embodiments. For brevity, where the apparatus embodiment does not mention something, reference may be made to the corresponding content in the foregoing method embodiments.

An embodiment of the present invention further provides an electronic device. Referring to Figure 14, the electronic device includes a memory 100 and a processor 101, where the memory 100 is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor 101 to implement the steps of the above text detection model training method, text region determination method, or text content determination method.

Further, the electronic device shown in Figure 14 also includes a bus 102 and a communication interface 103; the processor 101, the communication interface 103, and the memory 100 are connected through the bus 102.

The memory 100 may include a high-speed random access memory (RAM, Random Access Memory), and may also include a non-volatile memory, for example at least one disk memory. The communication connection between this system network element and at least one other network element is implemented through at least one communication interface 103 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, or the like. The bus 102 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one bidirectional arrow is shown in Figure 14, but this does not mean that there is only one bus or one type of bus.

The processor 101 may be an integrated circuit chip with signal processing capability. In implementation, each step of the above methods may be completed by an integrated logic circuit of hardware in the processor 101 or by instructions in the form of software. The processor 101 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), and the like; it may also be a digital signal processor (Digital Signal Processor, DSP for short), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field-Programmable Gate Array, FPGA for short) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present invention may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 100; the processor 101 reads the information in the memory 100 and completes the steps of the methods of the foregoing embodiments in combination with its hardware.

An embodiment of the present invention further provides a machine-readable storage medium storing machine-executable instructions. When the machine-executable instructions are called and executed by a processor, they cause the processor to implement the steps of the above text detection model training method, text region determination method, or text content determination method. For the specific implementation, reference may be made to the method embodiments, and details are not repeated here.

The computer program product of the text detection model training method, the text region determination method, the text content determination method, the apparatuses, and the electronic device provided by the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the methods described in the foregoing method embodiments. For the specific implementation, reference may be made to the method embodiments, and details are not repeated here.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.

Finally, it should be noted that the embodiments described above are merely specific implementations of the present invention, intended to illustrate its technical solutions rather than to limit them, and the scope of protection of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features; such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims

1. A text detection model training method, characterized in that the method comprises:

determining a target training image based on a preset training set;

inputting the target training image into a first initial model; the first initial model comprises a first feature extraction network, a feature fusion network, and a first output network;

extracting multiple initial feature maps of the target training image through the first feature extraction network; the multiple initial feature maps differ from one another in scale;

fusing the multiple initial feature maps through the feature fusion network to obtain a fused feature map;

inputting the fused feature map into the first output network, and outputting candidate regions of the text region in the target training image and a probability value of each candidate region;

determining a first loss value of the candidate regions and of the probability value of each candidate region through a preset detection loss function; training the first initial model according to the first loss value until the parameters in the first initial model converge, to obtain a text detection model.

2. The method according to claim 1, characterized in that the first feature extraction network comprises multiple groups of first convolutional networks connected in sequence; each group of first convolutional networks comprises a convolutional layer, a batch normalization layer, and an activation function layer connected in sequence.

3. The method according to claim 1, characterized in that the step of fusing the multiple initial feature maps through the feature fusion network to obtain a fused feature map comprises:

arranging the multiple initial feature maps in order according to the scales of the initial feature maps, wherein the initial feature map of the topmost level has the smallest scale and the initial feature map of the bottommost level has the largest scale;

determining the initial feature map of the topmost level as the fused feature map of the topmost level;

for each level other than the topmost level, fusing the initial feature map of the current level with the fused feature map of the level above the current level, to obtain the fused feature map of the current level;

determining the fused feature map of the bottommost level as the final fused feature map.

4. The method according to claim 1, characterized in that the first output network comprises a first convolutional layer and a second convolutional layer;

the step of inputting the fused feature map into the first output network and outputting candidate regions of the text region in the target training image and a probability value of each candidate region comprises:

inputting the fused feature map into the first convolutional layer and the second convolutional layer respectively;

performing a first convolution operation on the fused feature map through the first convolutional layer, and outputting a coordinate matrix; the coordinate matrix comprises the vertex coordinates of the candidate regions of the text region in the target training image;

performing a second convolution operation on the fused feature map through the second convolutional layer, and outputting a probability matrix; the probability matrix comprises the probability value of each candidate region.

5. The method according to claim 1, characterized in that the detection loss function comprises a first function and a second function;

the first function is $L_1 = |G^* - G|$, where $G^*$ is the pre-annotated coordinate matrix of the text region in the target training image, and $G$ is the coordinate matrix of the candidate regions of the text region in the target training image output by the first output network;

the second function is $L_2 = -Y^* \log Y - (1 - Y^*) \log(1 - Y)$, where $Y^*$ is the pre-annotated probability matrix of the text region in the target training image; $Y$ is the probability matrix of the candidate regions of the text region in the target training image output by the first output network; and $\log$ denotes the logarithm operation;

the first loss value of the candidate regions and of the probability value of each candidate region is $L = L_1 + L_2$.

6. The method according to claim 1, characterized in that the step of training the first initial model according to the first loss value until the parameters in the first initial model converge, to obtain a text detection model, comprises:

updating the parameters in the first initial model according to the first loss value;

judging whether the updated parameters have converged;

if the updated parameters have converged, determining the first initial model with the updated parameters as the detection model;

if the updated parameters have not converged, continuing to perform the step of determining a target training image based on a preset training set, until the updated parameters converge.

7. The method according to claim 6, characterized in that the step of updating the parameters in the first initial model according to the first loss value comprises:

determining a parameter to be updated from the first initial model according to a preset rule;

calculating the derivative $\partial L / \partial W$ of the first loss value with respect to the parameter to be updated in the first initial model, where $L$ is the first loss value and $W$ is the parameter to be updated;

updating the parameter to be updated to obtain the updated parameter, where α is a preset coefficient.

8. A text region determination method, characterized in that the method comprises:

acquiring an image to be detected;

inputting the image to be detected into a pre-trained text detection model, and outputting multiple candidate regions of the text region in the image to be detected and a probability value of each candidate region; the text detection model is obtained by training with the text detection model training method of any one of claims 1-7;

determining the text region in the image to be detected from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap between the multiple candidate regions.

9. The method according to claim 8, characterized in that the step of determining the text region in the image to be detected from the multiple candidate regions according to the probability values of the candidate regions and the degree of overlap between the multiple candidate regions comprises:

arranging the multiple candidate regions in order according to the probability values of the candidate regions, wherein the first candidate region has the largest probability value and the last candidate region has the smallest probability value;

taking the first candidate region as the current candidate region, and calculating one by one the degree of overlap between the current candidate region and each candidate region other than the current candidate region;

rejecting, from the candidate regions other than the current candidate region, the candidate regions whose degree of overlap is greater than a preset overlap threshold;

taking the candidate region following the current candidate region as the new current candidate region, and continuing to perform the step of calculating one by one the degree of overlap between the current candidate region and each candidate region other than the current candidate region, until the last candidate region is reached;

determining the candidate regions remaining after the rejection as the text region in the image to be detected.

10. The method according to claim 9, characterized in that, before the step of arranging the multiple candidate regions in order according to the probability values of the candidate regions, the method further comprises:

rejecting, from the multiple candidate regions, the candidate regions whose probability value is lower than a preset probability threshold, to obtain the final multiple candidate regions.

11. A text content determination method, characterized in that the method comprises:

obtaining the text region in an image by the text region determination method of any one of claims 8-10;

inputting the text region into a pre-trained text recognition model, and outputting the recognition result of the text region;

determining the text content in the text region according to the recognition result.

12. The method according to claim 11, characterized in that, before the step of inputting the text region into the pre-trained recognition model, the method further comprises: normalizing the text region according to a preset size.

13. The method according to claim 11, characterized in that the text recognition model is trained in the following manner:

determining a target training text image based on a preset training set;

inputting the target training text image into a second initial model; the second initial model comprises a second feature extraction network, a feature splitting network, a second output network, and a classification function;

extracting a feature map of the target training text image through the second feature extraction network;

splitting the feature map into at least one sub-feature map through the feature splitting network;

inputting the sub-feature maps into the second output network respectively, and outputting an output matrix corresponding to each sub-feature map;

inputting the output matrix corresponding to each sub-feature map into the classification function respectively, and outputting a probability matrix corresponding to each sub-feature map;

determining a second loss value of the probability matrices through a preset recognition loss function; training the second initial model according to the second loss value until the parameters in the second initial model converge, to obtain the text recognition model.

14. The method according to claim 13, characterized in that the second feature extraction network comprises multiple groups of second convolutional networks connected in sequence; each group of second convolutional networks comprises a convolutional layer, a pooling layer, and an activation function layer connected in sequence.

15. The method according to claim 13, characterized in that the step of splitting the feature map into at least one sub-feature map through the feature splitting network comprises:

splitting the feature map into at least one sub-feature map along the column direction of the feature map; the column direction of the feature map is the direction perpendicular to the text line direction.

16. The method according to claim 13, characterized in that the second output network comprises multiple fully connected layers; the number of fully connected layers corresponds to the number of sub-feature maps;

the step of inputting the sub-feature maps into the second output network respectively and outputting an output matrix corresponding to each sub-feature map comprises: inputting each sub-feature map into its corresponding fully connected layer, so that each fully connected layer outputs the output matrix corresponding to that sub-feature map.

17. The method according to claim 13, characterized in that the classification function comprises a Softmax function;

the Softmax function is defined over the following quantities: e denotes the natural constant; t denotes the t-th probability matrix; K denotes the number of distinct characters contained in the target training text images of the training set; m ranges from 1 to K+1; ∑ denotes the summation operation; the function takes the i-th element of the output matrix and yields the i-th element of the probability matrix p^t.

18. The method according to claim 13, characterized in that the recognition loss function comprises $L = -\log p(y \mid \{p^t\}_{t=1 \ldots T})$, where $y$ is the pre-annotated probability matrix of the target training text image; $t$ denotes the $t$-th probability matrix; $p^t$ is the probability matrix corresponding to each sub-feature map output by the classification function; $T$ is the total number of probability matrices; $p$ denotes computing a probability; and $\log$ denotes the logarithm operation.

19. The method according to claim 13, characterized in that the step of training the second initial model according to the second loss value until the parameters in the second initial model converge, to obtain the text recognition model, comprises:

updating the parameters in the second initial model according to the second loss value;

judging whether the updated parameters have converged;

if the updated parameters have converged, determining the second initial model with the updated parameters as the text recognition model;

if the updated parameters have not converged, continuing to perform the step of determining a target training text image based on a preset training set, until each updated parameter converges.

20. The method according to claim 19, characterized in that the step of updating the parameters in the second initial model according to the second loss value comprises:

determining a parameter to be updated from the second initial model according to a preset rule;

calculating the derivative $\partial L' / \partial W'$ of the second loss value with respect to the parameter to be updated, where $L'$ is the loss value of the probability matrices and $W'$ is the parameter to be updated;

updating the parameter to be updated to obtain the updated parameter, where α' is a preset coefficient.

21. The method according to claim 11, characterized in that the recognition result of the text region comprises multiple probability matrices corresponding to the text region;

the step of determining the text content in the text region according to the recognition result comprises:

determining the position of the maximum probability value in each probability matrix;

obtaining the character corresponding to the position of the maximum probability value from a preset correspondence between the positions in the probability matrix and characters;

arranging the obtained characters according to the order of the multiple probability matrices;

determining the text content in the text region according to the arranged characters.

22. The method according to claim 21, wherein the step of determining the text content in the text region according to the arranged characters comprises:

Deleting, according to a preset rule, repeated characters and blank characters from the arranged characters, to obtain the text content in the text region.
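The decoding steps of claims 21 and 22 can be sketched as below. This is a minimal, non-limiting sketch under two assumptions that the claims do not fix: each probability matrix is treated as a flat list of per-character probabilities, and the "preset rule" is taken to be the common convention of merging consecutive repeated characters and then dropping blanks. The function name `decode`, the `charset` table and the `blank_index` parameter are hypothetical.

```python
def decode(prob_matrices, charset, blank_index=0):
    """Greedy decoding: argmax per matrix, merge repeats, drop blanks."""
    # Position of the maximum probability value in each probability matrix.
    indices = [max(range(len(p)), key=p.__getitem__) for p in prob_matrices]

    chars = []
    previous = None
    for idx in indices:
        # Keep a character only if it is not a blank and does not repeat
        # the immediately preceding position.
        if idx != previous and idx != blank_index:
            chars.append(charset[idx])
        previous = idx
    return "".join(chars)


# Position 0 is the blank; positions 1 and 2 map to "a" and "b".
charset = ["-", "a", "b"]
prob_matrices = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, merged)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.60, 0.30],  # a (kept, because a blank separates it)
    [0.10, 0.20, 0.70],  # b
]
text = decode(prob_matrices, charset)
```

Under this convention a blank between two identical characters is what allows genuinely doubled letters to survive the merge.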

23. The method according to claim 11, wherein after the step of determining the text content in the text region according to the recognition result, the method further comprises:

If the image contains a plurality of text regions, obtaining the text content in each text region;

Determining, by means of a pre-established sensitive-word dictionary, whether the text content corresponding to the image contains sensitive information.

24. The method according to claim 23, wherein the step of determining, by means of the pre-established sensitive-word dictionary, whether the text content corresponding to the image contains sensitive information comprises:

Performing a word segmentation operation on the obtained text content;

Matching the words obtained from the word segmentation operation, one by one, against the pre-established sensitive-word dictionary;

If at least one word is matched successfully, determining that the text content corresponding to the image contains sensitive information.
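The matching procedure of claim 24 can be sketched as follows. The claims do not specify a word segmentation algorithm, so whitespace splitting is used here purely as a stand-in (an assumption); for Chinese text a dedicated segmenter would be substituted through the `segment` parameter. The names `contains_sensitive`, `segment` and the sample dictionary are hypothetical.

```python
def contains_sensitive(text, sensitive_words, segment=str.split):
    """Segment the text, then match each word against the dictionary.

    Returns (flag, matched_words); flag is True if at least one word matched.
    """
    words = segment(text)
    matched = [word for word in words if word in sensitive_words]
    return bool(matched), matched


# Hypothetical sensitive-word dictionary, stored as a set for O(1) lookups.
sensitive_words = {"pills", "casino"}

flag, matched = contains_sensitive("buy cheap pills now", sensitive_words)
clean_flag, clean_matched = contains_sensitive("hello world", sensitive_words)
```

Returning the matched words alongside the flag makes it straightforward to locate the text regions they came from in a later step.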

25. The method according to claim 24, wherein after determining that the text content corresponding to the image contains sensitive information, the method further comprises:

Obtaining the text region to which the successfully matched word belongs, and marking, in the image, the obtained text region or the successfully matched word.

26. A text detection model training apparatus, wherein the apparatus comprises:

A training image determining module, configured to determine a target training image based on a preset training set;

A training image input module, configured to input the target training image into a first initial model; the first initial model comprises a first feature extraction network, a feature fusion network and a first output network;

A feature extraction module, configured to extract a plurality of initial feature maps of the target training image through the first feature extraction network; the plurality of initial feature maps differ in scale;

A feature fusion module, configured to fuse the plurality of initial feature maps through the feature fusion network to obtain a fused feature map;

An output module, configured to input the fused feature map into the first output network and to output candidate regions of the text regions in the target training image and a probability value of each candidate region;

A loss value determination and training module, configured to determine, through a preset detection loss function, a first loss value of the candidate regions and of the probability value of each candidate region, and to train the first initial model according to the first loss value until the parameters in the first initial model converge, obtaining a text detection model.

27. The apparatus according to claim 26, wherein the first feature extraction network comprises a plurality of groups of first convolutional networks connected in sequence; each group of first convolutional networks comprises a convolutional layer, a batch normalization layer and an activation function layer connected in sequence.

28. The apparatus according to claim 26, wherein the feature fusion module is further configured to:

29. The apparatus according to claim 26, wherein the first output network comprises a first convolutional layer and a second convolutional layer;

The output module is further configured to:

30. The apparatus according to claim 26, wherein the detection loss function comprises a first function and a second function;

31. The apparatus according to claim 26, wherein the loss value determination and training module is further configured to:

Determine whether the updated parameters have converged;

32. The apparatus according to claim 31, wherein the loss value determination and training module is further configured to:

33. A text region determining apparatus, wherein the apparatus comprises:

An image acquisition module, configured to obtain an image to be detected;

A detection module, configured to input the image to be detected into a pre-trained text detection model and to output a plurality of candidate regions of the text regions in the image to be detected and a probability value of each candidate region; the text detection model is obtained through training by the text detection model training method according to any one of claims 1-7;

A text region determining module, configured to determine the text regions in the image to be detected from the plurality of candidate regions, according to the probability values of the candidate regions and the degree of overlap between the plurality of candidate regions.

34. The apparatus according to claim 33, wherein the text region determining module is further configured to:

35. The apparatus according to claim 34, wherein the apparatus further comprises: a region rejection module, configured to remove, from the plurality of candidate regions, the candidate regions whose probability value is lower than a preset probability threshold, to obtain the final plurality of candidate regions.
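One possible reading of the region rejection module of claim 35, combined with the probability-and-overlap selection of claim 33, is sketched below. Several points are assumptions rather than claim language: candidate regions are simplified to axis-aligned boxes `(x1, y1, x2, y2)`, the degree of overlap is measured as intersection over union, and overlapping candidates are resolved by greedy non-maximum suppression. The names `iou`, `select_regions` and both threshold defaults are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_regions(boxes, probs, prob_threshold=0.5, iou_threshold=0.5):
    """Reject low-probability candidates, then suppress overlapping ones."""
    # Rejection step: drop candidates below the preset probability threshold.
    candidates = [(p, b) for p, b in zip(probs, boxes) if p >= prob_threshold]
    # Visit the remaining candidates from most to least probable.
    candidates.sort(key=lambda pb: pb[0], reverse=True)

    kept = []
    for p, b in candidates:
        # Keep a candidate only if it does not overlap a kept region too much.
        if all(iou(b, k) <= iou_threshold for _, k in kept):
            kept.append((p, b))
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (50, 50, 60, 60)]
probs = [0.9, 0.8, 0.7, 0.2]
regions = select_regions(boxes, probs)
```

Here the second box is suppressed because it overlaps the first with an intersection over union of about 0.68, and the fourth is rejected by the probability threshold before overlap is considered.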

36. A text content determining apparatus, wherein the apparatus comprises:

A region obtaining module, configured to obtain the text regions in an image by the text region determining method according to any one of claims 8-10;

A recognition module, configured to input the text region into a pre-trained text recognition model and to output a recognition result of the text region;

A text content determining module, configured to determine the text content in the text region according to the recognition result.

37. The apparatus according to claim 36, wherein the apparatus further comprises: a normalization module, configured to normalize the text region according to a preset size.

38. The apparatus according to claim 36, wherein the apparatus further comprises a text recognition model training module, configured to train the text recognition model in the following manner:

Determining a target training text image based on a preset training set;

Inputting the target training text image into a second initial model; the second initial model comprises a second feature extraction network, a second output network and a classification function;

Splitting the feature map into at least one sub-feature map through the second initial model;

39. The apparatus according to claim 38, wherein the second feature extraction network comprises a plurality of groups of second convolutional networks connected in sequence; each group of second convolutional networks comprises a convolutional layer, a pooling layer and an activation function layer connected in sequence.

40. The apparatus according to claim 38, wherein the recognition model training module is further configured to:

41. The apparatus according to claim 38, wherein the second output network comprises a plurality of fully connected layers; the number of fully connected layers corresponds to the number of sub-feature maps;

The recognition model training module is further configured to: input each sub-feature map into the corresponding fully connected layer, so that each fully connected layer outputs an output matrix corresponding to that sub-feature map.
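The arrangement of claim 41, in which each sub-feature map has its own fully connected layer, can be sketched with NumPy as below. The sketch rests on assumptions that the claims leave open: the feature map has shape `(C, H, W)`, it is split column by column along its width, and the classification function applied to each output is a softmax. The function names, the shapes and the random weights are hypothetical stand-ins for a trained model.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def recognition_head(feature_map, weights, biases):
    """Apply one fully connected layer per sub-feature map, then softmax.

    feature_map: (C, H, W); weights: (W, K, C * H); biases: (W, K).
    Returns an array of shape (W, K): one probability row per sub-feature map.
    """
    num_slices = feature_map.shape[2]
    outputs = []
    for t in range(num_slices):
        # Assumption: the t-th sub-feature map is the t-th column of the map.
        sub = feature_map[:, :, t].reshape(-1)
        # The t-th fully connected layer handles only the t-th sub-feature map.
        logits = weights[t] @ sub + biases[t]
        outputs.append(softmax(logits))
    return np.stack(outputs)


rng = np.random.default_rng(0)
C, H, W, K = 4, 2, 5, 6  # channels, height, width (slices), character classes
feature_map = rng.standard_normal((C, H, W))
weights = rng.standard_normal((W, K, C * H))
biases = np.zeros((W, K))
probs = recognition_head(feature_map, weights, biases)
```

Each row of `probs` then plays the role of one probability matrix, in the order of the sub-feature maps from which it was computed.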

42. The apparatus according to claim 38, wherein the classification function comprises a Softmax function;

43. The apparatus according to claim 38, wherein the recognition loss function comprises $L = -\log p(y \mid \{p^t\}_{t=1 \ldots T})$; where $y$ is the pre-annotated probability matrix of the target training text image; $t$ denotes the $t$-th probability matrix; $p^t$ is the probability matrix output by the classification function for each sub-feature map; $T$ is the total number of probability matrices; $p$ denotes the probability computation; and $\log$ denotes the logarithm operation.

44. The apparatus according to claim 38, wherein the recognition model training module is further configured to:

Determine whether each updated parameter has converged;

If each updated parameter has converged, determine the second initial model with the updated parameters as the text recognition model;

If the updated parameters have not converged, continue to perform the step of determining a target training text image based on the preset training set, until each updated parameter converges.

45. The apparatus according to claim 44, wherein the recognition model training module is further configured to:

46. The apparatus according to claim 36, wherein the recognition result of the text region comprises a plurality of probability matrices corresponding to the text region;

The text content determining module is further configured to:

Determine the position of the maximum probability value in each probability matrix;

47. The apparatus according to claim 46, wherein the text content determining module is further configured to:

48. The apparatus according to claim 36, wherein the apparatus further comprises:

An information obtaining module, configured to obtain, if the image contains a plurality of text regions, the text content in each text region;

A sensitive information determining module, configured to determine, by means of a pre-established sensitive-word dictionary, whether the text content corresponding to the image contains sensitive information.

49. The apparatus according to claim 48, wherein the sensitive information determining module is further configured to:

Perform a word segmentation operation on the obtained text content;

50. The apparatus according to claim 49, wherein the apparatus further comprises:

A region marking module, configured to obtain the text region to which the successfully matched word belongs, and to mark the obtained text region in the image.

51. An electronic device, comprising a processor and a memory, wherein the memory stores machine-executable instructions executable by the processor, and the processor executes the machine-executable instructions to implement the steps of the text detection model training method according to any one of claims 1 to 7, the text region determining method according to any one of claims 8 to 10, or the text content determining method according to any one of claims 11 to 25.

52. A machine-readable storage medium, wherein the machine-readable storage medium stores machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the steps of the text detection model training method according to any one of claims 1 to 7, the text region determining method according to any one of claims 8 to 10, or the text content determining method according to any one of claims 11 to 25.