CN109858432A - Method and apparatus for detecting text information in an image, and computer device - Google Patents

Method and apparatus for detecting text information in an image, and computer device Download PDF

Info

Publication number
CN109858432A
CN109858432A (application CN201910081909.7A / CN201910081909A)
Authority
CN
China
Prior art keywords
pixel
target image
text
value
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910081909.7A
Other languages
Chinese (zh)
Other versions
CN109858432B (en)
Inventor
杨武魁
刘学博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN201910081909.7A priority Critical patent/CN109858432B/en
Publication of CN109858432A publication Critical patent/CN109858432A/en
Application granted granted Critical
Publication of CN109858432B publication Critical patent/CN109858432B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The application discloses a method and apparatus for detecting text information in an image, and a computer device. The method includes: obtaining feature information of each pixel in a target image, the feature information including multiple probability values of the pixel corresponding to multiple scale intervals; and determining, based on the feature information of each pixel in the target image, position information and scale information of text in the target image.

Description

Method and apparatus for detecting text information in an image, and computer device
Technical field
The present application relates to natural language processing technology, and in particular to a method and apparatus for detecting text information in an image, and a computer device.
Background
Natural language processing is a very important research field related to computer vision. In general, in a natural language processing system, a high-precision model tends to run slowly. Taking text detection as an example, a high-resolution image usually has to be input in order to improve detection precision, because small text is difficult to detect when the input image resolution is low; meanwhile, the processing time of the network grows roughly quadratically with the size of the network input. How to improve the efficiency of natural language processing is therefore a problem to be solved.
Summary
To solve the above technical problem, the embodiments of the present application provide a method and apparatus for detecting text information in an image, a storage medium, a computer program product, and a computer device.
The method for detecting text information in an image provided by the embodiments of the present application includes:
obtaining feature information of each pixel in a target image, the feature information including multiple probability values of the pixel corresponding to multiple scale intervals; and
determining, based on the feature information of each pixel in the target image, position information and scale information of text in the target image.
In an embodiment of the present application, determining the position information and scale information of the text in the target image based on the feature information of each pixel in the target image includes:
for each pixel in the target image, determining a maximum probability value of the pixel based on the feature information of the pixel, and taking the scale interval corresponding to the maximum probability value as a target scale interval of the pixel; and
determining the position information and scale information of the text in the target image based on the maximum probability value and the target scale interval of each pixel in the target image.
In an embodiment of the present application, determining the position information and scale information of the text in the target image based on the maximum probability value and the target scale interval of each pixel in the target image includes:
generating a score map based on the maximum probability value of each pixel in the target image, where each point in the score map corresponds one-to-one to a pixel in the target image, the value of each point in the score map is the maximum probability value of the corresponding pixel in the target image, and the scale interval corresponding to each point in the score map is the target scale interval of the corresponding pixel in the target image; and
determining the position information and scale information of the text in the target image based on the score map.
In an embodiment of the present application, determining the position information and scale information of the text in the target image based on the score map includes:
comparing the value of each point in the score map with a threshold, and generating a binary map based on the comparison results, where each point in the binary map corresponds one-to-one to a point in the score map, the value of each point in the binary map is determined based on the comparison between the value of the corresponding point in the score map and the threshold, and the scale interval corresponding to each point in the binary map is the scale interval of the corresponding point in the score map; and
extracting the position information of the text in the target image from the binary map, and determining the scale information of the text based on the position information of the text.
In an embodiment of the present application, determining the value of each point in the binary map based on the comparison between the value of the corresponding point in the score map and the threshold includes:
if the value of the point in the score map is greater than the threshold, the value of the corresponding point in the binary map is a first value; and
if the value of the point in the score map is less than or equal to the threshold, the value of the corresponding point in the binary map is a second value.
In an embodiment of the present application, extracting the position information of the text in the target image from the binary map includes:
determining one or more connected regions in the binary map, where a connected region is a continuous region formed by points in the binary map whose value is the first value, and obtaining an annotation box corresponding to the connected region, the annotation box covering the connected region; and
determining the position information of the text based on the position information of the annotation box.
In an embodiment of the present application, the target image is an image obtained by down-sampling an original image; alternatively, the target image is the original image.
The apparatus for detecting text information in an image provided by the embodiments of the present application includes:
a feature extraction unit, configured to obtain feature information of each pixel in a target image, the feature information including multiple probability values of the pixel corresponding to multiple scale intervals; and
a text information determination unit, configured to determine position information and scale information of text in the target image based on the feature information of each pixel in the target image.
In an embodiment of the present application, the text information determination unit includes:
a maximization subunit, configured to, for each pixel in the target image, determine a maximum probability value of the pixel based on the feature information of the pixel, and take the scale interval corresponding to the maximum probability value as a target scale interval of the pixel; and
a determination subunit, configured to determine the position information and scale information of the text in the target image based on the maximum probability value and the target scale interval of each pixel in the target image.
In an embodiment of the present application, the determination subunit is configured to:
generate a score map based on the maximum probability value of each pixel in the target image, where each point in the score map corresponds one-to-one to a pixel in the target image, the value of each point in the score map is the maximum probability value of the corresponding pixel in the target image, and the scale interval corresponding to each point in the score map is the target scale interval of the corresponding pixel in the target image; and
determine the position information and scale information of the text in the target image based on the score map.
In an embodiment of the present application, the determination subunit is configured to:
compare the value of each point in the score map with a threshold, and generate a binary map based on the comparison results, where each point in the binary map corresponds one-to-one to a point in the score map, the value of each point in the binary map is determined based on the comparison between the value of the corresponding point in the score map and the threshold, and the scale interval corresponding to each point in the binary map is the scale interval of the corresponding point in the score map; and
extract the position information of the text in the target image from the binary map, and determine the scale information of the text based on the position information of the text.
In an embodiment of the present application, determining the value of each point in the binary map based on the comparison between the value of the corresponding point in the score map and the threshold includes:
if the value of the point in the score map is greater than the threshold, the value of the corresponding point in the binary map is a first value; and
if the value of the point in the score map is less than or equal to the threshold, the value of the corresponding point in the binary map is a second value.
In an embodiment of the present application, the determination subunit is configured to:
determine one or more connected regions in the binary map, where a connected region is a continuous region formed by points in the binary map whose value is the first value, and obtain an annotation box corresponding to the connected region, the annotation box covering the connected region; and
determine the position information of the text based on the position information of the annotation box.
In an embodiment of the present application, the target image is an image obtained by down-sampling an original image; alternatively, the target image is the original image.
The computer program product provided by the embodiments of the present application includes computer-executable instructions that, when executed, implement the above method for detecting text information in an image.
The storage medium provided by the embodiments of the present application stores executable instructions that, when executed by a processor, implement the above method for detecting text information in an image.
The computer device provided by the embodiments of the present application includes a memory and a processor, the memory storing computer-executable instructions; when the processor runs the computer-executable instructions on the memory, the above method for detecting text information in an image is implemented.
In the technical solutions of the embodiments of the present application, feature information of each pixel in a target image is obtained, the feature information including multiple probability values of the pixel corresponding to multiple scale intervals; based on the feature information of each pixel in the target image, position information and scale information of text in the target image are determined. With the technical solutions of the embodiments of the present application, the position information and scale information of text can be extracted from the target image. The text can be cropped accurately according to its position information, which reduces the computational cost spent on irrelevant background, and the text fed to the network can be scale-normalized according to its scale information, which on the one hand improves the precision of the model and on the other hand reduces the resolution of large text and thereby speeds up the model. The technical solutions of the embodiments of the present application achieve the purpose of preliminarily screening text information from the target image, improving the efficiency of subsequent natural language processing.
Detailed description of the invention
The accompanying drawings, which form a part of the specification, illustrate embodiments of the present application and, together with the description, serve to explain the principles of the application.
The present application can be understood more clearly from the following detailed description taken with reference to the accompanying drawings, in which:
Fig. 1 is a first schematic flowchart of the method for detecting text information in an image provided by the embodiments of the present application;
Fig. 2(a) is a schematic diagram of text regions in a target image provided by the embodiments of the present application;
Fig. 2(b) is a schematic diagram of the extent of a text item provided by the embodiments of the present application;
Fig. 3 is a schematic diagram of scale intervals provided by the embodiments of the present application;
Fig. 4 is an architecture diagram of a neural network provided by the embodiments of the present application;
Fig. 5 is a second schematic flowchart of the method for detecting text information in an image provided by the embodiments of the present application;
Fig. 6 is a schematic structural diagram of the apparatus for detecting text information in an image provided by the embodiments of the present application;
Fig. 7 is a schematic structural diagram of the computer device of an embodiment of the present application.
Specific embodiment
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application.
At the same time, it should be understood that, for ease of description, the sizes of the various parts shown in the drawings are not drawn according to actual proportional relationships.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present application or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.
It should also be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further discussed in subsequent drawings.
The embodiments of the present application can be applied to electronic devices such as computer systems/servers, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with electronic devices such as computer systems/servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments including any of the above systems, and the like.
Electronic devices such as computer systems/servers can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and so on, which perform specific tasks or implement specific abstract data types. The computer system/server can be implemented in a distributed cloud computing environment, where tasks are executed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
Fig. 1 is a first schematic flowchart of the method for detecting text information in an image provided by the embodiments of the present application. As shown in Fig. 1, the method for detecting text information in an image includes the following steps:
Step 101: Obtain feature information of each pixel in a target image, the feature information including multiple probability values of the pixel corresponding to multiple scale intervals.
In the embodiments of the present application, the target image can be obtained in various ways. For example, a target object is photographed with a camera to obtain the target image. For another example, the target image is downloaded from the Internet. For another example, the target image is received through end-to-end communication. For another example, a file in a certain format is converted into an image format and used as the target image. For another example, a target object is scanned with a scanner to obtain the target image. Furthermore, the target image may be an independent image, or a frame in a video sequence.
In the embodiments of the present application, the target image contains text content, and the type of the text content is not limited; it may be, for example, Chinese characters, English words, numbers, or symbols.
In the embodiments of the present application, the target image is an image obtained by down-sampling an original image; alternatively, the target image is the original image. In one embodiment, the resolution of the original image is high, and the original image can be down-sampled to obtain a low-resolution target image, thereby speeding up the detection of text information. In another embodiment, the resolution of the original image is low, so it may not be necessary to down-sample the original image, and the text information is detected directly from the original image.
In the embodiments of the present application, the position information of a text item characterizes its position in the target image, and the scale information of a text item characterizes its size. Text content of different sizes often appears in the target image, that is, some text has a larger scale and some text has a smaller scale.
In the embodiments of the present application, a scale interval characterizes a certain range of scales. In a specific implementation, a large scale region can be divided into several consecutive scale intervals, and the range sizes of the scale intervals may be the same or different.
For example: the input of the neural network is the target image. Suppose the target image is I with resolution W*H, the output of the neural network is y_pred, and the ground truth corresponding to y_pred is y_gt. Here, the factor by which the neural network down-samples the target image is s (the value of s can be set flexibly), so the resolution of y_pred and y_gt is H/s*W/s. Suppose {t1, ..., tn} are the text regions in I, as shown in Fig. 2(a). For a text item ti (1≤i≤n), its information is {xi, yi, wi, hi}, where xi, yi are the top-left coordinates of ti, wi is the width of ti, and hi is the height of ti, as shown in Fig. 2(b). The scale of ti is determined based on wi and/or hi; for example, the scale of ti is hi, or the scale of ti is wi, or the scale of ti is max(hi, wi), where max denotes taking the maximum. It should be noted that the way of measuring the scale of text is not limited to these; the scale of ti may also be computed by other formulas. On the other hand, the scale region [2^3, 2^9] is divided into 60 scale intervals, so the range size of each scale interval is 2^0.1. It should be noted that "^" denotes exponentiation: 2^3 is 2 to the power of 3, 2^9 is 2 to the power of 9, and 2^0.1 is 2 to the power of 0.1. As shown in Fig. 3, the range of the first scale interval is [2^3, 2^3.1], the range of the second scale interval is [2^3.1, 2^3.2], and so on, and the range of the last scale interval is [2^8.9, 2^9]. For a text item ti, the probability y_gt(x, y, b) that its scale falls into a certain scale interval b (0≤b≤60, where b denotes the b-th of the 60 scale intervals) can be determined; by traversing all text regions {t1, ..., tn} and summing the probabilities of all text items falling into scale interval b, the probability that text appears in scale interval b is obtained. In a specific implementation, y_gt(x, y, b) can be obtained as follows: for all xi/s < x < (xi+wi)/s, yi/s < y < (yi+hi)/s, we have:
y_gt(x, y, b) = y_gt(x, y, b) + f((s_lb + s_rb)/2)
f(x) = exp(-(x - log2(s))^2 / (2*δ^2))
Substituting (s_lb + s_rb)/2 for x in the formula of f(x) gives f((s_lb + s_rb)/2), specifically:
f((s_lb + s_rb)/2) = exp(-((s_lb + s_rb)/2 - log2(s))^2 / (2*δ^2))
where s_lb and s_rb are the left and right boundary coordinates of scale interval b, and δ is a coefficient, which is a constant. It should be noted that f(x) can be replaced with other functions, such as an exponential function.
Through the above scheme, the probability that text appears in each of the 60 scale intervals can be obtained.
It should be noted that the scale region in the above scheme can be set flexibly, and the way the scale region is divided into scale intervals can also be set flexibly; for example, the number of scale intervals the scale region contains can be set flexibly.
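As an illustration of the ground-truth construction described above, the following Python sketch builds y_gt for a list of text boxes. It is a minimal sketch under stated assumptions: s=4 as the down-sampling factor, δ=0.5, the max(h, w) scale measure, and the interpretation that log2(s) in f(x) refers to the log2 scale of the text item; none of these are fixed by the text above.

    import numpy as np

    def build_scale_gt(texts, W, H, s=4, n_bins=60, lo=3.0, hi=9.0, delta=0.5):
        # texts: list of (x, y, w, h) boxes in original-image coordinates
        Hs, Ws = H // s, W // s
        y_gt = np.zeros((Hs, Ws, n_bins), dtype=np.float32)
        # log2 boundaries s_lb, s_rb of the 60 scale intervals covering [2^3, 2^9]
        edges = np.linspace(lo, hi, n_bins + 1)
        mids = (edges[:-1] + edges[1:]) / 2.0          # (s_lb + s_rb) / 2
        for (x, y, w, h) in texts:
            scale = max(h, w)                          # one of the scale measures named above
            f = np.exp(-(mids - np.log2(scale)) ** 2 / (2 * delta ** 2))
            x0, x1 = int(x / s), int(np.ceil((x + w) / s))
            y0, y1 = int(y / s), int(np.ceil((y + h) / s))
            y_gt[y0:y1, x0:x1, :] += f                 # accumulate over all text regions
        return y_gt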
In the embodiments of the present application, the feature information of each pixel in the target image is obtained with a neural network. In a specific implementation, a fully convolutional neural network can be used to predict the position information and scale information of the text in the target image. Specifically, referring to Fig. 4, the input of the neural network is the target image. Suppose the target image is I with resolution W*H, and the factor by which the neural network down-samples the target image is s (the value of s can be set flexibly). The neural network outputs feature information of size H/s*W/s*B, where B denotes the number of scale intervals (for example, 60 scale intervals). H/s*W/s*B can be regarded as B pieces of feature information each of size H/s*W/s, and these B pieces correspond one-to-one to the B scale intervals. The feature information at position {xi, yi, b} is the probability value of the pixel (with coordinates xi, yi) for scale interval b; thus the feature information of one pixel consists of B probability values corresponding to the B scale intervals. Here, the larger the probability value of a pixel for scale interval b, the more likely it is that text (whose scale lies in the range of scale interval b) appears at that pixel; conversely, the smaller the probability value of a pixel for scale interval b, the less likely it is that text (whose scale lies in the range of scale interval b) appears at that pixel. In this way, the B probability values of a pixel corresponding to the B scale intervals can be obtained.
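The architecture of the fully convolutional network is not specified above; the following PyTorch sketch only illustrates the input/output shapes described (a W*H image mapped to an H/s*W/s*B probability map). The layer choices, s=4 and B=60, are assumptions made for illustration, not the patent's architecture.

    import torch
    import torch.nn as nn

    class ScaleFCN(nn.Module):
        """Minimal fully convolutional sketch: per-pixel, per-scale-interval probabilities."""
        def __init__(self, n_bins=60):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            )
            self.head = nn.Conv2d(64, n_bins, 1)   # one output channel per scale interval

        def forward(self, x):                       # x: (N, 3, H, W)
            logits = self.head(self.backbone(x))    # (N, B, H/4, W/4)
            return torch.sigmoid(logits)            # per-pixel probabilities y_pred

    # usage sketch:
    # net = ScaleFCN()
    # y_pred = net(torch.randn(1, 3, 512, 512))     # -> shape (1, 60, 128, 128)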
Step 102: Determine the position information and scale information of the text in the target image based on the feature information of each pixel in the target image.
In the embodiments of the present application, since a larger probability value of a pixel for scale interval b indicates a higher probability that text (whose scale lies in the range of scale interval b) appears at that pixel, the regions with larger probability values can be outlined from the B probability values of each pixel corresponding to the B scale intervals, and such a region is regarded as a region where text appears, so that the position information of the text can be obtained. Further, since the probability values are associated with the scale intervals, the scale information of the text can be determined from the position information of the outlined region.
The technical solution of the embodiments of the present application performs preliminary text information screening on the target image with a small model; the obtained text information includes the position information and scale information of the text, and this text information can then be used to effectively improve the accuracy and speed of subsequent natural language processing.
Fig. 5 is a second schematic flowchart of the method for detecting text information in an image provided by the embodiments of the present application. As shown in Fig. 5, the method for detecting text information in an image includes the following steps:
Step 501: Obtain feature information of each pixel in a target image, the feature information including multiple probability values of the pixel corresponding to multiple scale intervals.
In the embodiments of the present application, the target image can be obtained in various ways. For example, a target object is photographed with a camera to obtain the target image. For another example, the target image is downloaded from the Internet. For another example, the target image is received through end-to-end communication. For another example, a file in a certain format is converted into an image format and used as the target image. For another example, a target object is scanned with a scanner to obtain the target image. Furthermore, the target image may be an independent image, or a frame in a video sequence.
In the embodiments of the present application, the target image contains text content, and the type of the text content is not limited; it may be, for example, Chinese characters, English words, numbers, or symbols.
In the embodiments of the present application, the target image is an image obtained by down-sampling an original image; alternatively, the target image is the original image. In one embodiment, the resolution of the original image is high, and the original image can be down-sampled to obtain a low-resolution target image, thereby speeding up the detection of text information. In another embodiment, the resolution of the original image is low, so it may not be necessary to down-sample the original image, and the text information is detected directly from the original image.
In the embodiments of the present application, the position information of a text item characterizes its position in the target image, and the scale information of a text item characterizes its size. Text content of different sizes often appears in the target image, that is, some text has a larger scale and some text has a smaller scale.
In the embodiments of the present application, a scale interval characterizes a certain range of scales. In a specific implementation, a large scale region can be divided into several consecutive scale intervals, and the range sizes of the scale intervals may be the same or different.
For example: the input of the neural network is the target image. Suppose the target image is I with resolution W*H, the output of the neural network is y_pred, and the ground truth corresponding to y_pred is y_gt. Here, the factor by which the neural network down-samples the target image is s (the value of s can be set flexibly), so the resolution of y_pred and y_gt is H/s*W/s. Suppose {t1, ..., tn} are the text regions in I, as shown in Fig. 2(a). For a text item ti (1≤i≤n), its information is {xi, yi, wi, hi}, where xi, yi are the top-left coordinates of ti, wi is the width of ti, and hi is the height of ti, as shown in Fig. 2(b). The scale of ti is determined based on wi and/or hi; for example, the scale of ti is hi, or the scale of ti is wi, or the scale of ti is max(hi, wi), where max denotes taking the maximum. It should be noted that the way of measuring the scale of text is not limited to these; the scale of ti may also be computed by other formulas. On the other hand, the scale region [2^3, 2^9] is divided into 60 scale intervals, so the range size of each scale interval is 2^0.1. It should be noted that "^" denotes exponentiation: 2^3 is 2 to the power of 3, 2^9 is 2 to the power of 9, and 2^0.1 is 2 to the power of 0.1. As shown in Fig. 3, the range of the first scale interval is [2^3, 2^3.1], the range of the second scale interval is [2^3.1, 2^3.2], and so on, and the range of the last scale interval is [2^8.9, 2^9]. For a text item ti, the probability y_gt(x, y, b) that its scale falls into a certain scale interval b (0≤b≤60, where b denotes the b-th of the 60 scale intervals) can be determined; by traversing all text regions {t1, ..., tn} and summing the probabilities of all text items falling into scale interval b, the probability that text appears in scale interval b is obtained. In a specific implementation, y_gt(x, y, b) can be obtained as follows: for all xi/s < x < (xi+wi)/s, yi/s < y < (yi+hi)/s, we have:
y_gt(x, y, b) = y_gt(x, y, b) + f((s_lb + s_rb)/2)
f(x) = exp(-(x - log2(s))^2 / (2*δ^2))
Substituting (s_lb + s_rb)/2 for x in the formula of f(x) gives f((s_lb + s_rb)/2), specifically:
f((s_lb + s_rb)/2) = exp(-((s_lb + s_rb)/2 - log2(s))^2 / (2*δ^2))
where s_lb and s_rb are the left and right boundary coordinates of scale interval b, and δ is a coefficient, which is a constant. It should be noted that f(x) can be replaced with other functions, such as an exponential function.
Through the above scheme, the probability that text appears in each of the 60 scale intervals can be obtained.
It should be noted that the scale region in the above scheme can be set flexibly, and the way the scale region is divided into scale intervals can also be set flexibly; for example, the number of scale intervals the scale region contains can be set flexibly.
In the embodiments of the present application, the feature information of each pixel in the target image is obtained with a neural network. In a specific implementation, a fully convolutional neural network can be used to predict the position information and scale information of the text in the target image. Specifically, referring to Fig. 4, the input of the neural network is the target image. Suppose the target image is I with resolution W*H, and the factor by which the neural network down-samples the target image is s (the value of s can be set flexibly). The neural network outputs feature information of size H/s*W/s*B, where B denotes the number of scale intervals (for example, 60 scale intervals). H/s*W/s*B can be regarded as B pieces of feature information each of size H/s*W/s, and these B pieces correspond one-to-one to the B scale intervals. The feature information at position {x, y, b} is the probability value of the pixel (with coordinates x, y) for scale interval b; thus the feature information of one pixel consists of B probability values corresponding to the B scale intervals. Here, the larger the probability value of a pixel for scale interval b, the more likely it is that text (whose scale lies in the range of scale interval b) appears at that pixel; conversely, the smaller the probability value of a pixel for scale interval b, the less likely it is that text (whose scale lies in the range of scale interval b) appears at that pixel. In this way, the B probability values of a pixel corresponding to the B scale intervals can be obtained.
Step 502: For each pixel in the target image, determine a maximum probability value of the pixel based on the feature information of the pixel, and take the scale interval corresponding to the maximum probability value as a target scale interval of the pixel.
Step 503: Determine the position information and scale information of the text in the target image based on the maximum probability value and the target scale interval of each pixel in the target image.
Specifically, a score map is generated based on the maximum probability value of each pixel in the target image, where each point in the score map corresponds one-to-one to a pixel in the target image, the value of each point in the score map is the maximum probability value of the corresponding pixel in the target image, and the scale interval corresponding to each point in the score map is the target scale interval of the corresponding pixel in the target image. Then, the position information and scale information of the text in the target image are determined based on the score map.
Referring to Fig. 4, a max operation along the B dimension is performed on the feature information of size H/s*W/s*B output by the neural network, yielding a score map of size H/s*W/s. In the score map, the value of the point with coordinates x, y is the maximum of the B probability values of that point, and the scale interval b at which each point attains its maximum probability value (0≤b≤B, where b denotes the b-th of the B scale intervals) is recorded when the score map is generated.
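A minimal sketch of this max-along-B operation, assuming the network output y_pred is stored as an (H/s, W/s, B) NumPy array:

    import numpy as np

    def score_and_interval(y_pred):
        """y_pred: (H/s, W/s, B). Returns the score map and the recorded interval index."""
        score_map = y_pred.max(axis=-1)        # value of each point = max probability
        best_bin = y_pred.argmax(axis=-1)      # target scale interval b of each point
        return score_map, best_bin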
In one embodiment of the present application, the value of each point in the score map is compared with a threshold, and a binary map is generated based on the comparison results. Each point in the binary map corresponds one-to-one to a point in the score map, the value of each point in the binary map is determined based on the comparison between the value of the corresponding point in the score map and the threshold, and the scale interval corresponding to each point in the binary map is the scale interval of the corresponding point in the score map. Then, the position information of the text in the target image is extracted from the binary map, and the scale information of the text is determined based on the position information of the text.
In one embodiment, if the value of a point in the score map is greater than the threshold, the value of the corresponding point in the binary map is a first value (for example, 1); if the value of a point in the score map is less than or equal to the threshold, the value of the corresponding point in the binary map is a second value (for example, 0).
In one embodiment, one or more connected regions in the binary map are determined, where a connected region is a continuous region formed by points in the binary map whose value is the first value; an annotation box corresponding to the connected region is obtained, the annotation box covering the connected region; and the position information of the text is determined based on the position information of the annotation box.
For example: a threshold τ is set, and a binary map of the same size as the score map, initialized to all false, is created. Each point in the binary map corresponds one-to-one to a point in the score map, and the points in the binary map whose corresponding points in the score map have values greater than τ are set to true; here, true can be represented by 1 and false by 0. Then, a bounding rectangle is computed for each connected region in the binary map, and up-sampling the coordinates of each rectangle by a factor of s gives the position information of the text in the original image. Since the scale interval corresponding to each point was recorded when the maximum probability value was selected, the corresponding scale interval b can be determined from the position information of the text, and the scale information of the text is then 2^b.
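The following sketch illustrates this thresholding and connected-region step with SciPy's connected-component labelling. The threshold τ=0.5, the factor s=4, and the mapping from interval index b to a scale of 2^(log2 midpoint of interval b) are assumptions made for illustration (the text above writes the scale simply as 2^b).

    import numpy as np
    from scipy import ndimage

    def extract_text_boxes(score_map, best_bin, tau=0.5, s=4, lo=3.0, hi=9.0, n_bins=60):
        binary = score_map > tau                         # binary map: true / false
        labels, num = ndimage.label(binary)              # connected regions
        edges = np.linspace(lo, hi, n_bins + 1)          # log2 interval boundaries
        results = []
        for idx, region in enumerate(ndimage.find_objects(labels), start=1):
            if region is None:
                continue
            ys, xs = region
            mask = labels[region] == idx
            # rectangle covering the connected region, up-sampled by s to original coordinates
            box = (xs.start * s, ys.start * s, xs.stop * s, ys.stop * s)
            # dominant recorded scale interval inside the region -> text scale estimate
            b = int(np.bincount(best_bin[region][mask]).argmax())
            scale = 2 ** ((edges[b] + edges[b + 1]) / 2)
            results.append({"box": box, "scale": scale})
        return results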
The technical solution of the embodiments of the present application performs preliminary text information screening on the target image with a small model; the obtained text information includes the position information and scale information of the text, and this text information can then be used to effectively improve the accuracy and speed of subsequent natural language processing. By predicting the scale of the text, on the one hand, the text sizes fed to the network can be made as uniform as possible, which improves the prediction performance of the network; on the other hand, down-sampling large text to the network's input size effectively reduces the processing time of the subsequent network. By predicting the position of the text, the region containing the text can be cropped out, reducing the computational cost spent on irrelevant background.
Fig. 6 is a schematic structural diagram of the apparatus for detecting text information in an image provided by the embodiments of the present application. As shown in Fig. 6, the apparatus includes:
a feature extraction unit 601, configured to obtain feature information of each pixel in a target image, the feature information including multiple probability values of the pixel corresponding to multiple scale intervals; and
a text information determination unit 602, configured to determine position information and scale information of text in the target image based on the feature information of each pixel in the target image.
In an embodiment of the present application, the text information determination unit 602 includes:
a maximization subunit (not shown), configured to, for each pixel in the target image, determine a maximum probability value of the pixel based on the feature information of the pixel, and take the scale interval corresponding to the maximum probability value as a target scale interval of the pixel; and
a determination subunit (not shown), configured to determine the position information and scale information of the text in the target image based on the maximum probability value and the target scale interval of each pixel in the target image.
In an embodiment of the present application, the determination subunit is configured to:
generate a score map based on the maximum probability value of each pixel in the target image, where each point in the score map corresponds one-to-one to a pixel in the target image, the value of each point in the score map is the maximum probability value of the corresponding pixel in the target image, and the scale interval corresponding to each point in the score map is the target scale interval of the corresponding pixel in the target image; and
determine the position information and scale information of the text in the target image based on the score map.
In an embodiment of the present application, the determination subunit is configured to:
compare the value of each point in the score map with a threshold, and generate a binary map based on the comparison results, where each point in the binary map corresponds one-to-one to a point in the score map, the value of each point in the binary map is determined based on the comparison between the value of the corresponding point in the score map and the threshold, and the scale interval corresponding to each point in the binary map is the scale interval of the corresponding point in the score map; and
extract the position information of the text in the target image from the binary map, and determine the scale information of the text based on the position information of the text.
In an embodiment of the present application, determining the value of each point in the binary map based on the comparison between the value of the corresponding point in the score map and the threshold includes:
if the value of the point in the score map is greater than the threshold, the value of the corresponding point in the binary map is a first value; and
if the value of the point in the score map is less than or equal to the threshold, the value of the corresponding point in the binary map is a second value.
In an embodiment of the present application, the determination subunit is configured to:
determine one or more connected regions in the binary map, where a connected region is a continuous region formed by points in the binary map whose value is the first value, and obtain an annotation box corresponding to the connected region, the annotation box covering the connected region; and
determine the position information of the text based on the position information of the annotation box.
In an embodiment of the present application, the target image is an image obtained by down-sampling an original image; alternatively, the target image is the original image.
Those skilled in the art should understand that the functions implemented by the units of the apparatus for detecting text information in an image shown in Fig. 6 can be understood with reference to the related description of the foregoing method for detecting text information in an image. The functions of the units of the apparatus for detecting text information in an image shown in Fig. 6 can be implemented by a program running on a processor, or by specific logic circuits.
If the apparatus for detecting text information in an image of the embodiments of the present application is implemented in the form of a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, or the part thereof that contributes over the prior art, can essentially be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the method described in each embodiment of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read Only Memory), a magnetic disk, or an optical disc. Thus, the embodiments of the present application are not limited to any specific combination of hardware and software.
Correspondingly, the embodiments of the present application also provide a computer program product in which computer-executable instructions are stored; when executed, the computer-executable instructions implement the above method for detecting text information in an image of the embodiments of the present application.
Fig. 7 is a schematic structural diagram of the computer device of an embodiment of the present application. As shown in Fig. 7, the computer device 100 may include one or more processors 1002 (only one is shown in the figure; the processor 1002 may include, but is not limited to, a processing unit such as a microcontroller unit (MCU, Micro Controller Unit) or a programmable logic device (FPGA, Field Programmable Gate Array)), a memory 1004 for storing data, and a transmission device 1006 for communication functions. Those of ordinary skill in the art can understand that the structure shown in Fig. 7 is only illustrative and does not limit the structure of the above electronic device. For example, the computer device 100 may also include more or fewer components than shown in Fig. 7, or have a configuration different from that shown in Fig. 7.
The memory 1004 can be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the method in the embodiments of the present application. By running the software programs and modules stored in the memory 1004, the processor 1002 executes various functional applications and data processing, that is, implements the above method. The memory 1004 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1004 may further include memory located remotely from the processor 1002, and such remote memory may be connected to the computer device 100 via a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission device 1006 is used to receive or send data via a network. A specific example of the above network may include a wireless network provided by a communication provider of the computer device 100. In one example, the transmission device 1006 includes a network interface controller (NIC, Network Interface Controller), which can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 1006 can be a radio frequency (RF, Radio Frequency) module, which is used to communicate with the Internet wirelessly.
The technical solutions described in the embodiments of the present application can be combined in any manner, provided there is no conflict.
In the several embodiments provided in the present application, it should be understood that the disclosed method and smart device can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily think of changes or substitutions within the technical scope disclosed in the present application, and these should all be covered by the protection scope of the present application.

Claims (10)

1. A method for detecting text information in an image, characterized in that the method comprises:
obtaining feature information of each pixel in a target image, the feature information including multiple probability values of the pixel corresponding to multiple scale intervals; and
determining, based on the feature information of each pixel in the target image, position information and scale information of text in the target image.
2. The method according to claim 1, characterized in that determining the position information and scale information of the text in the target image based on the feature information of each pixel in the target image comprises:
for each pixel in the target image, determining a maximum probability value of the pixel based on the feature information of the pixel, and taking the scale interval corresponding to the maximum probability value as a target scale interval of the pixel; and
determining the position information and scale information of the text in the target image based on the maximum probability value and the target scale interval of each pixel in the target image.
3. The method according to claim 2, characterized in that determining the position information and scale information of the text in the target image based on the maximum probability value and the target scale interval of each pixel in the target image comprises:
generating a score map based on the maximum probability value of each pixel in the target image, where each point in the score map corresponds one-to-one to a pixel in the target image, the value of each point in the score map is the maximum probability value of the corresponding pixel in the target image, and the scale interval corresponding to each point in the score map is the target scale interval of the corresponding pixel in the target image; and
determining the position information and scale information of the text in the target image based on the score map.
4. The method according to claim 3, characterized in that determining the position information and scale information of the text in the target image based on the score map comprises:
comparing the value of each point in the score map with a threshold, and generating a binary map based on the comparison results, where each point in the binary map corresponds one-to-one to a point in the score map, the value of each point in the binary map is determined based on the comparison between the value of the corresponding point in the score map and the threshold, and the scale interval corresponding to each point in the binary map is the scale interval of the corresponding point in the score map; and
extracting the position information of the text in the target image from the binary map, and determining the scale information of the text based on the position information of the text.
5. The method according to claim 4, characterized in that determining the value of each point in the binary map based on the comparison between the value of the corresponding point in the score map and the threshold comprises:
if the value of the point in the score map is greater than the threshold, the value of the corresponding point in the binary map is a first value; and
if the value of the point in the score map is less than or equal to the threshold, the value of the corresponding point in the binary map is a second value.
6. An apparatus for detecting text information in an image, characterized in that the apparatus comprises:
a feature extraction unit, configured to obtain feature information of each pixel in a target image, the feature information including multiple probability values of the pixel corresponding to multiple scale intervals; and
a text information determination unit, configured to determine position information and scale information of text in the target image based on the feature information of each pixel in the target image.
7. The apparatus according to claim 6, characterized in that the text information determination unit comprises:
a maximization subunit, configured to, for each pixel in the target image, determine a maximum probability value of the pixel based on the feature information of the pixel, and take the scale interval corresponding to the maximum probability value as a target scale interval of the pixel; and
a determination subunit, configured to determine the position information and scale information of the text in the target image based on the maximum probability value and the target scale interval of each pixel in the target image.
8. A computer program product, characterized in that the computer program product comprises computer-executable instructions which, when executed, implement the method steps of any one of claims 1 to 5.
9. A storage medium, characterized in that the storage medium stores executable instructions which, when executed by a processor, implement the method steps of any one of claims 1 to 5.
10. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing computer-executable instructions; when the processor runs the computer-executable instructions on the memory, the method steps of any one of claims 1 to 5 are implemented.
CN201910081909.7A 2019-01-28 2019-01-28 Method and device for detecting character information in image and computer equipment Active CN109858432B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910081909.7A CN109858432B (en) 2019-01-28 2019-01-28 Method and device for detecting character information in image and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910081909.7A CN109858432B (en) 2019-01-28 2019-01-28 Method and device for detecting character information in image and computer equipment

Publications (2)

Publication Number Publication Date
CN109858432A true CN109858432A (en) 2019-06-07
CN109858432B CN109858432B (en) 2022-01-04

Family

ID=66896531

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910081909.7A Active CN109858432B (en) 2019-01-28 2019-01-28 Method and device for detecting character information in image and computer equipment

Country Status (1)

Country Link
CN (1) CN109858432B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778470A (en) * 2015-03-12 2015-07-15 浙江大学 Character detection and recognition method based on component tree and Hough forest
CN104852892A (en) * 2014-02-18 2015-08-19 天津市阿波罗信息技术有限公司 Autonomous login method and identification method of novel Internet of Things website system
CN105760891A (en) * 2016-03-02 2016-07-13 上海源庐加佳信息科技有限公司 Chinese character verification code recognition method
CN106257496A (en) * 2016-07-12 2016-12-28 华中科技大学 Mass network text and non-textual image classification method
CN108062547A (en) * 2017-12-13 2018-05-22 北京小米移动软件有限公司 Character detecting method and device
CN108073979A (en) * 2016-11-14 2018-05-25 顾泽苍 A kind of ultra-deep study of importing artificial intelligence knows method for distinguishing for image
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
EP3432167A1 (en) * 2017-07-21 2019-01-23 Tata Consultancy Services Limited System and method for theme extraction

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104852892A (en) * 2014-02-18 2015-08-19 天津市阿波罗信息技术有限公司 Autonomous login method and identification method of novel Internet of Things website system
CN104778470A (en) * 2015-03-12 2015-07-15 浙江大学 Character detection and recognition method based on component tree and Hough forest
CN105760891A (en) * 2016-03-02 2016-07-13 上海源庐加佳信息科技有限公司 Chinese character verification code recognition method
CN106257496A (en) * 2016-07-12 2016-12-28 华中科技大学 Mass network text and non-textual image classification method
CN108073979A (en) * 2016-11-14 2018-05-25 顾泽苍 A kind of ultra-deep study of importing artificial intelligence knows method for distinguishing for image
EP3432167A1 (en) * 2017-07-21 2019-01-23 Tata Consultancy Services Limited System and method for theme extraction
CN108062547A (en) * 2017-12-13 2018-05-22 北京小米移动软件有限公司 Character detecting method and device
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jaderberg M et al.: "Deep Features for Text Spotting", Computer Vision - ECCV 2014, Springer International Publishing *

Also Published As

Publication number Publication date
CN109858432B (en) 2022-01-04

Similar Documents

Publication Publication Date Title
CN108205655B (en) Key point prediction method and device, electronic equipment and storage medium
CN109961009B (en) Pedestrian detection method, system, device and storage medium based on deep learning
EP3786892B1 (en) Method, device and apparatus for repositioning in camera orientation tracking process, and storage medium
CN109522874B (en) Human body action recognition method and device, terminal equipment and storage medium
US10043308B2 (en) Image processing method and apparatus for three-dimensional reconstruction
CN111242088B (en) Target detection method and device, electronic equipment and storage medium
US10452953B2 (en) Image processing device, image processing method, program, and information recording medium
JP2020507850A (en) Method, apparatus, equipment, and storage medium for determining the shape of an object in an image
CN111476306A (en) Object detection method, device, equipment and storage medium based on artificial intelligence
CN107771391B (en) Method and apparatus for determining exposure time of image frame
CN114186632B (en) Method, device, equipment and storage medium for training key point detection model
CN110287775B (en) Palm image clipping method, palm image clipping device, computer equipment and storage medium
CN107644423B (en) Scene segmentation-based video data real-time processing method and device and computing equipment
CN111209377B (en) Text processing method, device, equipment and medium based on deep learning
CN111680675B (en) Face living body detection method, system, device, computer equipment and storage medium
CN103946865B (en) Method and apparatus for contributing to the text in detection image
US11604963B2 (en) Feedback adversarial learning
CN111104813A (en) Two-dimensional code image key point detection method and device, electronic equipment and storage medium
JP5820236B2 (en) Image processing apparatus and control method thereof
CN110619334A (en) Portrait segmentation method based on deep learning, architecture and related device
CN111833344A (en) Medical image processing method and device, electronic equipment and storage medium
CN109697442B (en) Training method and device of character recognition model
JP7124957B2 (en) Image processing system, estimation device, processing method and program
CN108921138B (en) Method and apparatus for generating information
CN112036307A (en) Image processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant