WO2023024793A1 - Character recognition method and related device thereof - Google Patents


Info

Publication number
WO2023024793A1
WO2023024793A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
recognized
text
text image
slice
Prior art date
Application number
PCT/CN2022/107728
Other languages
French (fr)
Chinese (zh)
Inventor
蔡悦
张宇轩
黄灿
王长虎
Original Assignee
北京有竹居网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2023024793A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Definitions

  • the present application relates to the technical field of data processing, in particular to a character recognition method and related equipment.
  • character recognition technology is being applied to an ever wider range of scenarios.
  • the character recognition technology is used to perform recognition processing on characters appearing in an image.
  • long text recognition refers to a process of character recognition for an image including long text.
  • the present application provides a character recognition method and related equipment, which can improve the recognition accuracy of long text recognition.
  • An embodiment of the present application provides a character recognition method, the method comprising:
  • the text image to be recognized is acquired, and a first segmentation process is performed on the text image to be recognized according to preset slice parameters to obtain at least one image slice and position information of the at least one image slice; wherein the text image to be recognized includes long text;
  • the actual image-cutting position corresponding to the text image to be recognized is determined according to the single-character detection result of the at least one image slice and the position information of the at least one image slice, and a second segmentation process is performed on the text image to be recognized according to that actual image-cutting position to obtain at least one image to be used;
  • the character recognition result of the text image to be recognized is determined according to the character recognition result of the at least one image to be used.
  • the determining the actual image cutting position corresponding to the text image to be recognized according to the word detection result of the at least one image slice and the position information of the at least one image slice includes:
  • the actual cutting position corresponding to the text image to be recognized is determined according to the single-character detection result of the at least one image slice, the position information of the at least one image slice, and the preset cutting position corresponding to the text image to be recognized.
  • the process of determining the actual cutting position corresponding to the text image to be recognized includes:
  • the single-character detection results of the at least one image slice are spliced according to the position information of the at least one image slice to obtain the single-character detection result of the text image to be recognized;
  • the actual cut position corresponding to the text image to be recognized is determined.
  • the preset slice parameters include a segmentation interval and a segmentation offset length; wherein, the segmentation offset length is smaller than the segmentation interval;
  • the process of determining the at least one image slice includes:
  • Segmenting the image to be segmented according to the segmentation interval to obtain at least one image slice.
  • the preset slicing parameters also include a cut-off start position;
  • the determination process of the image to be segmented includes:
  • the process of determining the word detection result of the at least one image slice includes:
  • the embodiment of the present application also provides a text recognition device, including:
  • the first segmentation unit is configured to perform a first segmentation process on the text image to be recognized according to preset slice parameters after acquiring the text image to be recognized, to obtain at least one image slice and position information of the at least one image slice; wherein the text image to be recognized includes long text;
  • a position determination unit configured to determine the actual cut-out position corresponding to the text image to be recognized according to the word detection result of the at least one image slice and the position information of the at least one image slice;
  • the second segmentation unit is configured to perform a second segmentation process on the text image to be recognized according to the actual image cutting position corresponding to the text image to be recognized to obtain at least one image to be used;
  • the result determination unit is configured to determine the character recognition result of the text image to be recognized according to the character recognition result of the at least one image to be used.
  • the embodiment of the present application also provides a device, the device includes a processor and a memory:
  • the memory is used to store computer programs
  • the processor is configured to execute any implementation of the character recognition method provided in the embodiments of the present application according to the computer program.
  • the embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium is used to store a computer program, and the computer program is used to execute any implementation manner of the character recognition method provided in the embodiment of the present application.
  • the embodiment of the present application also provides a computer program product, which, when running on a terminal device, enables the terminal device to execute any implementation manner of the character recognition method provided in the embodiment of the present application.
  • FIG. 1 is a flowchart of a character recognition method provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of a text image to be recognized provided by an embodiment of the present application;
  • FIG. 3 is a schematic diagram of another text image to be recognized provided by an embodiment of the present application;
  • FIG. 4 is a schematic diagram of an image slice processing process provided by an embodiment of the present application;
  • FIG. 5 is a schematic diagram comparing two character recognition processes provided by an embodiment of the present application;
  • FIG. 6 is a schematic structural diagram of a single-character detection model provided by an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a character recognition process provided by an embodiment of the present application;
  • FIG. 8 is a schematic structural diagram of a character recognition device provided by an embodiment of the present application.
  • because substantially shrinking an image usually greatly reduces its definition, the reduced image is prone to blurred content, so the character recognition result determined from the reduced image is inaccurate, and the recognition accuracy of long text recognition is therefore low.
  • the embodiment of the present application provides a text recognition method. The method includes: after obtaining a text image to be recognized that includes long text, first performing a first segmentation process on the text image to be recognized according to preset slice parameters to obtain at least one image slice and position information of the at least one image slice; next, determining, according to the single-character detection result and the position information of the at least one image slice, the actual image-cutting position corresponding to the text image to be recognized; then, performing a second segmentation process on the text image to be recognized according to that actual image-cutting position to obtain at least one image to be used; and finally, determining the character recognition result of the text image to be recognized according to the character recognition results of the at least one image to be used, so that character recognition for long text is realized.
  • because the actual cutting position is determined based on the single-character detection result, it will, as far as possible, not fall inside a character; thus, when the image is cut at the actual image-cutting position, characters are rarely split, so incomplete characters can largely be avoided in each cut image corresponding to the text image to be recognized (that is, in each image to be used), which is conducive to improving the recognition accuracy of long text recognition.
  • the processing time for each image slice is much shorter than the processing time for the text image to be recognized, which is conducive to improving the efficiency of text recognition.
  • the embodiment of the present application does not limit the execution subject of the character recognition method.
  • the character recognition method provided in the embodiment of the present application can be applied to data processing devices such as terminal devices or servers.
  • the terminal device may be a smart phone, a computer, a personal digital assistant (Personal Digital Assistant, PDA), or a tablet computer.
  • the server can be an independent server, a cluster server or a cloud server.
  • Refer to FIG. 1, which is a flowchart of a character recognition method provided by an embodiment of the present application.
  • the character recognition method provided in the embodiment of this application includes S1-S5:
  • the text image to be recognized refers to an image that requires character recognition processing (especially long text recognition processing); and the text image to be recognized includes long text (especially super long text).
  • long text refers to a text whose number of characters exceeds the first threshold; moreover, the first threshold can be preset.
  • Extremely long text refers to text whose number of characters exceeds a second threshold; the second threshold can be preset, and the second threshold is greater than the above-mentioned "first threshold”.
  • the text image to be recognized may be the image to be processed as shown in FIG. 2 , or the text image corresponding to the image to be processed as shown in FIG. 3 .
  • the "text image corresponding to the image to be processed” refers to an image cut from the image to be processed according to the text detection result of the image to be processed.
  • Example 1 may specifically include: after the image to be processed is acquired, the image to be processed may be directly determined as the text image to be recognized.
  • in order to avoid, as much as possible, adverse effects on long text recognition caused by image information other than text in the image to be processed, S1 may specifically include S11-S12:
  • the image to be processed refers to an image that requires image processing (such as text detection and/or character recognition); and the embodiment of the present application does not limit the image to be processed, for example, the image to be processed may be a frame of video image.
  • the text detection result of the image to be processed is used to describe the position, in the image to be processed, of the text it contains (for example, the text "this is an image including long text").
  • the embodiment of the present application does not limit the implementation of "text detection” in S11, and any existing or future method capable of text detection for images can be used for implementation.
  • the image area corresponding to the text detection result is cut out of the image to be processed to obtain the text image to be recognized (as shown in FIG. 3), so that the text image to be recognized can more accurately represent the character information carried by the image to be processed.
  • based on the above, the text image to be recognized can be determined from the image to be processed, so that it represents the character information carried by the image to be processed, and that character information can subsequently be determined accurately based on the text image to be recognized.
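The cropping in S11-S12 can be sketched as follows, assuming the text detection result reduces to a single axis-aligned box and representing the image as a nested list of pixels; the function name `crop_text_region` is illustrative, not taken from the application:

```python
def crop_text_region(image, box):
    """Crop the detected text region (S12): keep only the rows and columns
    inside the axis-aligned box (left, top, right, bottom)."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]

# Toy 4x6 "image" of labelled pixels; the text occupies columns 1..4 of rows 1..2.
image = [[f"p{r}{c}" for c in range(6)] for r in range(4)]
text_image = crop_text_region(image, (1, 1, 5, 3))
```

Real text detectors usually return quadrilaterals per text line; the axis-aligned simplification here only illustrates the region-cutting idea.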
  • S2 Perform a first segmentation process on the text image to be recognized according to preset slice parameters to obtain at least one image slice and position information of the at least one image slice.
  • the "preset slice parameter” refers to the parameter that needs to be referred to when performing the first segmentation process on the text image to be recognized; and the embodiment of the present application does not limit the “preset slice parameter", for example, it may include the segmentation interval.
  • the "segmentation interval" is used to indicate the distance between two adjacent segmentation positions when the first segmentation process is performed on the text image to be recognized; and the embodiment of the present application does not limit the "segmentation interval" (for example, 512 pixels, as shown in FIG. 4).
  • the "first slicing process” is used to indicate the slicing process implemented according to the above-mentioned “preset slicing parameters”.
  • "at least one image slice" refers to the image segments obtained after the first segmentation process of the text image to be recognized; and "position information of at least one image slice" is used to describe the position of each image slice in the text image to be recognized.
  • the embodiment of the present application does not limit the process of determining "at least one image slice".
  • two possible implementations are described below.
  • the determination process of "at least one image slice" may specifically include: performing segmentation processing on the text image to be recognized according to the segmentation interval to obtain at least one image slice, so that the length of each image slice is the above-mentioned "segmentation interval" (e.g., 512 pixels, as shown in FIG. 4).
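This simple variant amounts to splitting the image width at fixed intervals; a minimal sketch (the helper name `slice_ranges` is hypothetical, and the 512-pixel interval follows the example above):

```python
def slice_ranges(width, interval):
    """First segmentation, simple variant: split [0, width) into consecutive
    slices of `interval` pixels; the last slice may be shorter."""
    return [(start, min(start + interval, width))
            for start in range(0, width, interval)]

# A 1280-pixel-wide text image with a 512-pixel segmentation interval.
ranges = slice_ranges(1280, 512)
# Each (start, end) pair doubles as the slice's position information.
```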
  • the embodiment of the present application also provides another possible implementation of determining "at least one image slice".
  • the determination process of "at least one image slice” may specifically include S21-S22:
  • the "segmentation offset length" is used to indicate the segmentation offset to be used when performing the first segmentation process on the text image to be recognized; and the “segmentation offset length” can be smaller than the above-mentioned “segmentation interval ".
  • the embodiment of the present application does not limit the "segmentation offset length". For example, as shown in FIG. 4, when the above-mentioned "segmentation interval" is 512 pixels, the “segmentation offset length" can be 256 pixels .
  • the embodiment of the present application does not limit the position of the above-mentioned "image area with the segmentation offset length"; for example, it may be located in the leftmost area of the text image to be recognized (as shown in FIG. 4), in the rightmost area of the text image to be recognized, or in a preset inner area of the text image to be recognized.
  • S21 may specifically include S211-S212:
  • S211: Determine the cut-off region position according to the cut-off start position and the segmentation offset length.
  • the "cut-off start position" is used to indicate the position, in the above-mentioned "text image to be recognized", of one boundary (such as the left boundary) of the above-mentioned "image area with the segmentation offset length"; and the embodiment of the present application does not limit the "cut-off start position"; for example, as shown in FIG. 4, it may be the left boundary position of the text image to be recognized.
  • the "cut-off region position" is used to indicate the position of the above-mentioned "image area with the segmentation offset length" in the "text image to be recognized"; the length of the cut-off region is the above-mentioned "segmentation offset length", and its boundary includes the above-mentioned "cut-off start position".
  • S212: Perform region-cutting processing on the text image to be recognized according to the cut-off region position to obtain the image to be segmented.
  • specifically, the image area occupying the cut-off region position (that is, the above-mentioned "image area with the segmentation offset length") can be cut off from the text image to be recognized, and the remaining area of the text image to be recognized determined as the image to be segmented; the image to be segmented thus represents the image areas of the text image to be recognized other than the "image area with the segmentation offset length", and does not include that area.
  • based on the above, the image area with the segmentation offset length can be cut off from the text image to be recognized to obtain the image to be segmented, so that the image to be segmented does not include that area and can subsequently be subjected to segmentation processing.
  • S22 Perform segmentation processing on the image to be segmented according to the segmentation interval to obtain at least one image slice.
  • the image to be segmented may be segmented according to the segmentation interval to obtain at least one image slice (a plurality of image slices as shown in FIG. 4 ).
  • it should be noted that, because of the cut-off region, the segmentation positions used when segmenting the "image to be segmented" are offset by a certain amount relative to the "text image to be recognized", so they are almost never the same as the "preset image-cutting positions" below, which effectively avoids the above-mentioned adverse effects caused by split characters in the "first segmentation process".
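The boundary-shifting effect of the offset can be illustrated numerically; a sketch of S21-S22 under assumed widths (2048-pixel image, 512-pixel interval, 256-pixel offset), with a hypothetical helper name:

```python
def slice_ranges_with_offset(width, interval, offset):
    """First segmentation with a cut-off region (S21-S22): drop the leading
    `offset` pixels, then slice the remainder at `interval` pixels."""
    return [(start, min(start + interval, width))
            for start in range(offset, width, interval)]

# Plain slicing puts boundaries at 512, 1024, 1536; the 256-pixel offset
# moves them to 768, 1280, 1792, so the two sets never coincide.
plain = [(s, min(s + 512, 2048)) for s in range(0, 2048, 512)]
shifted = slice_ranges_with_offset(2048, 512, 256)
```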
  • based on the above, the text image to be recognized can be subjected to the first segmentation process according to the preset slice parameters to obtain at least one image slice and the position information of the at least one image slice, so that the character recognition result of the "text image to be recognized" can later be determined based on the at least one image slice.
  • in addition, the time consumed to process each image slice is far less than the time consumed to process the whole text image to be recognized, which is beneficial to improving the efficiency of text recognition.
  • the "single character detection result of at least one image slice" is used to indicate the position of each character in each image slice.
  • the embodiment of the present application does not limit the determination process of "the word detection result of at least one image slice". For example, word detection processing may be performed on each image slice to obtain the word detection result of each image slice. It should be noted that the embodiment of the present application does not limit the implementation manner of "single word detection processing", for example, any existing or future single word detection method may be used for implementation. As another example, the "single character detection model” shown below can be used for implementation.
  • the embodiment of the present application also provides another possible implementation of determining "the single-character detection result of at least one image slice", which may specifically include: using a pre-built single-character detection model to perform single-character detection processing on the at least one image slice in parallel, to obtain the single-character detection result of the at least one image slice.
  • the single-character detection model is used to perform character position detection (for example, to perform character boundary position detection) on the input data of the single-character detection model.
  • the single-character detection model 600 may include a feature extraction layer 601 and a single-character position determination layer 602, and the input data of the single-character position determination layer 602 includes the output data of the feature extraction layer 601.
  • the process of determining the single character detection result of the above target image is taken as an example to describe below.
  • the "target image” is used to represent any image slice in the above "at least one image slice”.
  • the process of using the word detection model 600 to determine the above-mentioned "single word detection result” may specifically include steps 11-12:
  • Step 11 Input the target image into the feature extraction layer 601 to obtain the image position feature output by the feature extraction layer 601 .
  • the feature extraction layer 601 is used to perform feature extraction on the input data of the feature extraction layer 601; and the embodiment of the present application does not limit the feature extraction layer 601; for example, it can be implemented using any convolutional neural network (Convolutional Neural Network, CNN), e.g., a Visual Geometry Group (VGG) network.
  • the image position feature is used to represent the information carried by each position in the target image (especially, the information carried by each position in the width direction).
  • the embodiment of the present application does not limit the image position feature.
  • for example, if the target image is a [C, H, W] matrix, the image position feature may be a [1, 1, W/4] matrix.
  • Step 12 Input the feature of the image position into the word position determination layer 602, and obtain the word detection result of the target image output by the word position determination layer 602.
  • the character position determining layer 602 is used for performing character boundary position recognition processing on the input data of the character position determining layer 602 .
  • the embodiment of the present application does not limit the word position determination layer 602.
  • the word location determination layer 602 may include a location classification layer and a location mapping layer, and the input data of the location mapping layer includes the output data of the location classification layer.
  • the determination process of the above-mentioned "single character detection result" may include steps 21-22:
  • Step 21 Input the image position feature into the position classification layer, and obtain the position classification result output by the position classification layer.
  • the position classification layer is used to judge whether the input data of the position classification layer belongs to the character boundary position.
  • the embodiment of the present application does not limit the implementation manner of the position classification layer, and any existing or future classifier (eg, softmax, etc.) may be used for implementation.
  • the position classification result is used to indicate whether each position in the target image belongs to a character boundary (especially, whether each position in the width direction of the target image belongs to a character boundary).
  • Step 22 Input the position classification result into the position mapping layer, and obtain the word detection result of the target image output by the position mapping layer.
  • the position mapping layer is used to perform mapping processing on the input data of the position mapping layer.
  • the embodiment of the present application does not limit the working principle of the location mapping layer.
  • the location mapping layer may map each position in the position classification result according to formula (1):
  • y = a × x + b (1)
  • where y represents the mapped position coordinate corresponding to x; "a" represents the ratio between the width of the target image and the width of the image position feature (for example, 4); x represents a position coordinate in the position classification result (in particular, a position coordinate of the position classification result in the width direction); and "b" represents the convolution offset used in the feature extraction layer 601.
  • because the width of the image position feature is smaller than the width of the target image (for example, 1/4 of it), the width of the position classification result determined from the image position feature is also smaller than the width of the target image (for example, also 1/4 of it); at this time, in order to describe more accurately whether each width-direction position of the target image belongs to a character boundary, each width-direction position coordinate of the position classification result can be mapped to a width-direction position coordinate of the target image according to formula (1).
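Formula (1) is a simple linear coordinate mapping; a sketch assuming a width ratio of 4 and a convolution offset of 0 (both values are examples, not fixed by the application):

```python
def map_to_image_coord(x, a=4, b=0):
    """Formula (1): y = a * x + b maps a width-direction coordinate x in the
    (downsampled) position classification result back to the target image.
    a = target image width / feature width (e.g. 4); b = the feature
    extraction layer's convolution offset (assumed 0 here)."""
    return a * x + b

# A character boundary detected at feature column 13 maps to pixel column 52.
mapped = map_to_image_coord(13)
```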
  • based on the relevant content of steps 11 and 12 above, it can be seen that the single-character detection model 600 shown in FIG. 6 performs feature extraction processing and single-character position determination processing on the target image, and obtains and outputs the single-character detection result of the target image, so that the single-character detection result can accurately represent the boundary position of each character in the target image.
  • the single character detection model can be constructed in advance according to the sample text image and the actual position of each character in the sample text image.
  • the sample text image refers to the image used to construct the word detection model; and the embodiment of the present application does not limit the number of sample text images.
  • the embodiment of the present application does not limit the actual position of each character in the sample text image, for example, it may be the actual boundary position of each character in the sample text image.
  • the embodiment of the present application does not limit the construction process of the single word detection model, for example, in a possible implementation manner, the construction process of the single word detection model may include steps 31-step 34:
  • Step 31 Input the sample text image into the model to be trained, and obtain the predicted character position of the sample text image output by the model to be trained.
  • the model to be trained is used for character position detection (for example, character boundary position detection) for the data input of the model to be trained.
  • the model structure of the model to be trained is the same as the "word detection model" above, so the relevant content of the model structure of the model to be trained can refer to the relevant content of the model structure of the "word detection model” above.
  • the predicted character position of the sample text image is used to describe the predicted position of at least one character in the sample text image.
  • Step 32 Judging whether the preset stop condition is met, if yes, execute step 34; if not, execute step 33.
  • the preset stop condition can be set in advance; for example, it can be that the loss value of the model to be trained is lower than a preset loss threshold, that the rate of change of the loss value of the model to be trained is lower than a preset rate-of-change threshold (that is, the character position detection performance of the model to be trained has converged), or that the number of updates of the model to be trained reaches a preset count threshold.
  • the loss value of the model to be trained is used to characterize the character position detection performance of the model to be trained; moreover, the embodiment of the present application does not limit the method for determining the loss value of the model to be trained.
  • the preset loss threshold, the preset change rate threshold, and the preset number of times threshold can all be preset.
  • Step 33 Update the model to be trained according to the predicted character position of the sample text image and the actual position of each character in the sample text image, and return to step 31.
  • specifically, when the model to be trained in the current round does not meet the preset stop condition, the model to be trained can be updated according to the difference between the predicted character position of the sample text image and the actual position of each character in the sample text image, so that the updated model has better character position detection performance, and then step 31 and its subsequent steps are executed again.
  • Step 34 Determine a word detection model according to the model to be trained.
  • after it is determined that the model to be trained in the current round meets the preset stop condition, it can be concluded that this model has good character position detection performance, so the single-character detection model can be determined from it: for example, the current round's model to be trained can be directly used as the single-character detection model, or the model structure and model parameters of the single-character detection model can be set according to the model structure and model parameters of the current round's model to be trained, so that they are consistent. In this way, the single-character detection model also has good character position detection performance, so the single-character detection results it later produces for the at least one image slice can accurately represent the position of each character in each image slice.
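Steps 31-34 form a standard train-until-converged loop; a minimal sketch with a toy one-parameter "model" standing in for the network (all names and the update rule are illustrative, not the application's actual training procedure):

```python
def train_word_detector(loss_fn, update_fn, params,
                        loss_threshold=1e-3, max_updates=100):
    """Steps 31-34 as a loop: evaluate the model (step 31), check the preset
    stop condition (step 32: loss below threshold or update budget spent),
    otherwise update and repeat (step 33), then return the model (step 34)."""
    for _ in range(max_updates):
        loss = loss_fn(params)          # step 31: predict and score
        if loss < loss_threshold:       # step 32: preset stop condition
            break
        params = update_fn(params)      # step 33: update, loop back
    return params                       # step 34: final detection model

# Toy stand-in: "loss" is the squared distance of the parameter to 5.0,
# and each update halves the remaining distance.
final = train_word_detector(lambda p: (p - 5.0) ** 2,
                            lambda p: p + 0.5 * (5.0 - p),
                            params=0.0)
```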
  • the embodiment of the present application does not limit the determination process of "the actual cutting position corresponding to the text image to be recognized" (That is, the embodiment of S3), for example, can first determine the word position information of the text image to be recognized according to the word detection result of at least one image slice and the position information of the at least one image slice; then according to the text image to be recognized
  • the single character position information of the text image to be recognized is determined to determine the actual cutting position corresponding to the text image to be recognized, so that the "actual cutting position corresponding to the text image to be recognized" will not appear inside the character as much as possible.
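One way to keep cut positions out of character interiors is to snap each candidate cut to the nearest edge of any character it would split; this is an illustrative sketch of the idea, not necessarily the application's exact procedure, with hypothetical names and coordinates:

```python
def snap_cuts(candidate_cuts, char_boxes):
    """Move each candidate cut that falls inside a detected character
    (given as (left, right) width intervals) to the nearer character edge,
    so that no actual cutting position lands mid-character."""
    snapped = []
    for cut in candidate_cuts:
        for left, right in char_boxes:
            if left < cut < right:  # this cut would split the character
                cut = left if cut - left <= right - cut else right
                break
        snapped.append(cut)
    return snapped

# Characters detected at [500, 530] and [1010, 1060]; cuts at 512 and 1024
# would split them, so they snap to the nearer edges 500 and 1010.
actual = snap_cuts([512, 1024], [(500, 530), (1010, 1060)])
```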
  • end users may set character recognition efficiency requirements; or, different application scenarios may correspond to different character recognition efficiency requirements.
  • The embodiment of the present application also provides a possible implementation of S3, which may specifically include: determining the actual cutting position corresponding to the text image to be recognized according to the single-word detection result of the at least one image slice, the position information of the at least one image slice, and the preset cutting position corresponding to the text image to be recognized.
  • The "preset cutting position corresponding to the text image to be recognized" refers to a cutting position preset for the text image to be recognized, and it is determined according to the above-mentioned character recognition efficiency requirements.
  • this embodiment of the present application does not limit the preset cut position corresponding to the text image to be recognized, for example, it may include at least one hard cut position.
  • the "hard cutting position" is used to indicate a preset cutting position corresponding to the text image to be recognized.
  • the text image to be recognized shown in FIG. 7 is taken as an example for description below.
  • the preset cut position corresponding to the text image to be recognized may be ⁇ 512, 1024, 1536, 2048 ⁇ .
  • "512", “1024", “1536", and “2048” are hard-cut positions corresponding to the text image to be recognized.
  • the embodiment of the present application does not limit the process of determining the preset cutting position corresponding to the text image to be recognized.
  • it may specifically include steps 41-42:
  • Step 41 Obtain preset segmentation parameters.
  • The "preset segmentation parameter" is used to indicate the maximum width of a cut-out picture (that is, the distance between two adjacent hard-cut positions in the above-mentioned "preset cutting position"); and the preset segmentation parameter can be preset according to the application scenario (in particular, it can be set according to the character recognition efficiency requirement in the application scenario).
  • For example, the preset segmentation parameter may be 512 pixels.
  • Step 42 According to the preset segmentation parameters and the text image to be recognized, determine the preset cutting position corresponding to the text image to be recognized.
  • After the preset segmentation parameter is obtained, it can be referred to in order to determine the preset cutting position corresponding to the text image to be recognized (such as {512, 1024, 1536, 2048} in Fig. 7), so that the interval between adjacent positions in the preset cutting position does not exceed the preset segmentation parameter.
  • Based on the relevant content of the above steps 41-42, the preset cutting position corresponding to the text image to be recognized can be determined according to the application scenario (in particular, according to the character recognition efficiency requirement in the application scenario), so that the actual cutting position determined based on the preset cutting position can perform image segmentation on the premise of meeting the character recognition efficiency requirement in this application scenario, and the character recognition method provided by this application can therefore meet that requirement.
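The steps 41-42 above can be sketched as follows. This is a minimal illustration rather than the patented implementation; the function name and the example image width of 2200 pixels are assumptions, while the 512-pixel segmentation parameter is taken from the example above.

```python
def preset_cut_positions(image_width: int, seg_param: int) -> list[int]:
    """Step 42: hard-cut positions spaced at most `seg_param` pixels apart,
    covering the full width of the text image to be recognized."""
    return list(range(seg_param, image_width, seg_param))

# A text image 2200 pixels wide with the 512-pixel parameter from the example:
print(preset_cut_positions(2200, 512))  # [512, 1024, 1536, 2048]
```

With this choice, the gap between adjacent preset positions never exceeds the preset segmentation parameter, matching the constraint stated for step 42.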
  • this embodiment of the present application does not limit the above-mentioned implementation of determining the actual cutting position corresponding to the text image to be recognized by referring to the above-mentioned "preset cutting position". For example, it may specifically include steps 51-52:
  • Step 51 Concatenate the word detection result of at least one image slice according to the position information of the at least one image slice, to obtain the word detection result of the text image to be recognized.
  • the "single word detection result of the text image to be recognized" is used to describe the position of at least one character in the text image to be recognized.
  • this embodiment of the present application does not limit the "single character detection result of the text image to be recognized", for example, the single character detection result may include at least one boundary position.
  • A boundary position is used to indicate the edge position of a character.
  • the word detection result of the text image to be recognized may be ⁇ 43, 82, 293, 309, . . . ⁇ .
  • Among them, "43" represents the left boundary of "this", "82" represents the right boundary of "this", "293" represents the left boundary of "yes", and "309" represents the right boundary of "yes".
  • Based on the relevant content of step 51, after obtaining the single-word detection result of the at least one image slice, the single-word detection results can be spliced according to the position information of the at least one image slice to obtain the single-word detection result of the text image to be recognized, so that this result describes the position of at least one character in the text image to be recognized.
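The splicing in step 51 can be sketched as below. The data layout (a list of `(slice_start, boundaries)` pairs, with each slice's boundary positions expressed relative to that slice) is an assumption for illustration; the idea is simply to shift each slice's local positions by the slice's starting coordinate before merging.

```python
def splice_word_detections(slices: list[tuple[int, list[int]]]) -> list[int]:
    """Shift each slice's local boundary positions by the slice's start
    offset and merge them into one sorted list for the whole image."""
    merged = []
    for slice_start, local_boundaries in slices:
        merged.extend(slice_start + b for b in local_boundaries)
    return sorted(merged)

# Two slices: the first starts at x=0, the second at x=256.
print(splice_word_detections([(0, [43, 82]), (256, [37, 53])]))
# [43, 82, 293, 309]
```

The output matches the image-level boundary positions {43, 82, 293, 309, ...} used in the example above.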
  • Step 52 Determine the actual cutting position corresponding to the text image to be recognized according to the word detection result of the text image to be recognized and the preset cutting position corresponding to the text image to be recognized.
  • After obtaining the single-word detection result of the text image to be recognized and the preset cutting position corresponding to the text image to be recognized, the actual cutting position corresponding to the text image to be recognized can be determined by referring to both. The determination process may specifically include: as shown in Figure 7, a preset algorithm can be used to match the preset cutting position corresponding to the text image to be recognized against the single-word detection result of the text image to be recognized, to obtain the actual cutting position corresponding to the text image to be recognized.
  • the preset algorithm may be preset, for example, the preset algorithm may be a greedy algorithm or a Hungarian algorithm.
  • In order to facilitate the understanding of step 52, the following will be described in conjunction with an example.
  • Step 52 may specifically include Step 61-Step 63:
  • Step 61 Determine a first position set and a second position set according to the word detection result of the text image to be recognized and the preset cutting position corresponding to the text image to be recognized.
  • the number of positions in the first position set is not less than the number of positions in the second position set. That is, the first position set refers to a set with more image cutting positions, and the second position set refers to a set with fewer image cutting positions.
  • step 61 may specifically include step 611-step 612:
  • Step 611: If the number of boundary positions is not less than the number of hard-cut positions, determine the set of "at least one boundary position" as the first position set, and determine the set of "at least one hard-cut position" as the second position set.
  • Step 612: If the number of boundary positions is lower than the number of hard-cut positions, determine the set of "at least one hard-cut position" as the first position set, and determine the set of "at least one boundary position" as the second position set.
  • Based on the relevant content of step 611 and step 612, the first position set and the second position set can be determined according to the size relationship between the number of cutting positions represented by the single-word detection result (that is, the boundary positions) and the number of cutting positions represented by the preset cutting position (that is, the hard-cut positions), so that the first position set represents whichever of the two sets has the larger number of positions, and the second position set represents whichever has the smaller number of positions.
  • For example, if the single-word detection result of the text image to be recognized is the position set {43, 82, 293, 309, ...} shown in Figure 7, and the preset cutting position corresponding to the text image to be recognized is the position set {512, 1024, 1536, 2048} shown in Figure 7, then the first position set may be {43, 82, 293, 309, ...}, and the second position set may be {512, 1024, 1536, 2048}.
  • Step 62 Match each position in the second position set with at least one position in the first position set, and obtain matching results corresponding to each position in the second position set.
  • In the embodiment of the present application, for the nth position in the second position set, the position that successfully matches it can be searched for from the first position set (for example, the position in the first position set closest to the nth position in the second position set), to obtain the matching result corresponding to the nth position in the second position set, so that this matching result represents the position in the first position set that successfully matches the nth position.
  • the matching result corresponding to 512 may be that "512" and "335" match successfully, ... (and so on).
  • Step 63 According to the matching results corresponding to each position in the second position set, determine the actual cutting position corresponding to the text image to be recognized.
  • the actual image cutting position corresponding to the text image to be recognized can be determined by referring to the matching result corresponding to each position in the second position set ( For example, the matching result corresponding to each position in the second position set is directly determined as the actual cutting position corresponding to the text image to be recognized).
  • Based on the relevant content of the above steps 61-63, after obtaining the single-word detection result of the text image to be recognized and the preset cutting position corresponding to the text image to be recognized, the number of cutting positions indicated by the single-word detection result and the number of cutting positions indicated by the preset cutting position can first be determined; then each cutting position in the set with fewer cutting positions is matched against the cutting positions in the set with more cutting positions, to obtain the matching result corresponding to each cutting position in the smaller set; finally, the actual cutting position corresponding to the text image to be recognized is determined according to these matching results.
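Steps 61-63 can be sketched with a simple greedy nearest-neighbour match (one of the preset algorithms mentioned above; the Hungarian algorithm would be an alternative). The boundary values beyond {43, 82, 293, 309} are hypothetical, chosen so that 512 matches 335 as in the example.

```python
def actual_cut_positions(boundaries: list[int], hard_cuts: list[int]) -> list[int]:
    """Steps 61-63: put the larger set first (steps 611/612), match each
    position of the smaller set to the nearest position of the larger set
    (step 62), and take the matched positions as the actual cut positions
    (step 63)."""
    if len(boundaries) >= len(hard_cuts):
        first, second = boundaries, hard_cuts   # step 611
    else:
        first, second = hard_cuts, boundaries   # step 612
    return [min(first, key=lambda p: abs(p - pos)) for pos in second]

boundaries = [43, 82, 293, 309, 335, 698, 1003, 1490, 1512, 2040, 2077]
print(actual_cut_positions(boundaries, [512, 1024, 1536, 2048]))
# [335, 1003, 1512, 2040]
```

Each hard-cut position is thus snapped to a nearby character boundary, so the resulting cut positions never fall inside a character.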
  • In another possible implementation, step 52 may specifically include steps 71-74:
  • Step 71 Determine a first position set and a second position set according to the word detection result of the text image to be recognized and the preset cutting position corresponding to the text image to be recognized.
  • For the relevant content of step 71, please refer to step 61 above.
  • Step 72 If it is determined that the second set of positions includes at least one boundary position, then determine the second set of positions as the actual cut-out position corresponding to the text image to be recognized.
  • If the second position set includes at least one boundary position, it can be determined that the second position set was determined according to the single-word detection result of the text image to be recognized, so that each position in the second position set does not fall inside a character. Therefore, the second position set can be directly determined as the actual cutting position corresponding to the text image to be recognized, so that the actual cutting position does not fall inside a character and cutting the image based on it does not damage any character.
  • Step 73 If it is determined that the second set of positions includes at least one hard-cut position, then match each position in the second set of positions with at least one position in the first set of positions, and obtain matching results corresponding to each position in the second set of positions .
  • Step 73 can be implemented by using any implementation manner of step 62 above.
  • If the second position set includes at least one hard-cut position, it can be determined that the second position set was determined according to the preset cutting position corresponding to the text image to be recognized, so that a position in the second position set may fall inside a character. Therefore, the positions that successfully match each position in the second position set can be searched for from the first position set, and these found positions can be used to determine the actual cutting position corresponding to the text image to be recognized. In this way, the actual cutting position does not fall inside a character, characters are not damaged when cutting the image based on the actual cutting position, and the occurrence of incomplete characters can be effectively avoided, which is beneficial to improving the recognition accuracy of long text recognition.
  • Step 74 According to the matching results corresponding to each position in the second position set, determine the actual image cutting position corresponding to the text image to be recognized.
  • For the relevant content of step 74, please refer to step 63 above.
  • Based on the relevant content of the above steps 71-74, after obtaining the single-word detection result of the text image to be recognized and the preset cutting position corresponding to the text image to be recognized, the actual cutting position corresponding to the text image to be recognized should be selected from the single-word detection result as much as possible, so that the actual cutting position can meet the character recognition efficiency requirement in the application scenario as far as possible without cutting through characters.
  • Based on the relevant content of the above step 52, after obtaining the single-word detection result of the text image to be recognized and the preset cutting position corresponding to the text image to be recognized, the two can be combined to determine the actual cutting position corresponding to the text image to be recognized, so that the actual cutting position can meet the character recognition efficiency requirement in the application scenario as far as possible without cutting through characters.
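The branch logic of steps 71-74 can be sketched as below. This is a hedged illustration: `choose_cut_positions` and its example values are assumptions, and the smaller set is identified here simply by comparing set sizes, as in steps 611/612.

```python
def choose_cut_positions(boundaries: list[int], hard_cuts: list[int]) -> list[int]:
    """Steps 71-74: prefer positions taken from the word detection result.
    If the smaller (second) set already consists of boundary positions,
    use it directly (step 72); otherwise snap each hard-cut position to
    the nearest boundary position (steps 73-74)."""
    if len(hard_cuts) > len(boundaries):
        # Second set = boundary positions: safe, never inside a character.
        return boundaries                      # step 72
    # Second set = hard-cut positions: match each one to a boundary.
    return [min(boundaries, key=lambda b: abs(b - cut)) for cut in hard_cuts]

# Hypothetical values: few boundaries -> use them directly.
print(choose_cut_positions([500], [256, 512, 768]))        # [500]
# Hypothetical values: few hard cuts -> snap 512 to the nearest boundary.
print(choose_cut_positions([43, 482, 530, 960], [512]))    # [530]
```

Either way, every emitted cut position is a character boundary, which is why cutting at these positions cannot produce incomplete characters.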
  • In addition, because the preset cutting position corresponding to the text image to be recognized is determined according to the preset segmentation parameter corresponding to the application scenario, the actual cutting position determined based on the preset cutting position also meets the character recognition efficiency requirement in this application scenario. Therefore, the character recognition process based on the preset cutting position can meet the character recognition efficiency requirement in this application scenario and, on the premise of ensuring the recognition accuracy of long text recognition, the character recognition efficiency requirements in different application scenarios can be met as far as possible.
  • Based on the relevant content of S3, after obtaining the single-word detection result and the position information of the at least one image slice, the actual cutting position corresponding to the text image to be recognized can be determined by referring to them.
  • S4 Perform a second segmentation process on the text image to be recognized according to the actual picture cutting position corresponding to the text image to be recognized, to obtain at least one picture to be used.
  • the "second segmentation processing” refers to the process of performing segmentation processing on the text image to be recognized according to the actual image cutting position corresponding to the text image to be recognized.
  • In the embodiment of the present application, after obtaining the actual cutting position corresponding to the text image to be recognized, the text image to be recognized can be segmented according to the actual cutting position to obtain each cut-out picture corresponding to the text image to be recognized, and each cut-out picture is respectively determined as a picture to be used.
  • S5 Determine the text recognition result of the text image to be recognized according to the text recognition result of at least one image to be used.
  • The character recognition result of a picture to be used describes the character information carried by that picture; the embodiment of the present application does not limit the determination process of this result, and any existing or future character recognition method may be used (for example, an OCR model).
  • character recognition processing can be performed on all the pictures to be used in parallel to obtain a character recognition result of each picture to be used.
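The parallel recognition mentioned above can be sketched with a thread pool. Here `recognize` is a stand-in for whatever OCR model is actually used (its body is an assumption purely for demonstration); the point is that `executor.map` runs the pictures concurrently while preserving their order.

```python
from concurrent.futures import ThreadPoolExecutor

def recognize(picture: str) -> str:
    # Stand-in for applying a real OCR model to one picture to be used.
    return picture.upper()

def recognize_all(pictures: list[str]) -> list[str]:
    """Run character recognition on all pictures to be used in parallel;
    `executor.map` preserves the input order of the pictures."""
    with ThreadPoolExecutor() as executor:
        return list(executor.map(recognize, pictures))

print(recognize_all(["ab", "cd"]))  # ['AB', 'CD']
```

Order preservation matters because the per-picture results are later spliced according to the pictures' arrangement order in S5.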
  • the character recognition result of the text image to be recognized is used to describe the character information carried by the text image to be recognized.
  • In a possible implementation, S5 may specifically include: splicing the character recognition results of the at least one picture to be used according to the arrangement order corresponding to the at least one picture to be used, to obtain the character recognition result of the text image to be recognized.
  • The arrangement order corresponding to the at least one picture to be used represents the positional adjacency of the at least one picture to be used in the text image to be recognized; for example, the picture to be used with sequence number 1 is adjacent to the picture to be used with sequence number 2, the picture with sequence number 2 is adjacent to the picture with sequence number 3, ... (and so on), and the picture with sequence number T-1 is adjacent to the picture with sequence number T. Here, T is a positive integer and represents the number of pictures to be used.
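The splicing in S5 reduces to an ordered concatenation. The example strings below are hypothetical stand-ins for per-picture recognition results.

```python
def splice_results(ordered_results: list[str]) -> str:
    """S5: concatenate the per-picture recognition results in the order
    the pictures appear in the text image (sequence numbers 1..T)."""
    return "".join(ordered_results)

print(splice_results(["this is ", "a long ", "text line"]))
# this is a long text line
```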
  • In the character recognition method provided by the embodiment of the present application, after the text image to be recognized including long text is acquired, the text image to be recognized is first subjected to a first segmentation process according to the preset slice parameters, to obtain at least one image slice and the position information of the at least one image slice; then the actual cutting position corresponding to the text image to be recognized is determined according to the single-word detection result and the position information of the at least one image slice; next, the text image to be recognized is subjected to a second segmentation process according to the actual cutting position, to obtain at least one picture to be used; finally, the character recognition result of the text image to be recognized is determined according to the character recognition result of the at least one picture to be used, so that the character recognition process for long text can be realized.
  • Among them, because the actual cutting position is determined based on the single-word detection result, the actual cutting position does not fall inside a character as far as possible, so that when cutting the image based on the actual cutting position, characters are not cut through as far as possible. In this way, incomplete characters can be avoided as far as possible in each cut-out picture corresponding to the text image to be recognized (that is, in each picture to be used), which is beneficial to improving the recognition accuracy of long text recognition.
  • the processing time for each image slice is much shorter than the processing time for the text image to be recognized, which is conducive to improving the efficiency of text recognition.
  • the embodiment of the present application also provides a character recognition device, which will be explained and described below with reference to the accompanying drawings.
  • FIG. 8 is a schematic structural diagram of a character recognition device provided by an embodiment of the present application.
  • the character recognition device 800 provided in the embodiment of the present application includes:
  • the first segmentation unit 801 is configured to, after acquiring the text image to be recognized, perform the first segmentation process on the text image to be recognized according to preset slice parameters, to obtain at least one image slice and the at least one image slice Location information; wherein, the text image to be recognized includes long text;
  • a position determination unit 802 configured to determine the actual cut-out position corresponding to the text image to be recognized according to the word detection result of the at least one image slice and the position information of the at least one image slice;
  • the second segmentation unit 803 is configured to perform a second segmentation process on the text image to be recognized according to the actual image cutting position corresponding to the text image to be recognized to obtain at least one picture to be used;
  • the result determining unit 804 is configured to determine a character recognition result of the to-be-recognized text image according to the character recognition result of the at least one image to be used.
  • In a possible implementation, the position determination unit 802 is specifically configured to: determine the actual cutting position corresponding to the text image to be recognized according to the single-word detection result of the at least one image slice, the position information of the at least one image slice, and the preset cutting position corresponding to the text image to be recognized.
  • In a possible implementation, the position determination unit 802 is specifically configured to: splice the single-word detection results of the at least one image slice according to the position information of the at least one image slice, to obtain the single-word detection result of the text image to be recognized; and determine the actual cutting position corresponding to the text image to be recognized according to the single-word detection result of the text image to be recognized and the preset cutting position corresponding to the text image to be recognized.
  • the preset slice parameters include a segmentation interval and a segmentation offset length; wherein, the segmentation offset length is smaller than the segmentation interval;
  • the first dividing unit 801 includes:
  • a region cutting subunit configured to cut an image region having the segmentation offset length from the text image to be recognized to obtain the image to be segmented
  • the image slice subunit is configured to perform segmentation processing on the image to be segmented according to the segmentation interval to obtain at least one image slice.
  • In a possible implementation, the preset slice parameters further include a cutting start position;
  • the region cutting subunit is specifically configured to: determine the position of the cutting region according to the cutting start position and the segmentation offset length; perform region cutting processing on the text image to be recognized according to the cutting region position, Obtain the image to be segmented.
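The first segmentation performed by unit 801 can be sketched as below. This is an assumption-laden illustration: it takes the cut-off region from the left edge (cut start 0), represents slices as `(x_begin, x_end)` ranges in original-image coordinates, and says nothing about how the removed region is used afterwards, since that is not covered in this excerpt.

```python
def first_segmentation(width: int, interval: int, offset: int, cut_start: int = 0):
    """Region cut-off then slicing: remove the region
    [cut_start, cut_start + offset) from the text image, then split the
    remaining image to be segmented into slices at most `interval`
    pixels wide."""
    assert offset < interval  # the offset must be smaller than the interval
    removed = (cut_start, cut_start + offset)
    slices = []
    x = removed[1]
    while x < width:
        end = min(x + interval, width)
        slices.append((x, end))
        x = end
    return removed, slices

print(first_segmentation(1200, 512, 128))
# ((0, 128), [(128, 640), (640, 1152), (1152, 1200)])
```

The returned ranges double as the "position information of the at least one image slice" needed later to splice the per-slice detection results.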
  • the process of determining the word detection result of the at least one image slice includes: using a pre-built word detection model to perform parallel word detection processing on the at least one image slice to obtain the at least one The single character detection result of the image slice; wherein, the single character detection model is constructed according to the sample text image and the actual position of each character in the sample text image.
  • Based on the above-mentioned related content of the character recognition device 800, after the text image to be recognized including long text is acquired, the text image to be recognized is first subjected to a first segmentation process according to the preset slice parameters, to obtain at least one image slice and the position information of the at least one image slice; then the actual cutting position corresponding to the text image to be recognized is determined according to the single-word detection result and the position information of the at least one image slice; next, the text image to be recognized is subjected to a second segmentation process according to the actual cutting position corresponding to the text image to be recognized, to obtain at least one picture to be used; finally, the character recognition result of the text image to be recognized is determined according to the character recognition result of the at least one picture to be used, so that the character recognition process for long text can be realized.
  • Among them, because the actual cutting position is determined based on the single-word detection result, the actual cutting position does not fall inside a character as far as possible, so that when cutting the image based on the actual cutting position, characters are not cut through as far as possible. In this way, incomplete characters can be avoided as far as possible in each cut-out picture corresponding to the text image to be recognized (that is, in each picture to be used), which is beneficial to improving the recognition accuracy of long text recognition.
  • the processing time for each image slice is much shorter than the processing time for the text image to be recognized, which is conducive to improving the efficiency of text recognition.
  • the embodiment of the present application also provides a device, the device includes a processor and a memory:
  • the memory is used to store computer programs
  • the processor is configured to execute any implementation of the character recognition method provided in the embodiments of the present application according to the computer program.
  • the embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium is used to store a computer program, and the computer program is used to execute any of the character recognition methods provided in the embodiment of the present application. implementation.
  • the embodiment of the present application also provides a computer program product, which, when running on a terminal device, enables the terminal device to execute any implementation manner of the character recognition method provided in the embodiment of the present application.
  • "At least one (item)" means one or more, and "multiple" means two or more.
  • "And/or" is used to describe the association relationship of associated objects, indicating that three relationships can exist; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
  • The character "/" generally indicates that the contextual objects are in an "or" relationship.
  • "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single items or plural items.
  • For example, at least one item of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple.

Abstract

The present application discloses a character recognition method and a related device thereof. The method comprises: after acquiring a text image to be recognized comprising a long text, performing first segmentation processing on the text image to be recognized according to a preset slice parameter, obtaining at least one image slice and position information of the at least one image slice; according to a single character detection result and the position information of the at least one image slice, determining an actual slicing position corresponding to the text image to be recognized; then, according to the actual slicing position corresponding to the text image to be recognized, performing second segmentation processing on the text image to be recognized, obtaining at least one picture to be used; finally, according to the character recognition result of the at least one picture to be used, determining a character recognition result of the text image to be recognized, so that a character recognition process for the long text can be realized.

Description

A character recognition method and related equipment
This disclosure claims priority to the Chinese patent application with application number 202110988932.1, entitled "A character recognition method and related equipment", filed with the China Patent Office on August 26, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of data processing, and in particular to a character recognition method and related equipment.
Background
With the development of character recognition technology, its application range is getting wider and wider. Character recognition technology is used to recognize the characters appearing in an image.
However, because some character recognition technologies (such as Optical Character Recognition (OCR)) have defects, the recognition accuracy of these technologies is low in some application scenarios (such as long text recognition). Here, "long text recognition" refers to the process of performing character recognition on an image that includes long text.
Summary of the Invention
In order to solve the above technical problems, the present application provides a character recognition method and related equipment, which can improve the recognition accuracy of long text recognition.
In order to achieve the above objectives, the technical solutions provided in the embodiments of the present application are as follows:
An embodiment of the present application provides a character recognition method, the method comprising:
after acquiring a text image to be recognized, performing a first segmentation process on the text image to be recognized according to preset slice parameters, to obtain at least one image slice and position information of the at least one image slice; wherein the text image to be recognized includes long text;
determining the actual cutting position corresponding to the text image to be recognized according to the single-word detection result of the at least one image slice and the position information of the at least one image slice;
performing a second segmentation process on the text image to be recognized according to the actual cutting position corresponding to the text image to be recognized, to obtain at least one picture to be used;
determining the character recognition result of the text image to be recognized according to the character recognition result of the at least one picture to be used.
In a possible implementation, the determining the actual cutting position corresponding to the text image to be recognized according to the single-word detection result of the at least one image slice and the position information of the at least one image slice includes:
determining the actual cutting position corresponding to the text image to be recognized according to the single-word detection result of the at least one image slice, the position information of the at least one image slice, and the preset cutting position corresponding to the text image to be recognized.
In a possible implementation, the process of determining the actual cutting position corresponding to the text image to be recognized includes:
splicing the single-word detection results of the at least one image slice according to the position information of the at least one image slice, to obtain the single-word detection result of the text image to be recognized;
determining the actual cutting position corresponding to the text image to be recognized according to the single-word detection result of the text image to be recognized and the preset cutting position corresponding to the text image to be recognized.
In a possible implementation, the preset slice parameters include a segmentation interval and a segmentation offset length, where the segmentation offset length is smaller than the segmentation interval;
the process of determining the at least one image slice includes:
cutting an image region with the segmentation offset length off the text image to be recognized to obtain an image to be segmented;
segmenting the image to be segmented according to the segmentation interval to obtain the at least one image slice.
In a possible implementation, the preset slice parameters further include a cutting start position;
the process of determining the image to be segmented includes:
determining a cut-off region position according to the cutting start position and the segmentation offset length;
performing region cutting processing on the text image to be recognized according to the cut-off region position to obtain the image to be segmented.
In a possible implementation, the process of determining the single-character detection result of the at least one image slice includes:
performing parallel single-character detection processing on the at least one image slice by using a pre-built single-character detection model to obtain the single-character detection result of the at least one image slice, where the single-character detection model is built according to a sample text image and the actual position of each character in the sample text image.
An embodiment of the present application further provides a character recognition apparatus, including:
a first segmentation unit, configured to, after a text image to be recognized is acquired, perform first segmentation processing on the text image to be recognized according to preset slice parameters to obtain at least one image slice and position information of the at least one image slice, where the text image to be recognized includes long text;
a position determination unit, configured to determine an actual cutting position corresponding to the text image to be recognized according to a single-character detection result of the at least one image slice and the position information of the at least one image slice;
a second segmentation unit, configured to perform second segmentation processing on the text image to be recognized according to the actual cutting position corresponding to the text image to be recognized, to obtain at least one picture to be used;
a result determination unit, configured to determine a character recognition result of the text image to be recognized according to character recognition results of the at least one picture to be used.
An embodiment of the present application further provides a device, the device including a processor and a memory:
the memory is configured to store a computer program;
the processor is configured to execute, according to the computer program, any implementation of the character recognition method provided by the embodiments of the present application.
An embodiment of the present application further provides a computer-readable storage medium, the computer-readable storage medium being configured to store a computer program, the computer program being configured to execute any implementation of the character recognition method provided by the embodiments of the present application.
An embodiment of the present application further provides a computer program product which, when run on a terminal device, causes the terminal device to execute any implementation of the character recognition method provided by the embodiments of the present application.
Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments recorded in the present application; a person of ordinary skill in the art can also obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a character recognition method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a text image to be recognized provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of another text image to be recognized provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a process of obtaining image slices provided by an embodiment of the present application;
FIG. 5 is a schematic comparison diagram of two character recognition processes provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a single-character detection model provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a character recognition process provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a character recognition apparatus provided by an embodiment of the present application.
Detailed Description
In research on character recognition, the inventor found that because some character recognition models (for example, optical character recognition (OCR) models) usually only support input data of a fixed width, after an image including long text is acquired, the image first needs to be reduced substantially in size, and the character recognition model then performs character recognition on the reduced image to obtain the character recognition result of the long text. However, because this substantial reduction usually greatly lowers the image definition, the content of the reduced image tends to become blurred, so the character recognition result determined based on the reduced image is inaccurate, which results in low recognition accuracy for long text.
Based on the above findings, in order to solve the technical problems described in the background section, an embodiment of the present application provides a character recognition method. The method includes: after a text image to be recognized including long text is acquired, first performing first segmentation processing on the text image to be recognized according to preset slice parameters to obtain at least one image slice and position information of the at least one image slice; then determining an actual cutting position corresponding to the text image to be recognized according to the single-character detection result and the position information of the at least one image slice; then performing second segmentation processing on the text image to be recognized according to the actual cutting position corresponding to the text image to be recognized, to obtain at least one picture to be used; and finally determining a character recognition result of the text image to be recognized according to the character recognition results of the at least one picture to be used. In this way, character recognition for long text can be realized.
It can be seen that, because the above “single-character detection result and position information of the at least one image slice” can accurately represent the position of each character in the text image to be recognized, the actual cutting positions determined based on the single-character detection result fall inside a character as rarely as possible, so that characters are cut apart as rarely as possible when the image is cut at the actual cutting positions. Incomplete characters in the cut images corresponding to the text image to be recognized (that is, in the pictures to be used) can thus be avoided as far as possible, which helps improve the recognition accuracy of long text recognition. In addition, because the length of each image slice is far smaller than the length of the text image to be recognized, processing each image slice takes far less time than processing the whole text image to be recognized, which helps improve character recognition efficiency.
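The core of the idea described above — moving each preset cutting position into a gap between detected characters so that no cut falls inside a character — can be sketched as follows. This is a minimal illustration; the function name, the box representation, and the gap-midpoint heuristic are assumptions for the sketch, not taken from the application.

```python
def choose_cut_positions(char_boxes, preset_cuts):
    """char_boxes: (left, right) x-extents of detected characters, sorted by
    left edge. preset_cuts: candidate x positions from the preset layout.
    Returns one adjusted cut position per preset position."""
    # Midpoints of the gaps between consecutive characters are safe cut points.
    gaps = [(a_right + b_left) / 2
            for (_, a_right), (b_left, _) in zip(char_boxes, char_boxes[1:])]
    # Snap each preset cut to the closest inter-character gap.
    return [min(gaps, key=lambda g: abs(g - cut)) for cut in preset_cuts]

boxes = [(0, 30), (35, 70), (76, 110), (118, 150)]   # four character boxes
print(choose_cut_positions(boxes, [72, 115]))        # → [73.0, 114.0]
```

Both adjusted cuts land between characters, so neither picture to be used contains a split character.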
In addition, the embodiments of the present application do not limit the execution subject of the character recognition method. For example, the character recognition method provided by the embodiments of the present application may be applied to a data processing device such as a terminal device or a server. The terminal device may be a smartphone, a computer, a personal digital assistant (PDA), a tablet computer, or the like. The server may be an independent server, a cluster server, or a cloud server.
In order to enable a person skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.
Method Embodiment
Referring to FIG. 1, FIG. 1 is a flowchart of a character recognition method provided by an embodiment of the present application.
The character recognition method provided by the embodiment of the present application includes S1-S5:
S1: Acquire a text image to be recognized.
The text image to be recognized refers to an image that needs character recognition processing (in particular, long text recognition processing), and the text image to be recognized includes long text (in particular, extra-long text). Here, “long text” refers to text whose number of characters exceeds a first threshold, and the first threshold may be preset. “Extra-long text” refers to text whose number of characters exceeds a second threshold; the second threshold may be preset and is greater than the above “first threshold”.
In addition, the embodiment of the present application does not limit the text image to be recognized. For example, the text image to be recognized may be the image to be processed shown in FIG. 2, or may be the text image corresponding to the image to be processed shown in FIG. 3. Here, “the text image corresponding to the image to be processed” refers to an image cut from the image to be processed according to the text detection result of the image to be processed. For details of the “image to be processed” and the “text detection result of the image to be processed”, see S11 below.
Moreover, the embodiment of the present application does not limit the implementation of S1. For ease of understanding, two examples are described below.
Example 1: S1 may specifically include: after the image to be processed is acquired, directly determining the image to be processed as the text image to be recognized.
Example 2: In order to prevent, as far as possible, image information other than text in the image to be processed from adversely affecting long text recognition, S1 may specifically include S11-S12:
S11: After the image to be processed is acquired, perform text detection on the image to be processed to obtain a text detection result of the image to be processed.
Here, the image to be processed refers to an image that needs image processing (for example, text detection and/or character recognition). The embodiment of the present application does not limit the image to be processed; for example, the image to be processed may be a frame of a video.
The text detection result of the image to be processed describes the position, within the image to be processed, of the text in that image (for example, “this is an image including long text”).
In addition, the embodiment of the present application does not limit the implementation of the “text detection” in S11; any existing or future method capable of performing text detection on an image may be used.
S12: Cut the text image to be recognized out of the image to be processed according to the text detection result of the image to be processed.
In the embodiment of the present application, after the text detection result of the image to be processed (shown in FIG. 2) is obtained, the image region corresponding to the text detection result is cut out of the image to be processed to obtain the text image to be recognized (shown in FIG. 3), so that the text image to be recognized can more accurately represent the character information carried by the image to be processed.
Based on the above description of S1, after an image to be processed (for example, a frame of a video) is acquired, the text image to be recognized can be determined according to the image to be processed, so that the text image to be recognized represents the character information carried by the image to be processed, and that character information can subsequently be determined accurately based on the text image to be recognized.
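A minimal sketch of the S11-S12 flow, under the assumption that the text detector returns an axis-aligned bounding box (x, y, w, h); real detectors may return rotated boxes or polygons, and the function name is illustrative only.

```python
def crop_text_image(image, box):
    """Cut the region described by the detection result out of the image.
    image: 2-D list of pixel rows; box: (x, y, w, h) in pixel coordinates."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

# Toy 8x4 "image" whose pixel values equal their column index.
img = [[c for c in range(8)] for _ in range(4)]
print(crop_text_image(img, (2, 1, 3, 2)))  # → [[2, 3, 4], [2, 3, 4]]
```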
S2: Perform first segmentation processing on the text image to be recognized according to preset slice parameters to obtain at least one image slice and position information of the at least one image slice.
Here, the “preset slice parameters” refer to the parameters referred to when the first segmentation processing is performed on the text image to be recognized. The embodiment of the present application does not limit the “preset slice parameters”; for example, they may include a segmentation interval. The “segmentation interval” represents the distance between two adjacent segmentation positions when the first segmentation processing is performed on the text image to be recognized; the embodiment of the present application does not limit the “segmentation interval” (for example, 512 pixels as shown in FIG. 4).
The “first segmentation processing” refers to the segmentation processing performed according to the above “preset slice parameters”.
The “at least one image slice” refers to the at least one image segment obtained after the first segmentation processing is performed on the text image to be recognized, and the “position information of the at least one image slice” describes the position of each image slice within the text image to be recognized.
In addition, the embodiment of the present application does not limit the process of determining the “at least one image slice”. For ease of understanding, two possible implementations are described below.
In one possible implementation, when the above “preset slice parameters” include a segmentation interval, the process of determining the “at least one image slice” may specifically include: performing the first segmentation processing on the text image to be recognized according to the segmentation interval to obtain at least one image slice, so that the length of each image slice equals the “segmentation interval” (for example, 512 pixels as shown in FIG. 4).
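This fixed-interval first segmentation can be sketched as follows (the 512-pixel interval comes from the text; representing each slice's position information as a (start, end) span is an assumption of the sketch):

```python
def slice_by_interval(width, interval):
    """Cut [0, width) every `interval` pixels; each span doubles as the
    slice's position information within the original image."""
    return [(s, min(s + interval, width)) for s in range(0, width, interval)]

print(slice_by_interval(1280, 512))  # → [(0, 512), (512, 1024), (1024, 1280)]
```

Note that the last slice may be shorter than the interval when the image width is not an exact multiple of it.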
In some cases (for example, the case, similar to S3 below, in which the “actual cutting position corresponding to the text image to be recognized” is determined with reference to the “preset cutting position corresponding to the text image to be recognized”), when the above “first segmentation processing” cuts through a character (for example, cutting “结” into the two parts “纟” and “吉” as shown in FIG. 5), the damaged characters easily cause subsequent character recognition errors. For example, when the segmentation positions involved in the “first segmentation processing” and the “preset cutting positions” described below share a common position (for example, both contain the position between “纟” and “吉” shown in FIG. 5), character recognition errors are likely to occur (for example, “结” may be wrongly recognized as the two characters “三” and “吉”).
Based on the above analysis, in order to avoid as far as possible the adverse effects caused by the “first segmentation processing” cutting through characters, it suffices to ensure that the segmentation positions involved in the “first segmentation processing” share no common position with the “preset cutting positions” described below. On this basis, the embodiment of the present application further provides another possible implementation of determining the “at least one image slice”. In this implementation, when the above “preset slice parameters” include a segmentation interval and a segmentation offset length, the process of determining the “at least one image slice” may specifically include S21-S22:
S21: Cut the image region with the segmentation offset length off the text image to be recognized to obtain an image to be segmented, so that the image to be segmented does not include the above “image region with the segmentation offset length”.
Here, the “segmentation offset length” represents the segmentation offset used when the first segmentation processing is performed on the text image to be recognized, and the “segmentation offset length” may be smaller than the above “segmentation interval”. The embodiment of the present application does not limit the “segmentation offset length”; for example, as shown in FIG. 4, when the “segmentation interval” is 512 pixels, the “segmentation offset length” may be 256 pixels.
In addition, the embodiment of the present application does not limit the position of the above “image region with the segmentation offset length”. For example, it may be located at the leftmost region of the text image to be recognized (as shown in FIG. 4), at the rightmost region of the text image to be recognized, or at a preset interior region of the text image to be recognized.
Moreover, the embodiment of the present application does not limit the implementation of S21. For example, in one possible implementation, if the above “preset slice parameters” further include a cutting start position, S21 may specifically include S211-S212:
S211: Determine a cut-off region position according to the cutting start position and the segmentation offset length.
Here, the “cutting start position” represents the position, within the “text image to be recognized”, of one boundary (for example, the left boundary) of the above “image region with the segmentation offset length”. The embodiment of the present application does not limit the “cutting start position”; for example, as shown in FIG. 4, it may be the left boundary of the text image to be recognized.
The “cut-off region position” represents the position of the above “image region with the segmentation offset length” within the “text image to be recognized”; the length of the “cut-off region position” equals the above “segmentation offset length”, and its boundaries include the above “cutting start position”.
S212: Perform region cutting processing on the text image to be recognized according to the cut-off region position to obtain the image to be segmented.
In the embodiment of the present application, after the cut-off region position is obtained, the image region occupying the cut-off region position (that is, the above “image region with the segmentation offset length”) can be cut off the text image to be recognized, and the remaining region of the text image to be recognized is determined as the image to be segmented, so that the image to be segmented represents the image regions of the text image to be recognized other than the above “image region with the segmentation offset length” and thus does not include that region.
Based on the above description of S21, after the text image to be recognized is acquired, the image region with the segmentation offset length can be cut off the text image to be recognized to obtain the image to be segmented, so that the image to be segmented does not include the above “image region with the segmentation offset length” and can subsequently be segmented.
S22: Segment the image to be segmented according to the segmentation interval to obtain at least one image slice.
In the embodiment of the present application, after the image to be segmented is obtained, the image to be segmented can be segmented according to the segmentation interval to obtain at least one image slice (the multiple image slices shown in FIG. 4). Because the “image to be segmented” lacks a region compared with the “text image to be recognized”, the segmentation positions used when segmenting the “image to be segmented” are shifted by a certain amount relative to the “text image to be recognized”, so that it is almost impossible for these segmentation positions to share a common position with the “preset cutting positions” described below. This effectively avoids the adverse effects caused by the “first segmentation processing” cutting through characters.
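Assuming, as in FIG. 4, that the offset region is removed from the left edge, S21-S22 can be sketched as follows; with the example values from the text (interval 512, offset 256), the internal slice boundaries of this shifted pass never coincide with those of an unshifted pass. The function and span representation are illustrative assumptions.

```python
def slice_with_offset(width, interval, offset):
    """Remove [0, offset) from the image, then slice the remainder by
    `interval`; spans are expressed in the original image's coordinates."""
    return [(s, min(s + interval, width)) for s in range(offset, width, interval)]

print(slice_with_offset(1280, 512, 256))  # → [(256, 768), (768, 1280)]
```

Compare with the unshifted boundaries 512 and 1024: the shifted boundaries 768 (and the span edges 256, 1280) share no interior cut position with them.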
Based on the above description of S2, after the text image to be recognized is acquired, the first segmentation processing can be performed on the text image to be recognized according to the preset slice parameters to obtain at least one image slice and its position information, so that the character recognition result of the “text image to be recognized” can subsequently be obtained based on the “at least one image slice”. Because the length of each image slice is far smaller than the length of the text image to be recognized, processing each image slice takes far less time than processing the whole text image to be recognized, which helps improve character recognition efficiency.
S3: Determine the actual cutting position corresponding to the text image to be recognized according to the single-character detection result of the at least one image slice and the position information of the at least one image slice.
Here, the “single-character detection result of the at least one image slice” represents the position of each character in each image slice.
In addition, the embodiment of the present application does not limit the process of determining the “single-character detection result of the at least one image slice”. For example, single-character detection processing may be performed on each image slice separately to obtain the single-character detection result of each image slice. It should be noted that the embodiment of the present application does not limit the implementation of the “single-character detection processing”; for example, any existing or future single-character detection method may be used, or the “single-character detection model” shown below may be used.
Moreover, in order to further improve single-character detection efficiency, the embodiment of the present application also provides another possible implementation of determining the “single-character detection result of the at least one image slice”, which may specifically include: performing parallel single-character detection processing on the at least one image slice by using a pre-built single-character detection model to obtain the single-character detection result of the at least one image slice.
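The parallel detection step can be sketched as below; `detect_chars` is a placeholder standing in for the pre-built single-character detection model, which the application does not specify at the code level, and the thread-pool choice is an assumption of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def detect_chars(image_slice):
    # Placeholder model: a real detector would return character bounding
    # boxes for the slice; here one dummy box is emitted per element.
    return [(0, 10)] * len(image_slice)

def detect_all(slices):
    # Run detection over all slices concurrently; pool.map preserves the
    # input order, so results still line up with the slices' positions.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(detect_chars, slices))

results = detect_all([["a"], ["b", "c"]])
print([len(r) for r in results])  # → [1, 2]
```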
Here, the single-character detection model is used to perform character position detection (for example, character boundary position detection) on its input data.
The embodiment of the present application does not limit the model structure of the single-character detection model. For example, in one possible implementation, as shown in FIG. 6, the single-character detection model 600 may include a feature extraction layer 601 and a single-character position determination layer 602, where the input data of the single-character position determination layer 602 includes the output data of the feature extraction layer 601.
To facilitate understanding of the working principle of the single-character detection model 600, the process of determining the single-character detection result of a target image is described below as an example. Here, the “target image” refers to any image slice in the above “at least one image slice”.
As an example, the process of determining the above “single-character detection result” using the single-character detection model 600 may specifically include steps 11-12:
Step 11: Input the target image into the feature extraction layer 601 to obtain the image position feature output by the feature extraction layer 601.
其中,特征提取层601用于针对该特征提取层601的输入数据进行特征提取;而且本申请实施例不限定特征提取层601,例如,该特征提取层601可以利用任一种卷积神经网络(Convolutional Neural Networks,CNN)进行实施(如,可以采用(Visual Geometry Group,VGG)网络进行实施)。Wherein, the feature extraction layer 601 is used to perform feature extraction for the input data of the feature extraction layer 601; and the embodiment of the present application does not limit the feature extraction layer 601, for example, the feature extraction layer 601 can use any convolutional neural network ( Convolutional Neural Networks, CNN) for implementation (eg, (Visual Geometry Group, VGG) network can be used for implementation).
图像位置特征用于表示目标图像中各个位置携带的信息(尤其是,在宽度方向上各个位置携带的信息)。另外,本申请实施例不限定图像位置特征,例如,若目标图像为[C,H,W]矩阵,则该图像位置特征可以是[1,1,W/4]的矩阵。其中,C表示图像通道数(如,C=3),H表示图像高度(如,H=32),W表示图像宽度(如,W=512)。The image position feature is used to represent the information carried by each position in the target image (especially, the information carried by each position in the width direction). In addition, the embodiment of the present application does not limit the image position feature. For example, if the target image is a [C, H, W] matrix, the image position feature may be a [1, 1, W/4] matrix. Wherein, C represents the number of image channels (eg, C=3), H represents the image height (eg, H=32), and W represents the image width (eg, W=512).
步骤12:将图像位置特征输入单字位置确定层602,得到该单字位置确定层602输出的目标图像的单字检测结果。Step 12: Input the feature of the image position into the word position determination layer 602, and obtain the word detection result of the target image output by the word position determination layer 602.
The single-character position determination layer 602 is used to perform character boundary position recognition on the input data of the single-character position determination layer 602.
In addition, the embodiment of the present application does not limit the single-character position determination layer 602. For example, in one possible implementation, if the width of the image position feature is smaller than the width of the target image (for example, the width of the image position feature is 1/4 of the width of the target image), the single-character position determination layer 602 may include a position classification layer and a position mapping layer, where the input data of the position mapping layer includes the output data of the position classification layer.
To facilitate understanding of the working principle of the single-character position determination layer 602, the process of determining the above "single-character detection result" is described below as an example.
As an example, if the single-character position determination layer 602 includes a position classification layer and a position mapping layer, the process of determining the above "single-character detection result" may include steps 21 to 22:
Step 21: Input the image position feature into the position classification layer to obtain the position classification result output by the position classification layer.
The position classification layer is used to determine whether each position in the input data of the position classification layer belongs to a character boundary.
In addition, the embodiment of the present application does not limit the implementation of the position classification layer; it may be implemented with any existing or future classifier (e.g., softmax).
The position classification result is used to indicate whether each position in the target image belongs to a character boundary (in particular, whether each position of the target image along the width direction belongs to a character boundary).
Step 22: Input the position classification result into the position mapping layer to obtain the single-character detection result of the target image output by the position mapping layer.
The position mapping layer is used to perform mapping processing on the input data of the position mapping layer.
In addition, the embodiment of the present application does not limit the working principle of the position mapping layer. For example, the position mapping layer may map each position in the position classification result according to formula (1):
y = a × x + b    (1)
In formula (1), y denotes the mapped position coordinate corresponding to x; a denotes the ratio between the width of the target image and the width of the image position feature (e.g., 4); x denotes a position coordinate in the position classification result (in particular, a position coordinate of the position classification result along the width direction); and b denotes the convolution offset used in the feature extraction layer 601.
It can be seen that, in some cases, because the width of the image position feature is smaller than the width of the target image (for example, 1/4 of it), the width of the position classification result determined from the image position feature is also smaller than the width of the target image (for example, also 1/4 of it). In this case, in order to describe more accurately whether each position of the target image along the width direction belongs to a character boundary, each position coordinate of the position classification result along the width direction can be mapped, according to formula (1), to a position coordinate of the target image along the width direction.
Based on the above steps 11 to 12, for the single-character detection model 600 shown in FIG. 6, after the target image is input into the single-character detection model 600, the model can sequentially perform feature extraction and single-character position determination on the target image, and obtain and output the single-character detection result of the target image, so that the single-character detection result accurately represents the boundary position of each character in the target image.
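The classification-then-mapping flow of steps 21 to 22 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the CNN feature extraction layer 601 is stubbed out as precomputed per-position boundary probabilities, and the function names, the 0.5 threshold, and b = 0 are choices made here for the example.

```python
def position_mapping(feature_positions, a, b=0):
    """Formula (1): map a feature-space coordinate x to an
    image-space coordinate y = a * x + b."""
    return [a * x + b for x in feature_positions]

def single_char_detection(boundary_probs, image_width, threshold=0.5):
    """boundary_probs: boundary probabilities for each of the W/4
    feature positions, as the position classification layer would
    output them (values here are illustrative)."""
    # Position classification layer: keep feature positions
    # classified as character boundaries.
    boundary_feature_positions = [
        x for x, p in enumerate(boundary_probs) if p >= threshold
    ]
    # Position mapping layer: a is the width ratio between the
    # target image and the image position feature.
    a = image_width // len(boundary_probs)
    return position_mapping(boundary_feature_positions, a)

# A 512-pixel-wide slice yields a feature of width 128 (ratio a = 4).
probs = [0.0] * 128
probs[10], probs[20] = 0.9, 0.8
print(single_char_detection(probs, image_width=512))  # [40, 80]
```

Feature positions 10 and 20 are mapped back to image coordinates 40 and 80, matching the 4:1 width ratio used in the [1, 1, W/4] example above.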
In addition, the single-character detection model may be constructed in advance according to a sample text image and the actual position of each character in the sample text image. The sample text image refers to an image used to construct the single-character detection model, and the embodiment of the present application does not limit the number of sample text images. Nor does it limit the actual position of each character in the sample text image; for example, this may be the actual boundary position of each character in the sample text image.
Furthermore, the embodiment of the present application does not limit the construction process of the single-character detection model. For example, in one possible implementation, the construction process may include steps 31 to 34:
Step 31: Input the sample text image into the model to be trained to obtain the predicted character position of the sample text image output by the model to be trained.
The model to be trained is used to perform character position detection (for example, character boundary position detection) on its input data. In addition, the model structure of the model to be trained is the same as that of the "single-character detection model" above, so for its model structure, refer to the description of the model structure of the "single-character detection model" above.
The predicted character position of the sample text image is used to describe the predicted position of at least one character in the sample text image.
Step 32: Determine whether a preset stop condition is met; if so, execute step 34; if not, execute step 33.
The preset stop condition may be set in advance. For example, the preset stop condition may be that the loss value of the model to be trained is below a preset loss threshold, that the rate of change of the loss value of the model to be trained is below a preset change-rate threshold (that is, the character position detection performance of the model to be trained has converged), or that the number of updates of the model to be trained reaches a preset count threshold.
The loss value of the model to be trained is used to characterize the character position detection performance of the model to be trained; moreover, the embodiment of the present application does not limit the method for determining this loss value.
In addition, the preset loss threshold, the preset change-rate threshold, and the preset count threshold may all be set in advance.
Step 33: Update the model to be trained according to the predicted character position of the sample text image and the actual position of each character in the sample text image, and return to step 31.
In the embodiment of the present application, after it is determined that the current round of the model to be trained has not yet met the preset stop condition, it can be concluded that the current round of the model still has poor character position detection performance. Therefore, the current round of the model to be trained can be updated according to the difference between the predicted character position of the sample text image and the actual position of each character in the sample text image, so that the updated model has better character position detection performance, and the process returns to step 31 and its subsequent steps.
Step 34: Determine the single-character detection model according to the model to be trained.
In the embodiment of the present application, after it is determined that the current round of the model to be trained has met the preset stop condition, it can be concluded that the current round of the model already has good character position detection performance. Therefore, the single-character detection model can be determined according to the current round of the model to be trained (for example, the current round of the model may be directly taken as the single-character detection model; alternatively, the model structure and model parameters of the single-character detection model may be determined according to the model structure and model parameters of the current round of the model, so that they remain consistent with each other). In this way, the single-character detection model also has good character position detection performance, so that the single-character detection results subsequently determined for the at least one image slice using this model can accurately represent the position of each character in each image slice.
The above "actual cut position corresponding to the text image to be recognized" is used to describe the actual cutting position for the text image to be recognized. Moreover, the embodiment of the present application does not limit the process of determining this actual cut position (that is, the implementation of S3). For example, the single-character position information of the text image to be recognized may first be determined according to the single-character detection result of the at least one image slice and the position information of the at least one image slice; the actual cut position corresponding to the text image to be recognized is then determined according to this single-character position information, so that the actual cut position falls inside a character as rarely as possible.
In some cases, an end user may set a character recognition efficiency requirement; alternatively, different application scenarios may correspond to different character recognition efficiency requirements. Based on this, in order to meet such a "character recognition efficiency requirement", the embodiment of the present application further provides a possible implementation of S3, which may specifically include: determining the actual cut position corresponding to the text image to be recognized according to the single-character detection result of the at least one image slice, the position information of the at least one image slice, and the preset cut position corresponding to the text image to be recognized.
The "preset cut position corresponding to the text image to be recognized" refers to a cutting position preset for the text image to be recognized, and this preset cut position is determined according to the above "character recognition efficiency requirement".
In addition, the embodiment of the present application does not limit the preset cut position corresponding to the text image to be recognized; for example, it may include at least one hard-cut position, where a "hard-cut position" denotes one preset cutting position corresponding to the text image to be recognized. For ease of understanding, the text image to be recognized shown in FIG. 7 is taken as an example below.
As an example, if the text image to be recognized is the one shown in FIG. 7, the preset cut position corresponding to the text image to be recognized may be {512, 1024, 1536, 2048}, where "512", "1024", "1536", and "2048" are all hard-cut positions corresponding to the text image to be recognized.
Furthermore, the embodiment of the present application does not limit the process of determining the preset cut position corresponding to the text image to be recognized. For example, it may specifically include steps 41 to 42:
Step 41: Obtain a preset segmentation parameter.
The "preset segmentation parameter" is used to indicate the maximum width of one cut image (that is, the distance between two adjacent hard-cut positions in the above "preset cut position"), and it may be set in advance according to the application scenario (in particular, according to the character recognition efficiency requirement in that scenario). For example, the preset segmentation parameter may be 512 pixels.
Step 42: Determine the preset cut position corresponding to the text image to be recognized according to the preset segmentation parameter and the text image to be recognized.
In the embodiment of the present application, after the text image to be recognized is obtained, the preset cut position corresponding to the text image to be recognized (e.g., {512, 1024, 1536, 2048} in FIG. 7) may be determined with reference to the preset segmentation parameter, so that the interval between adjacent positions in the preset cut position does not exceed the preset segmentation parameter.
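Steps 41 to 42 amount to placing hard-cut positions at most one segmentation parameter apart across the image width. A sketch follows; the 2200-pixel image width is an assumed value, chosen here only so that the example reproduces the {512, 1024, 1536, 2048} set of FIG. 7.

```python
def preset_cut_positions(image_width, seg_param=512):
    """Hard-cut positions such that the interval between adjacent
    positions (and to the image edges) never exceeds seg_param."""
    return list(range(seg_param, image_width, seg_param))

print(preset_cut_positions(2200))  # [512, 1024, 1536, 2048]
```

An image narrower than the segmentation parameter yields no hard-cut positions, i.e. it is recognized as a single slice.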
Based on the above steps 41 to 42, the preset cut position corresponding to the text image to be recognized can be determined according to the application scenario (in particular, according to the character recognition efficiency requirement in that scenario), so that image segmentation based on the actual cut position derived from this preset cut position satisfies the character recognition efficiency requirement of that scenario. The character recognition method provided by the present application can thus meet the character recognition efficiency requirement of the application scenario.
In addition, the embodiment of the present application does not limit the implementation of determining the actual cut position corresponding to the text image to be recognized with reference to the above "preset cut position". For example, it may specifically include steps 51 to 52:
Step 51: Splice the single-character detection results of the at least one image slice according to the position information of the at least one image slice, to obtain the single-character detection result of the text image to be recognized.
The "single-character detection result of the text image to be recognized" is used to describe the position of at least one character in the text image to be recognized.
In addition, the embodiment of the present application does not limit the "single-character detection result of the text image to be recognized"; for example, it may include at least one boundary position, where a "boundary position" denotes an edge position of one character. For ease of understanding, the text image to be recognized shown in FIG. 7 is taken as an example below.
As an example, if the text image to be recognized is the one shown in FIG. 7, its single-character detection result may be {43, 82, 293, 309, ...}, where "43" denotes the left boundary of "这", "82" the right boundary of "这", "293" the left boundary of "是", "309" the right boundary of "是", and so on.
Based on the above step 51, after the single-character detection results of the at least one image slice are obtained, they can be spliced according to the position information of the at least one image slice, to obtain the single-character detection result of the text image to be recognized, so that this result describes the position of at least one character in the text image to be recognized.
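Step 51 can be sketched as offsetting each slice's local boundary positions by that slice's left edge in the original image. The interface below, which reduces a slice's position information to a single left offset, is an assumption made here for illustration.

```python
def stitch_detections(slice_results):
    """slice_results: list of (slice_left_offset, boundary_positions),
    with boundary positions in slice-local coordinates. Returns the
    boundary positions in whole-image coordinates."""
    merged = []
    for offset, boundaries in slice_results:
        merged.extend(offset + b for b in boundaries)
    return sorted(merged)

# Two 512-pixel slices: local positions 21 and 60 of the second
# slice become 533 and 572 in the full text image.
print(stitch_detections([(0, [43, 82, 293, 309]), (512, [21, 60])]))
# [43, 82, 293, 309, 533, 572]
```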
Step 52: Determine the actual cut position corresponding to the text image to be recognized according to the single-character detection result of the text image to be recognized and the preset cut position corresponding to the text image to be recognized.
In the embodiment of the present application, after the single-character detection result of the text image to be recognized and the corresponding preset cut position are obtained, the actual cut position corresponding to the text image to be recognized may be determined with reference to both. Specifically, as shown in FIG. 7, a preset algorithm may be used to match the preset cut position corresponding to the text image to be recognized against the single-character detection result of the text image to be recognized, to obtain the actual cut position corresponding to the text image to be recognized. The preset algorithm may be set in advance; for example, it may be a greedy algorithm or the Hungarian algorithm.
To facilitate understanding of step 52, two examples are described below.
Example 1: step 52 may specifically include steps 61 to 63:
Step 61: Determine a first position set and a second position set according to the single-character detection result of the text image to be recognized and the preset cut position corresponding to the text image to be recognized.
The number of positions in the first position set is not less than the number of positions in the second position set; that is, the first position set is the set with more cut positions, and the second position set is the set with fewer cut positions.
In addition, the embodiment of the present application does not limit the implementation of step 61. For example, if the single-character detection result of the text image to be recognized includes at least one boundary position, and the preset cut position corresponding to the text image to be recognized includes at least one hard-cut position, step 61 may specifically include steps 611 to 612:
Step 611: If the number of boundary positions is not lower than the number of hard-cut positions, determine the set of the above "at least one boundary position" as the first position set, and the set of the above "at least one hard-cut position" as the second position set.
Step 612: If the number of boundary positions is lower than the number of hard-cut positions, determine the set of the above "at least one hard-cut position" as the first position set, and the set of the above "at least one boundary position" as the second position set.
Based on the above steps 611 to 612, the first position set and the second position set can be determined according to the relationship between the number of cut positions (that is, boundary positions) represented by the single-character detection result and the number of cut positions (that is, hard-cut positions) represented by the preset cut position, so that the first position set is whichever of the two sets has more positions and the second position set is whichever has fewer. For example, if the single-character detection result of the text image to be recognized is the position set {43, 82, 293, 309, ...} shown in FIG. 7, and the preset cut position corresponding to the text image to be recognized is the position set {512, 1024, 1536, 2048} shown in FIG. 7, then the first position set may be {43, 82, 293, 309, ...}, and the second position set may be {512, 1024, 1536, 2048}.
Step 62: Match each position in the second position set against at least one position in the first position set, to obtain a matching result corresponding to each position in the second position set.
In the embodiment of the present application, if the second position set includes N positions, a position that successfully matches the n-th position in the second position set may be found from the first position set (for example, the position in the first position set closest to the n-th position in the second position set), to obtain the matching result corresponding to the n-th position, so that this matching result represents the position in the first position set that successfully matches the n-th position. For example, as shown in FIG. 7, if the first position set is {43, 82, 293, 309, ...} and the second position set is {512, 1024, 1536, 2048}, the matching result corresponding to "512" in the second position set may be that "512" successfully matches "335", and so on.
Step 63: Determine the actual cut position corresponding to the text image to be recognized according to the matching results corresponding to the positions in the second position set.
In the embodiment of the present application, after the matching results corresponding to the positions in the second position set are obtained, the actual cut position corresponding to the text image to be recognized may be determined with reference to these matching results (for example, the matching results corresponding to the positions in the second position set may be directly taken as the actual cut position corresponding to the text image to be recognized).
Based on the above steps 61 to 63, after the single-character detection result of the text image to be recognized and the corresponding preset cut position are obtained, the number of cut positions represented by the single-character detection result and the number represented by the preset cut position are compared first; each cut position in the set with fewer positions is then matched against at least one cut position in the set with more positions, to obtain the matching result corresponding to each cut position in the smaller set; finally, the actual cut position corresponding to the text image to be recognized is determined according to these matching results.
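Steps 61 to 63 can be sketched with a greedy nearest-neighbor match (the patent also allows other matching schemes such as the Hungarian algorithm). The boundary values below are illustrative, not taken from FIG. 7.

```python
def actual_cut_positions(boundary_positions, hard_cut_positions):
    # Step 61: the larger set becomes the first set, the smaller
    # set becomes the second set.
    if len(boundary_positions) >= len(hard_cut_positions):
        first, second = boundary_positions, hard_cut_positions
    else:
        first, second = hard_cut_positions, boundary_positions
    # Steps 62-63: greedily match each position in the second set to
    # the closest position in the first set; the matched positions
    # are taken as the actual cut positions.
    return [min(first, key=lambda p: abs(p - q)) for q in second]

boundaries = [43, 82, 293, 309, 335, 520, 545]  # illustrative values
print(actual_cut_positions(boundaries, [512]))  # [520]
```

Because the matched positions are drawn from the character boundaries whenever the boundaries outnumber the hard cuts, the resulting actual cut positions avoid falling inside characters.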
Example 2: if the single-character detection result of the text image to be recognized includes at least one boundary position, and the preset cut position corresponding to the text image to be recognized includes at least one hard-cut position, step 52 may specifically include steps 71 to 74:
Step 71: Determine a first position set and a second position set according to the single-character detection result of the text image to be recognized and the preset cut position corresponding to the text image to be recognized.
It should be noted that, for the details of step 71, refer to step 61 above.
Step 72: If it is determined that the second position set includes at least one boundary position, determine the second position set as the actual cut position corresponding to the text image to be recognized.
In the embodiment of the present application, if it is determined that the second position set includes at least one boundary position, it can be concluded that the second position set was derived from the single-character detection result of the text image to be recognized, so that no position in the second position set falls inside a character. The second position set can therefore be directly determined as the actual cut position corresponding to the text image to be recognized, so that the actual cut position does not fall inside a character and no character is cut apart when the image is cut based on it. This effectively prevents incomplete characters from appearing in the cut images corresponding to the text image to be recognized, which helps improve the recognition accuracy of long text recognition.
步骤73:若确定第二位置集合包括至少一个硬切位置,则将第二位置集合中各个位置分别与第一位置集合中至少一个位置进行匹配,得到第二位置集合中各个位置对应的匹配结果。Step 73: If it is determined that the second set of positions includes at least one hard-cut position, then match each position in the second set of positions with at least one position in the first set of positions, and obtain matching results corresponding to each position in the second set of positions .
需要说明的是,步骤73可以采用上文S62的任一实施方式进行实施。It should be noted that step 73 can be implemented by using any implementation manner of S62 above.
可见，若确定第二位置集合包括至少一个硬切位置，则可以确定该第二位置集合是根据待识别文本图像对应的预设切图位置确定的，使得该第二位置集合有可能出现在字符内部，故可以从第一位置集合中分别查找能够与第二位置集合中各个位置匹配成功的位置，以便后续能够利用这些查找到的位置确定待识别文本图像对应的实际切图位置，以使该实际切图位置不会出现在字符内部，从而使得在基于该实际切图位置进行切图时不会出现切坏字符的现象，如此能够有效地避免该待识别文本图像对应的各个切图中出现不完整字符，从而有利于提高长文本识别的识别准确性。It can be seen that if the second position set includes at least one hard-cut position, it can be concluded that the second position set was determined from the preset cut positions corresponding to the text image to be recognized, so some of its positions may fall inside a character. Positions in the first position set that successfully match each position in the second position set can therefore be looked up, so that the found positions can subsequently be used to determine the actual cut positions corresponding to the text image to be recognized. In this way the actual cut positions do not fall inside characters, no character is cut apart when the image is split at those positions, incomplete characters are effectively kept out of the cut images corresponding to the text image to be recognized, and the recognition accuracy of long text recognition is improved.
步骤74:根据第二位置集合中各个位置对应的匹配结果,确定待识别文本图像对应的实际切图位置。Step 74: According to the matching results corresponding to each position in the second position set, determine the actual image cutting position corresponding to the text image to be recognized.
需要说明的是,步骤74的相关内容请参见上文S63。It should be noted that, for the relevant content of step 74, please refer to the above S63.
基于上述步骤71至步骤74的相关内容可知，在获取到待识别文本图像的单字检测结果和该待识别文本图像对应的预设切图位置之后，应该尽可能地从该单字检测结果中挑选出该待识别文本图像对应的实际切图位置，以使该实际切图位置能够在不切坏字符的情况下尽可能地满足应用场景下的文字识别效率需求。Based on the content of steps 71 to 74 above, after the single-character detection result of the text image to be recognized and the preset cut positions corresponding to that image are obtained, the actual cut positions corresponding to the text image to be recognized should, as far as possible, be selected from the single-character detection result, so that the actual cut positions can satisfy the text recognition efficiency requirements of the application scenario as far as possible without cutting characters apart.
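Steps 71 to 74 amount to snapping preset hard-cut positions onto nearby inter-character boundaries. The sketch below is a minimal illustration of that idea, not the patented method itself: character boxes are assumed to come from a single-character detector as `(left, right)` pairs, candidate boundaries are taken as midpoints of the gaps between adjacent boxes, and every name (`snap_cut_positions`, `tolerance`, the box format) is an assumption introduced here for illustration.

```python
def snap_cut_positions(preset_cuts, char_boxes, tolerance):
    """For each preset (hard) cut position, find the nearest gap between
    detected character boxes; use the gap position if one lies within
    `tolerance` pixels, otherwise fall back to the preset position."""
    boxes = sorted(char_boxes)
    # Candidate boundaries: midpoints of the gaps between consecutive boxes.
    gaps = [(boxes[i][1] + boxes[i + 1][0]) / 2 for i in range(len(boxes) - 1)]
    actual = []
    for cut in preset_cuts:
        if gaps:
            nearest = min(gaps, key=lambda g: abs(g - cut))
            actual.append(nearest if abs(nearest - cut) <= tolerance else cut)
        else:
            actual.append(cut)  # no detections: keep the hard cut as-is
    return sorted(set(actual))
```

A preset cut at 48 between boxes (0, 40) and (50, 90) would be moved to the gap midpoint 45, so the cut cannot land inside either character.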
基于上述步骤52的相关内容可知，在获取到待识别文本图像的单字检测结果和该待识别文本图像对应的预设切图位置之后，可以综合上述两者来确定该待识别文本图像对应的实际切图位置，以使该待识别文本图像对应的实际切图位置能够在不切坏字符的情况下尽可能地满足应用场景下的文字识别效率需求。其中，因上述“待识别文本图像对应的预设切图位置”是依据应用场景对应的预设切分参数确定，使得该预设切图位置符合该应用场景下的文字识别效率需求，从而使得基于该预设切图位置确定的实际切图位置也符合该应用场景下的文字识别效率需求，从而使得基于该预设切图位置实现的文字识别过程能够满足该应用场景下的文字识别效率需求，如此实现在保证长文本识别的识别准确性的前提下尽可能地满足不同应用场景下的文字识别效率需求。Based on the content of step 52 above, after the single-character detection result of the text image to be recognized and the preset cut positions corresponding to that image are obtained, the two can be combined to determine the actual cut positions corresponding to the text image to be recognized, so that those actual cut positions satisfy the text recognition efficiency requirements of the application scenario as far as possible without cutting characters apart. Because the above "preset cut positions corresponding to the text image to be recognized" are determined according to the preset segmentation parameters of the application scenario, the preset cut positions meet the text recognition efficiency requirements of that scenario; the actual cut positions determined from them therefore also meet those requirements, and the text recognition process based on them can satisfy the efficiency requirements of the scenario. In this way, the text recognition efficiency requirements of different application scenarios can be satisfied as far as possible while the recognition accuracy of long text recognition is guaranteed.
基于上述S3的相关内容可知，在获取到至少一个图像切片的单字检测结果和该至少一个图像切片的位置信息之后，可以参考该至少一个图像切片的单字检测结果以及位置信息，确定待识别文本图像对应的实际切图位置。Based on the content of S3 above, after the single-character detection result and the position information of the at least one image slice are obtained, the actual cut positions corresponding to the text image to be recognized can be determined with reference to that detection result and position information.
S4:按照待识别文本图像对应的实际切图位置,对该待识别文本图像进行第二切分处理,得到至少一个待使用图片。S4: Perform a second segmentation process on the text image to be recognized according to the actual picture cutting position corresponding to the text image to be recognized, to obtain at least one picture to be used.
其中,“第二切分处理”是指按照待识别文本图像对应的实际切图位置对该待识别文本图像进行切分处理的过程。Wherein, the "second segmentation processing" refers to the process of performing segmentation processing on the text image to be recognized according to the actual image cutting position corresponding to the text image to be recognized.
可见，在获取到待识别文本图像对应的实际切图位置之后，可以按照该实际切图位置对该待识别文本图像进行切分，得到该待识别文本图像对应的各个切图，并将各个切图分别确定为待使用图片。It can be seen that, after the actual cut positions corresponding to the text image to be recognized are obtained, the text image to be recognized can be split at those positions to obtain the cut images corresponding to it, and each cut image is then taken as a picture to be used.
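As a minimal sketch of the second segmentation on a one-dimensional pixel axis (the image content itself is abstracted away; `split_at_positions` and its arguments are illustrative names introduced here, not terms from the source):

```python
def split_at_positions(image_width, cut_positions):
    """Turn the actual cut positions into contiguous (left, right) pixel
    spans covering the whole image width; cropping the image to each span
    yields one picture-to-use."""
    inner = sorted(p for p in cut_positions if 0 < p < image_width)
    edges = [0] + inner + [image_width]
    return [(edges[i], edges[i + 1]) for i in range(len(edges) - 1)]
```

Cut positions outside `(0, image_width)` are dropped, so the spans always tile the image exactly once with no empty slices at the edges.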
S5:根据至少一个待使用图片的文字识别结果,确定待识别文本图像的文字识别结果。S5: Determine the text recognition result of the text image to be recognized according to the text recognition result of at least one image to be used.
其中，待使用图片的文字识别结果用于描述该待使用图片携带的字符信息；而且本申请实施例不限定待使用图片的文字识别结果的确定过程，可以采用现有的或者未来出现的任一种文字识别方法进行实施（例如，可以采用OCR模型进行实施）。另外，为了提高文字识别效率，可以将所有待使用图片并行进行文字识别处理，得到各个待使用图片的文字识别结果。The text recognition result of a picture to be used describes the character information carried by that picture. The embodiments of the present application do not limit how this result is determined; any existing or future text recognition method may be used (for example, an OCR model). In addition, to improve text recognition efficiency, all the pictures to be used may be recognized in parallel to obtain the text recognition result of each picture.
待识别文本图像的文字识别结果用于描述该待识别文本图像携带的字符信息。The character recognition result of the text image to be recognized is used to describe the character information carried by the text image to be recognized.
另外，本申请实施例不限定S5的实施方式，例如，S5具体可以包括：将至少一个待使用图片的文字识别结果按照该至少一个待使用图片对应的排列顺序进行拼接，得到待识别文本图像的文字识别结果。In addition, the embodiments of the present application do not limit the implementation of S5. For example, S5 may specifically include: concatenating the text recognition results of the at least one picture to be used in the order corresponding to the at least one picture to be used, to obtain the text recognition result of the text image to be recognized.
其中，至少一个待使用图片对应的排列顺序用于表示该至少一个待使用图片在待识别文本图像中的位置相邻关系；而且其具体为：排列序号为1的待使用图片与排列序号为2的待使用图片相邻，排列序号为2的待使用图片与排列序号为3的待使用图片相邻，……（以此类推），排列序号为T-1的待使用图片与排列序号为T的待使用图片相邻。其中，T为正整数，T表示待使用图片个数。The order corresponding to the at least one picture to be used represents the positional adjacency of those pictures in the text image to be recognized; specifically, the picture with sequence number 1 is adjacent to the picture with sequence number 2, the picture with sequence number 2 is adjacent to the picture with sequence number 3, and so on, up to the picture with sequence number T-1 being adjacent to the picture with sequence number T, where T is a positive integer denoting the number of pictures to be used.
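The parallel recognition and order-preserving concatenation of S5 can be sketched as follows; the `ocr` callable stands in for whatever recognizer is used (e.g. an OCR model), and all names here are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_long_text(pictures, ocr):
    """Recognize every picture-to-use in parallel, then join the per-picture
    results in the pictures' left-to-right order (sequence numbers 1..T)."""
    with ThreadPoolExecutor() as pool:
        # Executor.map returns results in input order regardless of
        # completion order, which preserves the arrangement of the pictures.
        results = list(pool.map(ocr, pictures))
    return "".join(results)
```

Because `map` preserves input order, the concatenated string corresponds to reading the pictures in their original left-to-right arrangement even though recognition runs concurrently.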
基于上述S1至S5的相关内容可知，对于本申请实施例提供的文字识别方法来说，在获取到包括长文本的待识别文本图像之后，先将该待识别文本图像按照预设切片参数进行第一切分处理，得到至少一个图像切片和该至少一个图像切片的位置信息；再根据该至少一个图像切片的单字检测结果以及位置信息，确定该待识别文本图像对应的实际切图位置；然后，按照该待识别文本图像对应的实际切图位置，对该待识别文本图像进行第二切分处理，得到至少一个待使用图片；最后，根据该至少一个待使用图片的文字识别结果，确定该待识别文本图像的文字识别结果，如此能够实现针对长文本的文字识别过程。Based on the content of S1 to S5 above, in the text recognition method provided by the embodiments of the present application, after a text image to be recognized that includes long text is obtained, the text image is first subjected to the first segmentation according to the preset slice parameters, yielding at least one image slice and its position information; the actual cut positions corresponding to the text image are then determined from the single-character detection results and position information of the at least one image slice; next, the text image is subjected to the second segmentation at those actual cut positions, yielding at least one picture to be used; finally, the text recognition result of the text image to be recognized is determined from the text recognition results of the at least one picture to be used. In this way, text recognition for long text can be realized.
可见，因上述“至少一个图像切片的单字检测结果以及位置信息”能够准确地表示出待识别文本图像中至少一个字符的位置信息，使得基于该单字检测结果确定的实际切图位置尽可能地不会出现在字符内部，从而使得在基于该实际切图位置进行切图时尽可能地不会出现切坏字符的现象，如此能够尽可能地避免该待识别文本图像对应的各个切图（也就是，各个待使用图片）中出现不完整字符，从而有利于提高长文本识别的识别准确性。还因各个图像切片的长度远远小于待识别文本图像的长度，使得针对各个图像切片的处理耗时远远小于针对待识别文本图像的处理耗时，如此有利于提高文字识别效率。It can be seen that, because the above "single-character detection results and position information of at least one image slice" accurately represent the positions of the characters in the text image to be recognized, the actual cut positions determined from the detection results fall inside characters as rarely as possible, so that characters are, as far as possible, not cut apart when the image is split at those positions. Incomplete characters are thus kept out of the cut images (that is, the pictures to be used) corresponding to the text image to be recognized, which helps improve the recognition accuracy of long text recognition. Moreover, because each image slice is far shorter than the text image to be recognized, processing each slice takes far less time than processing the whole image, which helps improve text recognition efficiency.
基于上述方法实施例提供的文字识别方法,本申请实施例还提供了一种文字识别装置,下面结合附图进行解释和说明。Based on the character recognition method provided by the above method embodiment, the embodiment of the present application also provides a character recognition device, which will be explained and described below with reference to the accompanying drawings.
装置实施例Device embodiment
装置实施例提供的文字识别装置的技术详情,请参照上述方法实施例。For the technical details of the character recognition device provided by the device embodiment, please refer to the above method embodiment.
参见图8,该图为本申请实施例提供的一种文字识别装置的结构示意图。Refer to FIG. 8 , which is a schematic structural diagram of a character recognition device provided by an embodiment of the present application.
本申请实施例提供的文字识别装置800,包括:The character recognition device 800 provided in the embodiment of the present application includes:
第一切分单元801，用于在获取到待识别文本图像之后，将所述待识别文本图像按照预设切片参数进行第一切分处理，得到至少一个图像切片和所述至少一个图像切片的位置信息；其中，所述待识别文本图像包括长文本；a first segmentation unit 801, configured to, after a text image to be recognized is obtained, perform a first segmentation on the text image to be recognized according to preset slice parameters, to obtain at least one image slice and position information of the at least one image slice; wherein the text image to be recognized includes long text;
位置确定单元802,用于根据所述至少一个图像切片的单字检测结果和所述至少一个图像切片的位置信息,确定所述待识别文本图像对应的实际切图位置;A position determination unit 802, configured to determine the actual cut-out position corresponding to the text image to be recognized according to the word detection result of the at least one image slice and the position information of the at least one image slice;
第二切分单元803,用于按照所述待识别文本图像对应的实际切图位置,对所述待识别文本图像进行第二切分处理,得到至少一个待使用图片;The second segmentation unit 803 is configured to perform a second segmentation process on the text image to be recognized according to the actual image cutting position corresponding to the text image to be recognized to obtain at least one picture to be used;
结果确定单元804,用于根据所述至少一个待使用图片的文字识别结果,确定所述待识别文本图像的文字识别结果。The result determining unit 804 is configured to determine a character recognition result of the to-be-recognized text image according to the character recognition result of the at least one image to be used.
在一种可能的实施方式中，所述位置确定单元802，具体用于：根据所述至少一个图像切片的单字检测结果、所述至少一个图像切片的位置信息、和所述待识别文本图像对应的预设切图位置，确定所述待识别文本图像对应的实际切图位置。In a possible implementation, the position determination unit 802 is specifically configured to: determine the actual cut position corresponding to the text image to be recognized according to the single-character detection result of the at least one image slice, the position information of the at least one image slice, and the preset cut position corresponding to the text image to be recognized.
在一种可能的实施方式中，所述位置确定单元802，具体用于：将所述至少一个图像切片的单字检测结果按照所述至少一个图像切片的位置信息进行拼接处理，得到所述待识别文本图像的单字检测结果；根据所述待识别文本图像的单字检测结果和所述待识别文本图像对应的预设切图位置，确定所述待识别文本图像对应的实际切图位置。In a possible implementation, the position determination unit 802 is specifically configured to: concatenate the single-character detection results of the at least one image slice according to the position information of the at least one image slice, to obtain the single-character detection result of the text image to be recognized; and determine, according to the single-character detection result of the text image to be recognized and the preset cut positions corresponding to it, the actual cut positions corresponding to the text image to be recognized.
在一种可能的实施方式中,所述预设切片参数包括切分间隔和切分偏移长度;其中,所述切分偏移长度小于所述切分间隔;In a possible implementation manner, the preset slice parameters include a segmentation interval and a segmentation offset length; wherein, the segmentation offset length is smaller than the segmentation interval;
所述第一切分单元801,包括:The first dividing unit 801 includes:
区域切除子单元,用于从所述待识别文本图像中切除具有所述切分偏移长度的图像区域,得到待切分图像;a region cutting subunit, configured to cut an image region having the segmentation offset length from the text image to be recognized to obtain the image to be segmented;
图像切片子单元,用于将所述待切分图像按照所述切分间隔进行切分处理,得到至少一个图像切片。The image slice subunit is configured to perform segmentation processing on the image to be segmented according to the segmentation interval to obtain at least one image slice.
在一种可能的实施方式中,所述预设切片参数还包括切除起始位置;In a possible implementation manner, the preset slicing parameters also include a resection starting position;
所述区域切除子单元，具体用于：根据所述切除起始位置和所述切分偏移长度，确定切除区域位置；按照所述切除区域位置对所述待识别文本图像进行区域切除处理，得到所述待切分图像。The region excision subunit is specifically configured to: determine the position of the excision region according to the excision start position and the segmentation offset length; and perform region excision on the text image to be recognized according to the excision region position, to obtain the image to be segmented.
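The excision-then-slice behaviour of the first segmentation unit can be sketched on a one-dimensional pixel axis as follows. This is a simplified reading under stated assumptions: the excised region is taken to start at the excision start position, the part of the axis before it (if any) is kept as its own slice, and all names are illustrative rather than from the source:

```python
def first_segmentation(image_width, interval, offset, start=0):
    """Remove the region [start, start + offset) of length `offset`
    (offset < interval) from the axis, then cut the remainder into spans
    of at most `interval` pixels; each (left, right) span is one image
    slice, and its left edge serves as the slice's position information."""
    assert offset < interval, "the offset length must be smaller than the interval"
    slices = [(0, start)] if start > 0 else []  # keep any prefix before the excision
    pos = start + offset
    while pos < image_width:
        slices.append((pos, min(pos + interval, image_width)))
        pos += interval
    return slices
```

Running the same routine once without the offset and once with it yields two staggered slicings, so a character that straddles a slice boundary in one pass lies wholly inside a slice in the other.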
在一种可能的实施方式中，所述至少一个图像切片的单字检测结果的确定过程，包括：利用预先构建的单字检测模型对所述至少一个图像切片进行并行单字检测处理，得到所述至少一个图像切片的单字检测结果；其中，所述单字检测模型是根据样本文本图像和所述样本文本图像中各个字符的实际位置进行构建的。In a possible implementation, the process of determining the single-character detection result of the at least one image slice includes: performing parallel single-character detection on the at least one image slice with a pre-built single-character detection model, to obtain the single-character detection result of the at least one image slice; wherein the single-character detection model is built from a sample text image and the actual position of each character in the sample text image.
基于上述文字识别装置800的相关内容可知，对于文字识别装置800来说，在获取到包括长文本的待识别文本图像之后，先将该待识别文本图像按照预设切片参数进行第一切分处理，得到至少一个图像切片和该至少一个图像切片的位置信息；再根据该至少一个图像切片的单字检测结果以及位置信息，确定该待识别文本图像对应的实际切图位置；然后，按照该待识别文本图像对应的实际切图位置，对该待识别文本图像进行第二切分处理，得到至少一个待使用图片；最后，根据该至少一个待使用图片的文字识别结果，确定该待识别文本图像的文字识别结果，如此能够实现针对长文本的文字识别过程。Based on the above description of the text recognition device 800, after the device obtains a text image to be recognized that includes long text, it first performs the first segmentation on the text image according to the preset slice parameters, obtaining at least one image slice and its position information; it then determines the actual cut positions corresponding to the text image from the single-character detection results and position information of the at least one image slice; next, it performs the second segmentation on the text image at those actual cut positions, obtaining at least one picture to be used; finally, it determines the text recognition result of the text image to be recognized from the text recognition results of the at least one picture to be used. In this way, text recognition for long text can be realized.
可见，因上述“至少一个图像切片的单字检测结果以及位置信息”能够准确地表示出待识别文本图像中至少一个字符的位置信息，使得基于该单字检测结果确定的实际切图位置尽可能地不会出现在字符内部，从而使得在基于该实际切图位置进行切图时尽可能地不会出现切坏字符的现象，如此能够尽可能地避免该待识别文本图像对应的各个切图（也就是，各个待使用图片）中出现不完整字符，从而有利于提高长文本识别的识别准确性。还因各个图像切片的长度远远小于待识别文本图像的长度，使得针对各个图像切片的处理耗时远远小于针对待识别文本图像的处理耗时，如此有利于提高文字识别效率。It can be seen that, because the above "single-character detection results and position information of at least one image slice" accurately represent the positions of the characters in the text image to be recognized, the actual cut positions determined from the detection results fall inside characters as rarely as possible, so that characters are, as far as possible, not cut apart when the image is split at those positions. Incomplete characters are thus kept out of the cut images (that is, the pictures to be used) corresponding to the text image to be recognized, which helps improve the recognition accuracy of long text recognition. Moreover, because each image slice is far shorter than the text image to be recognized, processing each slice takes far less time than processing the whole image, which helps improve text recognition efficiency.
进一步地,本申请实施例还提供了一种设备,所述设备包括处理器以及存储器:Further, the embodiment of the present application also provides a device, the device includes a processor and a memory:
所述存储器用于存储计算机程序;The memory is used to store computer programs;
所述处理器用于根据所述计算机程序执行本申请实施例提供的文字识别方法的任一实施方式。The processor is configured to execute any implementation of the character recognition method provided in the embodiments of the present application according to the computer program.
进一步地，本申请实施例还提供了一种计算机可读存储介质，所述计算机可读存储介质用于存储计算机程序，所述计算机程序用于执行本申请实施例提供的文字识别方法的任一实施方式。Further, an embodiment of the present application also provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and the computer program is used to execute any implementation of the text recognition method provided in the embodiments of the present application.
进一步地,本申请实施例还提供了一种计算机程序产品,所述计算机程序产品在终端设备上运行时,使得所述终端设备执行本申请实施例提供的文字识别方法的任一实施方式。Furthermore, the embodiment of the present application also provides a computer program product, which, when running on a terminal device, enables the terminal device to execute any implementation manner of the character recognition method provided in the embodiment of the present application.
应当理解，在本申请中，“至少一个（项）”是指一个或者多个，“多个”是指两个或两个以上。“和/或”，用于描述关联对象的关联关系，表示可以存在三种关系，例如，“A和/或B”可以表示：只存在A，只存在B以及同时存在A和B三种情况，其中A，B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项（个）”或其类似表达，是指这些项中的任意组合，包括单项（个）或复数项（个）的任意组合。例如，a，b或c中的至少一项（个），可以表示：a，b，c，“a和b”，“a和c”，“b和c”，或“a和b和c”，其中a，b，c可以是单个，也可以是多个。It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can each be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single items or plural items. For example, at least one of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple.
以上所述，仅是本发明的较佳实施例而已，并非对本发明作任何形式上的限制。虽然本发明已以较佳实施例揭露如上，然而并非用以限定本发明。任何熟悉本领域的技术人员，在不脱离本发明技术方案范围情况下，都可利用上述揭示的方法和技术内容对本发明技术方案做出许多可能的变动和修饰，或修改为等同变化的等效实施例。因此，凡是未脱离本发明技术方案的内容，依据本发明的技术实质对以上实施例所做的任何简单修改、等同变化及修饰，均仍属于本发明技术方案保护的范围内。The above are only preferred embodiments of the present invention and do not limit the present invention in any form. Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the methods and technical content disclosed above to make many possible changes and modifications to the technical solution of the present invention, or modify it into equivalent embodiments with equivalent changes. Therefore, any simple modification, equivalent change, or modification made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still falls within the protection scope of the technical solution of the present invention.

Claims (10)

  1. 一种文字识别方法,其特征在于,所述方法包括:A character recognition method, characterized in that the method comprises:
    在获取到待识别文本图像之后，将所述待识别文本图像按照预设切片参数进行第一切分处理，得到至少一个图像切片和所述至少一个图像切片的位置信息；其中，所述待识别文本图像包括长文本；after a text image to be recognized is obtained, performing a first segmentation on the text image to be recognized according to preset slice parameters, to obtain at least one image slice and position information of the at least one image slice; wherein the text image to be recognized includes long text;
    根据所述至少一个图像切片的单字检测结果和所述至少一个图像切片的位置信息,确定所述待识别文本图像对应的实际切图位置;According to the word detection result of the at least one image slice and the position information of the at least one image slice, determine the actual image cutting position corresponding to the text image to be recognized;
    按照所述待识别文本图像对应的实际切图位置,对所述待识别文本图像进行第二切分处理,得到至少一个待使用图片;Performing a second segmentation process on the text image to be recognized according to the actual image cutting position corresponding to the text image to be recognized to obtain at least one picture to be used;
    根据所述至少一个待使用图片的文字识别结果,确定所述待识别文本图像的文字识别结果。According to the character recognition result of the at least one image to be used, the character recognition result of the text image to be recognized is determined.
  2. 根据权利要求1所述的方法，其特征在于，所述根据所述至少一个图像切片的单字检测结果和所述至少一个图像切片的位置信息，确定所述待识别文本图像对应的实际切图位置，包括：The method according to claim 1, wherein determining, according to the single-character detection result of the at least one image slice and the position information of the at least one image slice, the actual cut position corresponding to the text image to be recognized comprises:
    根据所述至少一个图像切片的单字检测结果、所述至少一个图像切片的位置信息、和所述待识别文本图像对应的预设切图位置,确定所述待识别文本图像对应的实际切图位置。According to the word detection result of the at least one image slice, the position information of the at least one image slice, and the preset cut position corresponding to the text image to be recognized, determine the actual cut position corresponding to the text image to be recognized .
  3. 根据权利要求2所述的方法,其特征在于,所述待识别文本图像对应的实际切图位置的确定过程,包括:The method according to claim 2, wherein the process of determining the actual cutting position corresponding to the text image to be recognized includes:
    将所述至少一个图像切片的单字检测结果按照所述至少一个图像切片的位置信息进行拼接处理,得到所述待识别文本图像的单字检测结果;performing splicing processing on the word detection result of the at least one image slice according to the position information of the at least one image slice, to obtain the word detection result of the text image to be recognized;
    根据所述待识别文本图像的单字检测结果和所述待识别文本图像对应的预设切图位置,确定所述待识别文本图像对应的实际切图位置。According to the single character detection result of the text image to be recognized and the preset cut position corresponding to the text image to be recognized, the actual cut position corresponding to the text image to be recognized is determined.
  4. 根据权利要求1所述的方法,其特征在于,所述预设切片参数包括切分间隔和切分偏移长度;其中,所述切分偏移长度小于所述切分间隔;The method according to claim 1, wherein the preset slice parameters include a segmentation interval and a segmentation offset length; wherein the segmentation offset length is smaller than the segmentation interval;
    所述至少一个图像切片的确定过程,包括:The process of determining the at least one image slice includes:
    从所述待识别文本图像中切除具有所述切分偏移长度的图像区域,得到待切分图像;cutting the image region with the segmentation offset length from the text image to be recognized to obtain the image to be segmented;
    将所述待切分图像按照所述切分间隔进行切分处理,得到至少一个图像切片。Segmenting the image to be segmented according to the segmentation interval to obtain at least one image slice.
  5. 根据权利要求4所述的方法,其特征在于,所述预设切片参数还包括切除起始位置;The method according to claim 4, wherein the preset slicing parameters further include a resection starting position;
    所述待切分图像的确定过程,包括:The determination process of the image to be segmented includes:
    根据所述切除起始位置和所述切分偏移长度,确定切除区域位置;Determine the position of the resection area according to the resection start position and the segmentation offset length;
    按照所述切除区域位置对所述待识别文本图像进行区域切除处理,得到所述待切分图像。Perform region cutting processing on the to-be-recognized text image according to the cut-off region position to obtain the to-be-segmented image.
  6. 根据权利要求1-5任一项所述的方法,其特征在于,所述至少一个图像切片的单字检测结果的确定过程,包括:The method according to any one of claims 1-5, wherein the determination process of the word detection result of the at least one image slice comprises:
    利用预先构建的单字检测模型对所述至少一个图像切片进行并行单字检测处理，得到所述至少一个图像切片的单字检测结果；其中，所述单字检测模型是根据样本文本图像和所述样本文本图像中各个字符的实际位置进行构建的。performing parallel single-character detection on the at least one image slice with a pre-built single-character detection model, to obtain a single-character detection result of the at least one image slice; wherein the single-character detection model is built from a sample text image and the actual position of each character in the sample text image.
  7. 一种文字识别装置,其特征在于,包括:A character recognition device, characterized in that it comprises:
    第一切分单元，用于在获取到待识别文本图像之后，将所述待识别文本图像按照预设切片参数进行第一切分处理，得到至少一个图像切片和所述至少一个图像切片的位置信息；其中，所述待识别文本图像包括长文本；a first segmentation unit, configured to, after a text image to be recognized is obtained, perform a first segmentation on the text image to be recognized according to preset slice parameters, to obtain at least one image slice and position information of the at least one image slice; wherein the text image to be recognized includes long text;
    位置确定单元,用于根据所述至少一个图像切片的单字检测结果和所述至少一个图像切片的位置信息,确定所述待识别文本图像对应的实际切图位置;A position determination unit, configured to determine the actual cut-out position corresponding to the text image to be recognized according to the word detection result of the at least one image slice and the position information of the at least one image slice;
    第二切分单元,用于按照所述待识别文本图像对应的实际切图位置,对所述待识别文本图像进行第二切分处理,得到至少一个待使用图片;The second segmentation unit is configured to perform a second segmentation process on the text image to be recognized according to the actual image cutting position corresponding to the text image to be recognized to obtain at least one image to be used;
    结果确定单元,用于根据所述至少一个待使用图片的文字识别结果,确定所述待识别文本图像的文字识别结果。The result determination unit is configured to determine the character recognition result of the text image to be recognized according to the character recognition result of the at least one picture to be used.
  8. 一种设备,其特征在于,所述设备包括处理器以及存储器:A device, characterized in that the device includes a processor and a memory:
    所述存储器用于存储计算机程序;The memory is used to store computer programs;
    所述处理器用于根据所述计算机程序执行权利要求1-6中任一项所述的方法。The processor is configured to execute the method according to any one of claims 1-6 according to the computer program.
  9. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质用于存储计算机程序,所述计算机程序用于执行权利要求1-6中任一项所述的方法。A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store a computer program, and the computer program is used to execute the method according to any one of claims 1-6.
  10. 一种计算机程序产品,其特征在于,所述计算机程序产品在终端设备上运行时,使得所述终端设备执行权利要求1-6中任一项所述的方法。A computer program product, characterized in that, when the computer program product is run on a terminal device, the terminal device is made to execute the method according to any one of claims 1-6.
PCT/CN2022/107728 2021-08-26 2022-07-26 Character recognition method and related device thereof WO2023024793A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110988932.1 2021-08-26
CN202110988932.1A CN113657369A (en) 2021-08-26 2021-08-26 Character recognition method and related equipment thereof

Publications (1)

Publication Number Publication Date
WO2023024793A1 true WO2023024793A1 (en) 2023-03-02

Family

ID=78492998

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/107728 WO2023024793A1 (en) 2021-08-26 2022-07-26 Character recognition method and related device thereof

Country Status (2)

Country Link
CN (1) CN113657369A (en)
WO (1) WO2023024793A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657369A (en) * 2021-08-26 2021-11-16 北京有竹居网络技术有限公司 Character recognition method and related equipment thereof

Citations (4)

Publication number Priority date Publication date Assignee Title
CN110738602A (en) * 2019-09-12 2020-01-31 北京三快在线科技有限公司 Image processing method and device, electronic equipment and readable storage medium
WO2020155763A1 (en) * 2019-01-28 2020-08-06 平安科技(深圳)有限公司 Ocr recognition method and electronic device thereof
CN111582085A (en) * 2020-04-26 2020-08-25 中国工商银行股份有限公司 Document shooting image identification method and device
CN113657369A (en) * 2021-08-26 2021-11-16 北京有竹居网络技术有限公司 Character recognition method and related equipment thereof

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN104298982B (en) * 2013-07-16 2019-03-08 深圳市腾讯计算机系统有限公司 A kind of character recognition method and device
CN105046254A (en) * 2015-07-17 2015-11-11 腾讯科技(深圳)有限公司 Character recognition method and apparatus
CN105678293A (en) * 2015-12-30 2016-06-15 成都数联铭品科技有限公司 Complex image and text sequence identification method based on CNN-RNN
CN106056114B (en) * 2016-05-24 2019-07-05 腾讯科技(深圳)有限公司 Contents of visiting cards recognition methods and device
CN110991437B (en) * 2019-11-28 2023-11-14 嘉楠明芯(北京)科技有限公司 Character recognition method and device, training method and device for character recognition model
CN113139629A (en) * 2020-01-16 2021-07-20 武汉金山办公软件有限公司 Font identification method and device, electronic equipment and storage medium

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
WO2020155763A1 (en) * 2019-01-28 2020-08-06 平安科技(深圳)有限公司 OCR recognition method and electronic device thereof
CN110738602A (en) * 2019-09-12 2020-01-31 北京三快在线科技有限公司 Image processing method and device, electronic equipment and readable storage medium
CN111582085A (en) * 2020-04-26 2020-08-25 中国工商银行股份有限公司 Document shooting image identification method and device
CN113657369A (en) * 2021-08-26 2021-11-16 北京有竹居网络技术有限公司 Character recognition method and related equipment thereof

Also Published As

Publication number Publication date
CN113657369A (en) 2021-11-16

Similar Documents

Publication Publication Date Title
US20210201445A1 (en) Image cropping method
US11275961B2 (en) Character image processing method and apparatus, device, and storage medium
CN109146892B (en) Image clipping method and device based on aesthetics
WO2021017260A1 (en) Multi-language text recognition method and apparatus, computer device, and storage medium
US20190188222A1 (en) Thumbnail-Based Image Sharing Method and Terminal
CN110136198B (en) Image processing method, apparatus, device and storage medium thereof
CN111027563A (en) Text detection method, device and recognition system
US20120076423A1 (en) Near-duplicate image detection
RU2697649C1 (en) Methods and systems of document segmentation
WO2019128254A1 (en) Image analysis method and apparatus, and electronic device and readable storage medium
CN110942004A (en) Handwriting recognition method and device based on neural network model and electronic equipment
CN112101317B (en) Page direction identification method, device, equipment and computer readable storage medium
JP2021166070A (en) Document comparison method, device, electronic apparatus, computer readable storage medium and computer program
WO2023024793A1 (en) Character recognition method and related device thereof
CN113903036B (en) Text recognition method and device, electronic equipment, medium and product
EP3910590A2 (en) Method and apparatus of processing image, electronic device, and storage medium
US20230237633A1 (en) Image processing method and apparatus, system, and storage medium
WO2023147717A1 (en) Character detection method and apparatus, electronic device and storage medium
CN111612004A (en) Image clipping method and device based on semantic content
CN113642584A (en) Character recognition method, device, equipment, storage medium and intelligent dictionary pen
CN114187595A (en) Document layout recognition method and system based on fusion of visual features and semantic features
CN112581355A (en) Image processing method, image processing device, electronic equipment and computer readable medium
WO2020232866A1 (en) Scanned text segmentation method and apparatus, computer device and storage medium
CN113902899A (en) Training method, target detection method, device, electronic device and storage medium
WO2013097072A1 (en) Method and apparatus for recognizing a character of a video

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22860139

Country of ref document: EP

Kind code of ref document: A1