WO2023024793A1 - Character recognition method and related device - Google Patents

Character recognition method and related device

Info

Publication number
WO2023024793A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
recognized
text
text image
slice
Application number
PCT/CN2022/107728
Other languages
English (en)
Chinese (zh)
Inventor
蔡悦
张宇轩
黄灿
王长虎
Original Assignee
北京有竹居网络技术有限公司
Application filed by 北京有竹居网络技术有限公司
Publication of WO2023024793A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques

Definitions

  • The present application relates to the technical field of data processing, and in particular to a character recognition method and related devices.
  • The range of applications of character recognition technology keeps growing wider.
  • Character recognition technology is used to recognize the characters that appear in an image.
  • Long text recognition refers to a process of character recognition performed on an image that includes long text.
  • The present application provides a character recognition method and related devices that can improve the recognition accuracy of long text recognition.
  • An embodiment of the present application provides a character recognition method, the method comprising:
  • after a text image to be recognized is acquired, performing a first segmentation process on the text image to be recognized according to preset slice parameters to obtain at least one image slice and position information of the at least one image slice, wherein the text image to be recognized includes long text;
  • determining, according to the single-character detection result of the at least one image slice and the position information of the at least one image slice, the actual image-cutting position corresponding to the text image to be recognized;
  • performing a second segmentation process on the text image to be recognized according to the actual image-cutting position to obtain at least one image to be used; and
  • determining the character recognition result of the text image to be recognized according to the character recognition result of the at least one image to be used.
  • In some implementations, determining the actual image-cutting position corresponding to the text image to be recognized according to the single-character detection result of the at least one image slice and the position information of the at least one image slice includes:
  • determining the actual image-cutting position corresponding to the text image to be recognized according to the single-character detection result of the at least one image slice, the position information of the at least one image slice, and the preset image-cutting position corresponding to the text image to be recognized.
  • In some implementations, the process of determining the actual image-cutting position corresponding to the text image to be recognized includes:
  • splicing the single-character detection results of the at least one image slice according to the position information of the at least one image slice to obtain the single-character detection result of the text image to be recognized;
  • determining the actual image-cutting position corresponding to the text image to be recognized according to the single-character detection result of the text image to be recognized and the preset image-cutting position corresponding to the text image to be recognized.
  • In some implementations, the preset slice parameters include a segmentation interval and a segmentation offset length, wherein the segmentation offset length is smaller than the segmentation interval;
  • the process of determining the at least one image slice includes:
  • cutting an image region having the segmentation offset length from the text image to be recognized to obtain an image to be segmented; and
  • segmenting the image to be segmented according to the segmentation interval to obtain the at least one image slice.
  • In some implementations, the preset slice parameters further include a cutting start position;
  • the process of determining the image to be segmented includes: determining a cutting region position according to the cutting start position and the segmentation offset length; and performing region cutting on the text image to be recognized according to the cutting region position to obtain the image to be segmented.
  • In some implementations, the process of determining the single-character detection result of the at least one image slice includes: using a pre-built single-character detection model to perform single-character detection on the at least one image slice in parallel to obtain the single-character detection result of the at least one image slice; wherein the single-character detection model is constructed from sample text images and the actual positions of the characters in the sample text images.
  • An embodiment of the present application also provides a character recognition device, including:
  • a first segmentation unit configured to, after a text image to be recognized is acquired, perform a first segmentation process on the text image to be recognized according to preset slice parameters to obtain at least one image slice and position information of the at least one image slice, wherein the text image to be recognized includes long text;
  • a position determination unit configured to determine the actual image-cutting position corresponding to the text image to be recognized according to the single-character detection result of the at least one image slice and the position information of the at least one image slice;
  • a second segmentation unit configured to perform a second segmentation process on the text image to be recognized according to the actual image-cutting position corresponding to the text image to be recognized to obtain at least one image to be used;
  • a result determination unit configured to determine the character recognition result of the text image to be recognized according to the character recognition result of the at least one image to be used.
  • the embodiment of the present application also provides a device, the device includes a processor and a memory:
  • the memory is used to store computer programs
  • the processor is configured to execute any implementation of the character recognition method provided in the embodiments of the present application according to the computer program.
  • the embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium is used to store a computer program, and the computer program is used to execute any implementation manner of the character recognition method provided in the embodiment of the present application.
  • the embodiment of the present application also provides a computer program product, which, when running on a terminal device, enables the terminal device to execute any implementation manner of the character recognition method provided in the embodiment of the present application.
  • FIG. 1 is a flowchart of a character recognition method provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of a text image to be recognized provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of another text image to be recognized provided in the embodiment of the present application.
  • FIG. 4 is a schematic diagram of an image slice processing process provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a comparison of two character recognition processes provided by the embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a single word detection model provided in the embodiment of the present application.
  • FIG. 7 is a schematic diagram of a character recognition process provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a character recognition device provided by an embodiment of the present application.
  • However, substantially reducing the size of a long text image usually greatly lowers its definition; the reduced image is prone to blurred content, so the character recognition result determined from the reduced image is inaccurate, and the recognition accuracy of long text recognition is therefore low.
  • In view of this, an embodiment of the present application provides a character recognition method. The method includes: after a text image to be recognized that includes long text is obtained, first performing a first segmentation process on the text image to be recognized according to preset slice parameters to obtain at least one image slice and the position information of the at least one image slice; next, determining the actual image-cutting position corresponding to the text image to be recognized according to the single-character detection result and the position information of the at least one image slice; then performing a second segmentation process on the text image to be recognized according to the actual image-cutting position to obtain at least one image to be used; and finally determining the character recognition result of the text image to be recognized according to the character recognition results of the at least one image to be used, so that character recognition for long text is realized.
  • Because the actual image-cutting position is determined from the single-character detection result, it falls inside a character as rarely as possible; cutting the image at the actual image-cutting position therefore rarely cuts through a character, which largely avoids incomplete characters in the cut images (that is, the images to be used) corresponding to the text image to be recognized and helps improve the recognition accuracy of long text recognition.
  • the processing time for each image slice is much shorter than the processing time for the text image to be recognized, which is conducive to improving the efficiency of text recognition.
  • The embodiment of the present application does not limit the execution subject of the character recognition method.
  • the character recognition method provided in the embodiment of the present application can be applied to data processing devices such as terminal devices or servers.
  • the terminal device may be a smart phone, a computer, a personal digital assistant (Personal Digital Assistant, PDA), or a tablet computer.
  • the server can be an independent server, a cluster server or a cloud server.
  • Refer to FIG. 1, which is a flowchart of a character recognition method provided by an embodiment of the present application.
  • the character recognition method provided in the embodiment of this application includes S1-S5:
  • S1 Acquire a text image to be recognized.
  • The text image to be recognized refers to an image that requires character recognition processing (especially long text recognition processing); the text image to be recognized includes long text (especially extremely long text).
  • Long text refers to text whose number of characters exceeds a first threshold; the first threshold can be preset.
  • Extremely long text refers to text whose number of characters exceeds a second threshold; the second threshold can be preset and is greater than the above-mentioned first threshold.
  • The text image to be recognized may be the image to be processed as shown in FIG. 2, or the text image corresponding to the image to be processed as shown in FIG. 3.
  • the "text image corresponding to the image to be processed” refers to an image cut from the image to be processed according to the text detection result of the image to be processed.
  • Example 1 may specifically include: after the image to be processed is acquired, the image to be processed may be directly determined as the text image to be recognized.
  • In order to prevent, as far as possible, image information other than text in the image to be processed from adversely affecting long text recognition, S1 may specifically include S11-S12:
  • S11 Perform text detection on the image to be processed to obtain the text detection result of the image to be processed.
  • The image to be processed refers to an image that requires image processing (such as text detection and/or character recognition); the embodiment of the present application does not limit the image to be processed, and for example it may be a video frame.
  • The text detection result of the image to be processed is used to describe the position, in the image to be processed, of the text that the image contains (for example, the text "this is an image including long text").
  • the embodiment of the present application does not limit the implementation of "text detection” in S11, and any existing or future method capable of text detection for images can be used for implementation.
  • S12 Cut out the image area corresponding to the text detection result from the image to be processed to obtain the text image to be recognized (as shown in FIG. 3), so that the text image to be recognized can more accurately represent the character information carried by the image to be processed.
  • Based on the above S11-S12, the text image to be recognized can be determined from the image to be processed, so that the text image to be recognized represents the character information carried by the image to be processed, and that character information can subsequently be determined accurately from the text image to be recognized.
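  • As a simple illustration of S11-S12, the crop might look as follows, assuming the image is a NumPy array and the text detector returns one axis-aligned box (x, y, w, h); the detector interface, the box format, and the function names are illustrative assumptions, not details specified by the present application.

```python
import numpy as np

def crop_text_image(image_to_process: np.ndarray, box: tuple) -> np.ndarray:
    """S12 sketch: cut the image area given by the text detection result
    out of the image to be processed, yielding the text image to be recognized."""
    x, y, w, h = box  # hypothetical (x, y, width, height) detection box
    return image_to_process[y:y + h, x:x + w]
```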
  • S2 Perform a first segmentation process on the text image to be recognized according to preset slice parameters to obtain at least one image slice and position information of the at least one image slice.
  • the "preset slice parameter” refers to the parameter that needs to be referred to when performing the first segmentation process on the text image to be recognized; and the embodiment of the present application does not limit the “preset slice parameter", for example, it may include the segmentation interval.
  • The segmentation interval is used to indicate the distance between two adjacent segmentation positions when the first segmentation process is performed on the text image to be recognized; the embodiment of the present application does not limit the segmentation interval (for example, it may be 512 pixels, as shown in FIG. 4).
  • the "first slicing process” is used to indicate the slicing process implemented according to the above-mentioned “preset slicing parameters”.
  • At least one image slice refers to the image segments obtained after the first segmentation process of the text image to be recognized; the "position information of at least one image slice" is used to describe the location of each image slice in the text image to be recognized.
  • the embodiment of the present application does not limit the process of determining "at least one image slice".
  • Two possible implementations are described below.
  • the determination process of "at least one image slice” may specifically include: first segmenting the text image to be recognized according to the segmentation interval Processing to obtain at least one image slice, so that the length of each image slice is the above-mentioned "segment interval” (eg, 512 pixels as shown in FIG. 4 ).
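  • A minimal sketch of this fixed-interval first segmentation process follows, assuming a NumPy-style image array of shape [H, W, C]; the array layout and the function name are illustrative assumptions.

```python
import numpy as np

def slice_by_interval(text_image: np.ndarray, interval: int = 512):
    """First segmentation sketch: cut the text image to be recognized into
    slices of `interval` pixels in width; each slice's x-offset in the
    original image is kept as its position information."""
    width = text_image.shape[1]
    return [(x, text_image[:, x:x + interval]) for x in range(0, width, interval)]
```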
  • the embodiment of the present application also provides another possible implementation of determining "at least one image slice".
  • the determination process of "at least one image slice” may specifically include S21-S22:
  • the "segmentation offset length" is used to indicate the segmentation offset to be used when performing the first segmentation process on the text image to be recognized; and the “segmentation offset length” can be smaller than the above-mentioned “segmentation interval ".
  • the embodiment of the present application does not limit the "segmentation offset length". For example, as shown in FIG. 4, when the above-mentioned "segmentation interval" is 512 pixels, the “segmentation offset length" can be 256 pixels .
  • The embodiment of the present application does not limit the position of the above-mentioned "image region with the segmentation offset length"; for example, it may be located in the leftmost region of the text image to be recognized (as shown in FIG. 4), in the rightmost region, or in a preset interior region of the text image to be recognized.
  • S21 may specifically include S211-S212:
  • S211 Determine the cutting region position according to the cutting start position and the segmentation offset length.
  • the "cutting start position” is used to indicate the position of a boundary position (such as the left end boundary position) of the above-mentioned "image area with segmentation offset length" in the above-mentioned "text image to be recognized”; and the present application
  • the embodiment does not limit the “cutting start position", for example, as shown in FIG. 4 , it may be the left boundary position of the text image to be recognized.
  • Excision area position is used to indicate the position of the above-mentioned “image area with segmentation offset length” in the "text image to be recognized”; and the length of the "excision area position” is the above-mentioned “segmentation offset length” , the boundary position of the "resection area position” includes the above-mentioned “resection start position”.
  • S212 Execute region cutting processing on the text image to be recognized according to the position of the cutting region to obtain the image to be segmented.
  • In this case, the image region occupying the cutting region position (that is, the above-mentioned "image region with the segmentation offset length") can be cut off from the text image to be recognized, and the remaining region of the text image to be recognized determined as the image to be segmented; the image to be segmented thus represents the rest of the text image to be recognized and does not include the above-mentioned "image region with the segmentation offset length".
  • Based on S211-S212, the image region with the segmentation offset length can be cut off from the text image to be recognized to obtain the image to be segmented, so that the image to be segmented does not include that region and can be segmented in the following step.
  • S22 Perform segmentation processing on the image to be segmented according to the segmentation interval to obtain at least one image slice.
  • the image to be segmented may be segmented according to the segmentation interval to obtain at least one image slice (a plurality of image slices as shown in FIG. 4 ).
  • It should be noted that, because the leading region has been removed, the segmentation positions used when performing segmentation processing on the "image to be segmented" are offset by a fixed amount relative to the "text image to be recognized", so that they almost never coincide with the "preset image-cutting positions" described below; this effectively avoids the adverse effects caused by characters being broken during the first segmentation process.
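  • The offset variant (S21-S22) can be sketched the same way: the leftmost region with the segmentation offset length is cut off first (as in FIG. 4), so every subsequent segmentation position is shifted relative to the original image. The array layout and names remain illustrative assumptions.

```python
import numpy as np

def slice_with_offset(text_image: np.ndarray,
                      interval: int = 512, offset: int = 256):
    """S21 sketch: cut off the image region with the segmentation offset
    length (here the leftmost `offset` pixels) to obtain the image to be
    segmented; S22: segment the remainder at the segmentation interval."""
    to_segment = text_image[:, offset:]  # image to be segmented
    # position information is recorded in the original image's coordinates
    return [(offset + x, to_segment[:, x:x + interval])
            for x in range(0, to_segment.shape[1], interval)]
```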
  • Based on the above S2, after the text image to be recognized is acquired, the first segmentation process can be performed on it according to the preset slice parameters to obtain at least one image slice and the position information of the at least one image slice, so that the character recognition result of the text image to be recognized can subsequently be determined on the basis of the at least one image slice.
  • In addition, the time consumed in processing each image slice is far less than the time consumed in processing the entire text image to be recognized, which helps improve the efficiency of character recognition.
  • the "single character detection result of at least one image slice" is used to indicate the position of each character in each image slice.
  • The embodiment of the present application does not limit the determination process of the "single-character detection result of at least one image slice". For example, single-character detection processing may be performed on each image slice to obtain the single-character detection result of each image slice. It should be noted that the embodiment of the present application does not limit the implementation of "single-character detection processing": any existing or future single-character detection method may be used, or, as another example, the "single-character detection model" shown below.
  • The embodiment of the present application also provides another possible implementation of determining the "single-character detection result of at least one image slice", which may specifically include: using a pre-built single-character detection model to perform single-character detection on the at least one image slice in parallel to obtain the single-character detection result of the at least one image slice.
  • the single-character detection model is used to perform character position detection (for example, to perform character boundary position detection) on the input data of the single-character detection model.
  • As shown in FIG. 6, the single-character detection model 600 may include a feature extraction layer 601 and a character position determination layer 602, where the input data of the character position determination layer 602 includes the output data of the feature extraction layer 601.
  • The process of determining the single-character detection result of a target image is taken as an example in the following description.
  • the "target image” is used to represent any image slice in the above "at least one image slice”.
  • In a possible implementation, the process of using the single-character detection model 600 to determine the above-mentioned single-character detection result may specifically include steps 11-12:
  • Step 11 Input the target image into the feature extraction layer 601 to obtain the image position feature output by the feature extraction layer 601 .
  • The feature extraction layer 601 is used to perform feature extraction on the input data of the feature extraction layer 601; the embodiment of the present application does not limit the feature extraction layer 601, and for example it may be implemented with any convolutional neural network (Convolutional Neural Networks, CNN), such as a VGG (Visual Geometry Group) network.
  • the image position feature is used to represent the information carried by each position in the target image (especially, the information carried by each position in the width direction).
  • the embodiment of the present application does not limit the image position feature.
  • For example, if the target image is a [C, H, W] matrix, the image position feature may be a [1, 1, W/4] matrix.
  • Step 12 Input the image position feature into the character position determination layer 602, and obtain the single-character detection result of the target image output by the character position determination layer 602.
  • the character position determining layer 602 is used for performing character boundary position recognition processing on the input data of the character position determining layer 602 .
  • the embodiment of the present application does not limit the word position determination layer 602.
  • In a possible implementation, the character position determination layer 602 may include a position classification layer and a position mapping layer, where the input data of the position mapping layer includes the output data of the position classification layer.
  • the determination process of the above-mentioned "single character detection result" may include steps 21-22:
  • Step 21 Input the image position feature into the position classification layer, and obtain the position classification result output by the position classification layer.
  • the position classification layer is used to judge whether the input data of the position classification layer belongs to the character boundary position.
  • the embodiment of the present application does not limit the implementation manner of the position classification layer, and any existing or future classifier (eg, softmax, etc.) may be used for implementation.
  • the position classification result is used to indicate whether each position in the target image belongs to a character boundary (especially, whether each position in the width direction of the target image belongs to a character boundary).
  • Step 22 Input the position classification result into the position mapping layer, and obtain the single-character detection result of the target image output by the position mapping layer.
  • The position mapping layer is used to perform mapping processing on the input data of the position mapping layer.
  • the embodiment of the present application does not limit the working principle of the location mapping layer.
  • In a possible implementation, the position mapping layer may map each position in the position classification result according to formula (1):
  • y = a × x + b  (1)
  • where y represents the mapped position coordinate corresponding to x; "a" represents the ratio between the width of the target image and the width of the image position feature (for example, 4); x represents a position coordinate in the position classification result (in particular, a position coordinate of the position classification result in the width direction); and "b" represents the convolution offset used in the feature extraction layer 601.
  • Because the width of the image position feature is smaller than the width of the target image (for example, 1/4 of it), the width of the position classification result determined from the image position feature is also smaller than the width of the target image (for example, also 1/4 of it). Therefore, in order to describe more accurately whether each position of the target image in the width direction belongs to a character boundary, each position coordinate of the position classification result in the width direction can be mapped to a position coordinate of the target image in the width direction according to formula (1).
  • Based on the above step 11 to step 12, for the single-character detection model 600 shown in FIG. 6, feature extraction and single-character position determination are performed on the target image, and the single-character detection result of the target image is obtained and output, so that this result can accurately represent the boundary position of each character in the target image.
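  • To make the data flow concrete, here is a minimal sketch of such a model in PyTorch. The layer sizes, the two stride-2 convolutions giving a total width downsampling of 4, and the per-position binary classifier are illustrative assumptions; the application only specifies a CNN feature extractor (layer 601, e.g., VGG-style) followed by a character position determination layer (602) made of a position classification layer and a position mapping layer.

```python
import torch
import torch.nn as nn

class SingleCharDetector(nn.Module):
    """Sketch of single-character detection model 600: a feature extraction
    layer (601) followed by a character position determination layer (602)."""

    def __init__(self, in_channels: int = 3):
        super().__init__()
        # Feature extraction layer 601: two stride-2 convs downsample the
        # width by 4, matching the [1, 1, W/4] image position feature
        # described above (an assumption for illustration).
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse height: [N, 64, 1, W/4]
        )
        # Position classification layer: per-position boundary logit.
        self.position_classifier = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: [N, C, H, W] -> boundary logits: [N, W/4]
        feats = self.feature_extractor(image)
        logits = self.position_classifier(feats)  # [N, 1, 1, W/4]
        return logits.squeeze(2).squeeze(1)

def map_positions(boundary_mask: torch.Tensor, a: int = 4, b: int = 0) -> list:
    """Position mapping layer: map feature-space indices x back to
    image-space coordinates y with formula (1): y = a * x + b."""
    xs = torch.nonzero(boundary_mask, as_tuple=False).flatten()
    return [int(a * x + b) for x in xs]
```

  • In this sketch, a boundary mask for one image could be obtained as `torch.sigmoid(model(img))[0] > 0.5` and then passed to `map_positions`; the 0.5 threshold is likewise an assumption.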
  • the single character detection model can be constructed in advance according to the sample text image and the actual position of each character in the sample text image.
  • The sample text image refers to an image used to construct the single-character detection model; the embodiment of the present application does not limit the number of sample text images.
  • the embodiment of the present application does not limit the actual position of each character in the sample text image, for example, it may be the actual boundary position of each character in the sample text image.
  • The embodiment of the present application does not limit the construction process of the single-character detection model; in a possible implementation, it may include steps 31-34:
  • Step 31 Input the sample text image into the model to be trained, and obtain the predicted character position of the sample text image output by the model to be trained.
  • The model to be trained is used to perform character position detection (for example, character boundary position detection) on the input data of the model to be trained.
  • The model structure of the model to be trained is the same as that of the "single-character detection model" above; for details, reference may be made to the description of the single-character detection model's structure.
  • the predicted character position of the sample text image is used to describe the predicted position of at least one character in the sample text image.
  • Step 32 Judge whether the preset stop condition is met; if yes, execute step 34; if not, execute step 33.
  • The preset stop condition can be set in advance. For example, it may be that the loss value of the model to be trained is lower than a preset loss threshold, that the rate of change of the loss value is lower than a preset rate-of-change threshold (that is, the character position detection performance of the model to be trained has converged), or that the number of updates of the model to be trained reaches a preset count threshold.
  • the loss value of the model to be trained is used to characterize the character position detection performance of the model to be trained; moreover, the embodiment of the present application does not limit the method for determining the loss value of the model to be trained.
  • the preset loss threshold, the preset change rate threshold, and the preset number of times threshold can all be preset.
  • Step 33 Update the model to be trained according to the predicted character position of the sample text image and the actual position of each character in the sample text image, and return to step 31.
  • Specifically, the current round of the model to be trained can be updated based on the difference between the predicted character positions of the sample text image and the actual positions of the characters in the sample text image, so that the updated model has better character position detection performance; then return to step 31 and continue with its subsequent steps.
  • Step 34 Determine the single-character detection model according to the model to be trained.
  • After it is determined that the current round of the model to be trained meets the preset stop condition, the model can be considered to have good character position detection performance, so the single-character detection model can be determined from it; for example, the current round of the model to be trained can be directly used as the single-character detection model, or the model structure and model parameters of the single-character detection model can be set according to those of the current round of the model to be trained, so that the two are consistent.
  • Either way, the single-character detection model inherits this good character position detection performance, so the single-character detection results it subsequently produces for the at least one image slice can accurately represent the position of each character in each image slice.
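  • A compact version of this training loop (steps 31-34) might look as follows, assuming the sketch model above, per-position 0/1 boundary labels with a binary cross-entropy loss, and a fixed update budget; all of these are illustrative assumptions rather than details fixed by the application.

```python
import torch
import torch.nn as nn

def train_single_char_detector(model, loader, max_updates=10_000,
                               loss_threshold=0.01, lr=1e-3):
    """Steps 31-34 sketch: iterate until a preset stop condition is met.
    `loader` is assumed to yield (images, labels) pairs, where labels is a
    [N, W/4] tensor of 0/1 character-boundary flags."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()
    updates = 0
    while True:
        for images, labels in loader:
            logits = model(images)                # step 31: predict positions
            loss = criterion(logits, labels.float())
            # step 32: preset stop condition (loss threshold or update budget)
            if loss.item() < loss_threshold or updates >= max_updates:
                return model                      # step 34: model is ready
            optimizer.zero_grad()                 # step 33: update, then
            loss.backward()                       # return to step 31
            optimizer.step()
            updates += 1
```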
  • the embodiment of the present application does not limit the determination process of "the actual cutting position corresponding to the text image to be recognized" (That is, the embodiment of S3), for example, can first determine the word position information of the text image to be recognized according to the word detection result of at least one image slice and the position information of the at least one image slice; then according to the text image to be recognized
  • the single character position information of the text image to be recognized is determined to determine the actual cutting position corresponding to the text image to be recognized, so that the "actual cutting position corresponding to the text image to be recognized" will not appear inside the character as much as possible.
  • end users may set character recognition efficiency requirements; or, different application scenarios may correspond to different character recognition efficiency requirements.
  • In view of this, the embodiment of the present application also provides a possible implementation of S3, which may specifically include: determining the actual image-cutting position corresponding to the text image to be recognized according to the single-character detection result of the at least one image slice, the position information of the at least one image slice, and the preset image-cutting position corresponding to the text image to be recognized.
  • the "preset cutting position corresponding to the text image to be recognized” refers to the cutting position preset for the text image to be recognized; and the “preset cutting position corresponding to the text image to be recognized” is based on the above “text Identification of efficiency needs” identified.
  • this embodiment of the present application does not limit the preset cut position corresponding to the text image to be recognized, for example, it may include at least one hard cut position.
  • the "hard cutting position" is used to indicate a preset cutting position corresponding to the text image to be recognized.
  • the text image to be recognized shown in FIG. 7 is taken as an example for description below.
  • For example, the preset image-cutting position corresponding to the text image to be recognized may be {512, 1024, 1536, 2048}.
  • "512", “1024", “1536", and “2048” are hard-cut positions corresponding to the text image to be recognized.
  • the embodiment of the present application does not limit the process of determining the preset cutting position corresponding to the text image to be recognized.
  • it may specifically include steps 41-42:
  • Step 41 Obtain preset segmentation parameters.
  • the "preset segmentation parameter” is used to indicate the maximum width of a cut image (that is, the distance between two adjacent hard cut positions in the above-mentioned "preset cut image position”); and the preset cut image
  • the sub-parameters can be preset according to the application scenario (in particular, they can be set according to the character recognition efficiency requirement in the application scenario).
  • the preset segmentation parameter may be 512 pixel values.
  • Step 42 According to the preset segmentation parameters and the text image to be recognized, determine the preset cutting position corresponding to the text image to be recognized.
  • Specifically, the preset image-cutting positions corresponding to the text image to be recognized (e.g., {512, 1024, 1536, 2048} in FIG. 7) can be determined with reference to the preset segmentation parameter, so that the interval between adjacent preset image-cutting positions does not exceed the preset segmentation parameter.
  • Based on steps 41-42, the preset image-cutting position corresponding to the text image to be recognized can be determined according to the application scenario (in particular, according to the character recognition efficiency requirement of that scenario), so that the actual image-cutting position determined from the preset image-cutting position can segment the image while meeting the character recognition efficiency requirement of the scenario; the character recognition method provided by the present application can thus meet that requirement.
  • this embodiment of the present application does not limit the above-mentioned implementation of determining the actual cutting position corresponding to the text image to be recognized by referring to the above-mentioned "preset cutting position". For example, it may specifically include steps 51-52:
  • Step 51 Splice the single-character detection results of the at least one image slice according to the position information of the at least one image slice to obtain the single-character detection result of the text image to be recognized.
  • the "single word detection result of the text image to be recognized" is used to describe the position of at least one character in the text image to be recognized.
  • this embodiment of the present application does not limit the "single character detection result of the text image to be recognized", for example, the single character detection result may include at least one boundary position.
  • boundary position is used to indicate the edge position of a character.
  • For example, the single-character detection result of the text image to be recognized may be {43, 82, 293, 309, ...}.
  • "43" represents the left boundary of the character "this";
  • "82" represents the right boundary of the character "this";
  • "293" represents the left boundary of the character "is";
  • "309" represents the right boundary of the character "is".
  • Based on step 51, after the single-character detection results of the at least one image slice are obtained, they can be spliced according to the position information of the at least one image slice to obtain the single-character detection result of the text image to be recognized, so that the "single-character detection result of the text image to be recognized" describes the position of at least one character in the text image to be recognized.
  • Step 52 Determine the actual image-cutting position corresponding to the text image to be recognized according to the single-character detection result of the text image to be recognized and the preset image-cutting position corresponding to the text image to be recognized.
  • In step 52, the actual image-cutting position corresponding to the text image to be recognized can be determined with reference to both of the above. The determination process may specifically include: as shown in FIG. 7, using a preset algorithm to match the preset image-cutting position corresponding to the text image to be recognized against the single-character detection result of the text image to be recognized, to obtain the actual image-cutting position corresponding to the text image to be recognized.
  • the preset algorithm may be preset, for example, the preset algorithm may be a greedy algorithm or a Hungarian algorithm.
  • In order to facilitate understanding of step 52, it is described below in conjunction with an example.
  • Step 52 may specifically include Step 61-Step 63:
  • Step 61 Determine a first position set and a second position set according to the single-character detection result of the text image to be recognized and the preset image-cutting position corresponding to the text image to be recognized.
  • the number of positions in the first position set is not less than the number of positions in the second position set. That is, the first position set refers to a set with more image cutting positions, and the second position set refers to a set with fewer image cutting positions.
  • step 61 may specifically include step 611-step 612:
  • Step 611 If the number of boundary positions is not less than the number of hard-cut positions, determine the set of "at least one boundary position" as the first set of positions, and determine the set of "at least one hard-cut position” Set for the second position.
  • Step 612 If the number of boundary positions is lower than the number of hard-cut positions, then determine the set of "at least one hard-cut position" as the first set of positions, and determine the set of "at least one boundary position” as Second location set.
  • Based on steps 611-612, the first position set and the second position set are determined by comparing the number of cutting positions represented by the single-character detection result (that is, boundary positions) with the number of cutting positions represented by the preset image-cutting position (that is, hard-cut positions):
  • the first position set represents whichever of the two has more positions, and the second position set represents whichever has fewer positions.
  • For example, if the single-character detection result of the text image to be recognized is the position set {43, 82, 293, 309, ...} shown in FIG. 7, and the preset image-cutting position corresponding to the text image to be recognized is the position set {512, 1024, 1536, 2048}, then the first position set may be {43, 82, 293, 309, ...} and the second position set may be {512, 1024, 1536, 2048}.
  • Step 62 Match each position in the second position set with at least one position in the first position set, and obtain matching results corresponding to each position in the second position set.
  • In step 62, for the n-th position in the second position set, the position that successfully matches it can be searched for in the first position set (for example, the position in the first position set that is closest to the n-th position in the second position set), to obtain the matching result corresponding to the n-th position in the second position set; this matching result thus indicates the position in the first position set that successfully matches the n-th position.
  • the matching result corresponding to 512 may be that "512" and "335" match successfully, ... (and so on).
  • Step 63 According to the matching results corresponding to each position in the second position set, determine the actual cutting position corresponding to the text image to be recognized.
  • In step 63, the actual image-cutting position corresponding to the text image to be recognized can be determined by referring to the matching result corresponding to each position in the second position set (for example, the matching result corresponding to each position in the second position set is directly determined as the actual image-cutting position corresponding to the text image to be recognized).
  • Based on steps 61-63, the number of cutting positions indicated by the single-character detection result and the number indicated by the preset image-cutting positions are compared first; each image-cutting position in the smaller set is then matched against the larger set to obtain a matching result for each position in the smaller set; finally, the actual image-cutting position corresponding to the text image to be recognized is determined from these matching results.
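  • A minimal sketch of this matching step follows, using the greedy nearest-position strategy mentioned above (the Hungarian algorithm would be an alternative); treating detected boundaries and hard-cut positions as plain integer lists is an assumption for illustration.

```python
def actual_cut_positions(boundaries: list, hard_cuts: list) -> list:
    """Steps 61-63 sketch: match each position in the smaller set against
    the larger set and return the matched positions as the actual
    image-cutting positions. Assumes non-empty inputs."""
    first, second = ((boundaries, hard_cuts)
                     if len(boundaries) >= len(hard_cuts)
                     else (hard_cuts, boundaries))
    # Step 72: if the smaller set already consists of character boundaries,
    # its positions cannot fall inside a character, so use them directly.
    if second is boundaries:
        return sorted(second)
    # Step 73: otherwise greedily replace each hard-cut position with the
    # nearest detected character boundary.
    return sorted({min(first, key=lambda b: abs(b - cut)) for cut in second})

# Hypothetical usage (the boundary values are illustrative, not from FIG. 7):
cuts = actual_cut_positions(boundaries=[43, 82, 293, 309, 480, 530, 1010],
                            hard_cuts=[512, 1024, 1536, 2048])
```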
  • step 52 may specifically include step 71-step 74:
  • Step 71 Determine a first position set and a second position set according to the single-character detection result of the text image to be recognized and the preset image-cutting position corresponding to the text image to be recognized.
  • For the relevant content of step 71, please refer to step 61 above.
  • Step 72 If it is determined that the second set of positions includes at least one boundary position, then determine the second set of positions as the actual cut-out position corresponding to the text image to be recognized.
  • If the second position set includes at least one boundary position, it can be concluded that the second position set was determined from the single-character detection result of the text image to be recognized, so that no position in the second position set falls inside a character; the second position set can therefore be directly determined as the actual image-cutting position corresponding to the text image to be recognized, so that cutting the image at the actual image-cutting position does not damage any character.
  • Step 73 If it is determined that the second set of positions includes at least one hard-cut position, then match each position in the second set of positions with at least one position in the first set of positions, and obtain matching results corresponding to each position in the second set of positions .
  • Step 73 can be implemented using any implementation of step 62 above.
  • If the second position set includes at least one hard-cut position, it can be concluded that the second position set was determined from the preset image-cutting position corresponding to the text image to be recognized, so its positions may fall inside characters. Therefore, positions that successfully match each position in the second position set can be searched for in the first position set, and these found positions used to determine the actual image-cutting position corresponding to the text image to be recognized, so that the actual image-cutting position does not fall inside a character; cutting the image at the actual image-cutting position then does not damage characters, which effectively avoids incomplete characters and helps improve the recognition accuracy of long text recognition.
  • Step 74 According to the matching results corresponding to each position in the second position set, determine the actual image cutting position corresponding to the text image to be recognized.
  • For step 74, please refer to step 63 above.
  • Based on steps 71-74, after the single-character detection result of the text image to be recognized and the preset image-cutting position corresponding to it are obtained, the actual image-cutting position should be selected from the single-character detection result as far as possible, so that the actual image-cutting position meets the character recognition efficiency requirement of the application scenario as far as possible without cutting through characters.
  • In this implementation of step 52, the two kinds of positions above can thus be combined to determine the actual image-cutting position corresponding to the text image to be recognized, so that it meets the character recognition efficiency requirement of the application scenario as far as possible without cutting through characters.
  • In addition, because the preset image-cutting position corresponding to the text image to be recognized is determined according to the preset segmentation parameter of the application scenario, the actual image-cutting position determined from it also meets the character recognition efficiency requirement of that scenario; the character recognition process based on it can therefore meet that requirement, so that, on the premise of guaranteeing the recognition accuracy of long text recognition, the character recognition efficiency requirements of different application scenarios are met as far as possible.
  • Based on the above S3, after the single-character detection result of the at least one image slice is obtained, the actual image-cutting position corresponding to the text image to be recognized can be determined with reference to the single-character detection result and the position information of the at least one image slice.
  • S4 Perform a second segmentation process on the text image to be recognized according to the actual picture cutting position corresponding to the text image to be recognized, to obtain at least one picture to be used.
  • the "second segmentation processing” refers to the process of performing segmentation processing on the text image to be recognized according to the actual image cutting position corresponding to the text image to be recognized.
  • Specifically, the text image to be recognized can be segmented according to the actual image-cutting position to obtain the cut images corresponding to the text image to be recognized, and each cut image is determined as an image to be used.
  • S5 Determine the text recognition result of the text image to be recognized according to the text recognition result of at least one image to be used.
  • The character recognition result of an image to be used describes the character information carried by that image; the embodiment of the present application does not limit the determination process of this result, and it may be implemented with any existing or future character recognition method (for example, an OCR model).
  • character recognition processing can be performed on all the pictures to be used in parallel to obtain a character recognition result of each picture to be used.
  • the character recognition result of the text image to be recognized is used to describe the character information carried by the text image to be recognized.
  • In a possible implementation, S5 may specifically include: splicing the character recognition results of the at least one image to be used according to the arrangement order corresponding to the at least one image to be used, to obtain the character recognition result of the text image to be recognized.
  • The arrangement order corresponding to the at least one image to be used represents the positional adjacency of the at least one image to be used within the text image to be recognized; for example, the image to be used with sequence number 1 is adjacent to the image with sequence number 2, the image with sequence number 2 is adjacent to the image with sequence number 3, and so on, up to the image with sequence number T-1 being adjacent to the image with sequence number T, where T is a positive integer representing the number of images to be used.
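  • Putting S1-S5 together, a minimal end-to-end sketch might look as follows. The slice width, the offset, and the `detect_boundaries` / `ocr` callables are hypothetical stand-ins for the single-character detection model and the OCR model; none of these names come from the application itself.

```python
from typing import Callable

def recognize_long_text(image,                       # NumPy-style [H, W, C]
                        detect_boundaries: Callable, # slice -> boundary x's
                        ocr: Callable,               # image -> text
                        interval: int = 512,
                        offset: int = 256) -> str:
    width = image.shape[1]
    # S2: first segmentation into slices (offset variant).
    slices = [(x, image[:, x:x + interval])
              for x in range(offset, width, interval)]
    # Single-character detection per slice, spliced into image coordinates.
    boundaries = sorted(x0 + b for x0, s in slices for b in detect_boundaries(s))
    # S3: match each preset hard-cut position to the nearest boundary.
    hard_cuts = list(range(interval, width, interval))
    cuts = (sorted({min(boundaries, key=lambda b: abs(b - c))
                    for c in hard_cuts})
            if boundaries else hard_cuts)
    # S4: second segmentation at the actual image-cutting positions.
    edges = [0] + cuts + [width]
    pieces = [image[:, l:r] for l, r in zip(edges, edges[1:]) if r > l]
    # S5: recognize each piece and splice the texts in order.
    return "".join(ocr(p) for p in pieces)
```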
  • To sum up, in the character recognition method provided by the embodiment of the present application, the text image to be recognized is first subjected to the first segmentation process according to the preset slice parameters to obtain at least one image slice and the position information of the at least one image slice; the actual image-cutting position corresponding to the text image to be recognized is then determined according to the single-character detection result and the position information of the at least one image slice; next, the second segmentation process is performed on the text image to be recognized according to the actual image-cutting position to obtain at least one image to be used; finally, the character recognition result of the text image to be recognized is determined according to the character recognition results of the at least one image to be used, so that character recognition for long text is realized.
  • Because the actual image-cutting position is determined from the single-character detection result, it falls inside a character as rarely as possible; cutting the image at the actual image-cutting position therefore rarely cuts through a character, which largely avoids incomplete characters in the cut images (that is, the images to be used) corresponding to the text image to be recognized and helps improve the recognition accuracy of long text recognition.
  • the processing time for each image slice is much shorter than the processing time for the text image to be recognized, which is conducive to improving the efficiency of text recognition.
  • the embodiment of the present application also provides a character recognition device, which will be explained and described below with reference to the accompanying drawings.
  • FIG. 8 is a schematic structural diagram of a character recognition device provided by an embodiment of the present application.
  • the character recognition device 800 provided in the embodiment of the present application includes:
  • The first segmentation unit 801 is configured to, after a text image to be recognized is acquired, perform a first segmentation process on the text image to be recognized according to preset slice parameters to obtain at least one image slice and position information of the at least one image slice, wherein the text image to be recognized includes long text;
  • a position determination unit 802 configured to determine the actual image-cutting position corresponding to the text image to be recognized according to the single-character detection result of the at least one image slice and the position information of the at least one image slice;
  • the second segmentation unit 803 is configured to perform a second segmentation process on the text image to be recognized according to the actual image cutting position corresponding to the text image to be recognized to obtain at least one picture to be used;
  • the result determining unit 804 is configured to determine a character recognition result of the to-be-recognized text image according to the character recognition result of the at least one image to be used.
  • In some implementations, the position determination unit 802 is specifically configured to: determine the actual image-cutting position corresponding to the text image to be recognized according to the single-character detection result of the at least one image slice, the position information of the at least one image slice, and the preset image-cutting position corresponding to the text image to be recognized.
  • In some implementations, the position determination unit 802 is specifically configured to: splice the single-character detection results of the at least one image slice according to the position information of the at least one image slice to obtain the single-character detection result of the text image to be recognized; and determine the actual image-cutting position corresponding to the text image to be recognized according to the single-character detection result of the text image to be recognized and the preset image-cutting position corresponding to the text image to be recognized.
  • the preset slice parameters include a segmentation interval and a segmentation offset length; wherein, the segmentation offset length is smaller than the segmentation interval;
  • the first dividing unit 801 includes:
  • a region cutting subunit configured to cut an image region having the segmentation offset length from the text image to be recognized to obtain the image to be segmented
  • the image slice subunit is configured to perform segmentation processing on the image to be segmented according to the segmentation interval to obtain at least one image slice.
  • In some implementations, the preset slice parameters further include a cutting start position;
  • the region cutting subunit is specifically configured to: determine the position of the cutting region according to the cutting start position and the segmentation offset length; perform region cutting processing on the text image to be recognized according to the cutting region position, Obtain the image to be segmented.
  • In some implementations, the process of determining the single-character detection result of the at least one image slice includes: using a pre-built single-character detection model to perform single-character detection on the at least one image slice in parallel to obtain the single-character detection result of the at least one image slice; wherein the single-character detection model is constructed from sample text images and the actual positions of the characters in the sample text images.
  • To sum up, with the character recognition device provided by the embodiment of the present application, the first segmentation process is performed on the text image to be recognized according to the preset slice parameters to obtain at least one image slice and the position information of the at least one image slice; the actual image-cutting position corresponding to the text image to be recognized is then determined according to the single-character detection result and the position information of the at least one image slice; the second segmentation process is performed on the text image to be recognized according to the actual image-cutting position to obtain at least one image to be used; finally, the character recognition result of the text image to be recognized is determined according to the character recognition results of the at least one image to be used, so that character recognition for long text is realized.
  • Because the actual image-cutting position is determined from the single-character detection result, it falls inside a character as rarely as possible; cutting the image at the actual image-cutting position therefore rarely cuts through a character, which largely avoids incomplete characters in the cut images (that is, the images to be used) corresponding to the text image to be recognized and helps improve the recognition accuracy of long text recognition.
  • the processing time for each image slice is much shorter than the processing time for the text image to be recognized, which is conducive to improving the efficiency of text recognition.
  • the embodiment of the present application also provides a device, where the device includes a processor and a memory:
  • the memory is configured to store a computer program;
  • the processor is configured to execute any implementation of the character recognition method provided in the embodiments of the present application according to the computer program.
  • the embodiment of the present application also provides a computer-readable storage medium, where the computer-readable storage medium is configured to store a computer program, and the computer program is used to execute any implementation of the character recognition method provided in the embodiments of the present application.
  • the embodiment of the present application also provides a computer program product, which, when running on a terminal device, enables the terminal device to execute any implementation manner of the character recognition method provided in the embodiment of the present application.
  • "At least one (item)" means one or more, and "multiple" means two or more.
  • "And/or" describes an association relationship between associated objects and indicates that three relationships can exist; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
  • the character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
  • "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or plural items.
  • "At least one item (piece) of a, b, or c" can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple.
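For illustration only, the sketches below restate the steps described above in Python. None of this code appears in the disclosure: every helper name, parameter name, and coordinate convention (horizontal pixel spans over a single text line) is an assumption made here for concreteness. First, a minimal sketch of the first segmentation process, assuming the cut region sits at the left edge of the line:

```python
def first_segmentation(image_width, cut_start, offset_len, interval):
    """Cut out the region [cut_start, cut_start + offset_len) and split
    the remainder (the image to be segmented) into fixed-width slices of
    `interval` pixels. Each returned (left, right) span doubles as the
    slice's position information in the original text image."""
    slices = []
    x = cut_start + offset_len
    while x < image_width:
        slices.append((x, min(x + interval, image_width)))
        x += interval
    return slices
```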
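The embodiments do not fix a concrete parallelization mechanism for the single character detection step; one possibility, sketched here with a thread pool, dispatches all slices to the detector concurrently (detect_chars is a hypothetical stand-in for the pre-built single character detection model):

```python
from concurrent.futures import ThreadPoolExecutor

def detect_all(image_slices, detect_chars):
    """Run the (hypothetical) single character detector on every image
    slice concurrently; result order matches the input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(detect_chars, image_slices))
```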
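Next, a minimal sketch of the splicing and position-adjustment step carried out by the position determination unit, under the assumption that each per-slice detection result is a list of horizontal character boxes (x0, x1) in slice-local coordinates:

```python
def merge_detections(slice_results, slice_offsets):
    """Map per-slice character boxes back into the coordinate frame of
    the full text image and splice them into one sorted list."""
    merged = []
    for boxes, offset in zip(slice_results, slice_offsets):
        merged.extend((x0 + offset, x1 + offset) for x0, x1 in boxes)
    return sorted(merged)

def adjust_cut_positions(preset_cuts, char_boxes):
    """Nudge each preset cutting position out of any character box so
    that, as far as possible, no character is cut in half."""
    adjusted = []
    for cut in preset_cuts:
        hit = next(((x0, x1) for x0, x1 in char_boxes if x0 < cut < x1), None)
        if hit is None:
            adjusted.append(cut)  # already falls in a gap between characters
        else:
            x0, x1 = hit
            # snap to whichever edge of the offending character is closer
            adjusted.append(x0 if cut - x0 <= x1 - cut else x1)
    return adjusted
```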
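Finally, a sketch tying the steps together end to end; crop, detect_chars, and recognize_text are hypothetical helpers standing in for components the embodiments leave unspecified, and params is assumed to bundle the preset slice parameters:

```python
def recognize_long_text(image, params, detect_chars, recognize_text):
    """End-to-end sketch: first segmentation, parallel single character
    detection, cut-position adjustment, second segmentation, recognition."""
    spans = first_segmentation(image.width, params.cut_start,
                               params.offset_len, params.interval)
    results = detect_all([crop(image, s) for s in spans], detect_chars)
    boxes = merge_detections(results, [left for left, _ in spans])
    cuts = adjust_cut_positions(params.preset_cuts, boxes)
    # second segmentation: cut the original image at the adjusted positions
    edges = [0, *cuts, image.width]
    pieces = [crop(image, (edges[i], edges[i + 1]))
              for i in range(len(edges) - 1)]
    return "".join(recognize_text(p) for p in pieces)
```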

Abstract

The present application relates to a character recognition method and a device related thereto. The method comprises: after acquiring a text image to be recognized that includes long text, performing a first segmentation process on the text image to be recognized according to a preset slice parameter to obtain at least one image slice and position information of the at least one image slice; determining an actual cutting position corresponding to the text image to be recognized according to a single character detection result and the position information of the at least one image slice; then performing a second segmentation process on the text image to be recognized according to the actual cutting position corresponding to the text image to be recognized to obtain at least one image to be used; and finally determining a character recognition result of the text image to be recognized according to the character recognition result of the at least one image to be used, so that a character recognition process for long text can be implemented.
PCT/CN2022/107728 2021-08-26 2022-07-26 Procédé de reconnaissance de caractères et dispositif qui lui est associé WO2023024793A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110988932.1 2021-08-26
CN202110988932.1A CN113657369A (zh) 2021-08-26 2021-08-26 一种文字识别方法及其相关设备

Publications (1)

Publication Number Publication Date
WO2023024793A1 (fr)

Family

ID=78492998

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/107728 WO2023024793A1 (fr) 2021-08-26 2022-07-26 Procédé de reconnaissance de caractères et dispositif qui lui est associé

Country Status (2)

Country Link
CN (1) CN113657369A (fr)
WO (1) WO2023024793A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657369A (zh) * 2021-08-26 2021-11-16 北京有竹居网络技术有限公司 一种文字识别方法及其相关设备

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298982B (zh) * 2013-07-16 2019-03-08 深圳市腾讯计算机系统有限公司 一种文字识别方法及装置
CN105046254A (zh) * 2015-07-17 2015-11-11 腾讯科技(深圳)有限公司 字符识别方法及装置
CN105678293A (zh) * 2015-12-30 2016-06-15 成都数联铭品科技有限公司 一种基于cnn-rnn的复杂图像字序列识别方法
CN106056114B (zh) * 2016-05-24 2019-07-05 腾讯科技(深圳)有限公司 名片内容识别方法和装置
CN110991437B (zh) * 2019-11-28 2023-11-14 嘉楠明芯(北京)科技有限公司 字符识别方法及其装置、字符识别模型的训练方法及其装置
CN113139629A (zh) * 2020-01-16 2021-07-20 武汉金山办公软件有限公司 一种字体识别方法、装置、电子设备及存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020155763A1 (fr) * 2019-01-28 2020-08-06 平安科技(深圳)有限公司 Procédé de reconnaissance par roc et dispositif électronique associé
CN110738602A (zh) * 2019-09-12 2020-01-31 北京三快在线科技有限公司 图像处理方法、装置、电子设备及可读存储介质
CN111582085A (zh) * 2020-04-26 2020-08-25 中国工商银行股份有限公司 单据拍摄图像识别方法及装置
CN113657369A (zh) * 2021-08-26 2021-11-16 北京有竹居网络技术有限公司 一种文字识别方法及其相关设备

Also Published As

Publication number Publication date
CN113657369A (zh) 2021-11-16

Similar Documents

Publication Publication Date Title
US11275961B2 (en) Character image processing method and apparatus, device, and storage medium
CN109146892B (zh) 一种基于美学的图像裁剪方法及装置
WO2021017260A1 (fr) Procédé et appareil de reconnaissance de texte multilingue, dispositif informatique, et support d'informations
US20190188222A1 (en) Thumbnail-Based Image Sharing Method and Terminal
CN110136198B (zh) 图像处理方法及其装置、设备和存储介质
CN111027563A (zh) 一种文本检测方法、装置及识别系统
US20120076423A1 (en) Near-duplicate image detection
CN112101317B (zh) 页面方向识别方法、装置、设备及计算机可读存储介质
RU2697649C1 (ru) Способы и системы сегментации документа
WO2019128254A1 (fr) Appareil et procédé d'analyse d'image, et dispositif électronique et support d'informations lisible
CN110942004A (zh) 基于神经网络模型的手写识别方法、装置及电子设备
JP2021166070A (ja) 文書比較方法、装置、電子機器、コンピュータ読取可能な記憶媒体及びコンピュータプログラム
WO2023024793A1 (fr) Procédé de reconnaissance de caractères et dispositif qui lui est associé
CN113903036B (zh) 一种文本识别方法、装置、电子设备、介质及产品
EP3910590A2 (fr) Procede et appareil de traitement d'image, dispositif electronique, et support de stockage
US20230237633A1 (en) Image processing method and apparatus, system, and storage medium
WO2023147717A1 (fr) Procédé et appareil de détection de caractères, dispositif électronique et support de stockage
CN111612004A (zh) 一种基于语义内容的图像裁剪方法及装置
CN111368632A (zh) 一种签名识别方法及设备
CN113642584A (zh) 文字识别方法、装置、设备、存储介质和智能词典笔
CN114187595A (zh) 基于视觉特征和语义特征融合的文档布局识别方法及系统
CN112581355A (zh) 图像处理方法、装置、电子设备和计算机可读介质
WO2020232866A1 (fr) Procédé et appareil de segmentation de texte scanné, dispositif informatique et support de stockage
CN113902899A (zh) 训练方法、目标检测方法、装置、电子设备以及存储介质
WO2013097072A1 (fr) Procédé et appareil de reconnaissance d'un caractère d'une vidéo

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22860139

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE