WO2022127384A1 - Character recognition method, electronic device and computer-readable storage medium - Google Patents

Character recognition method, electronic device and computer-readable storage medium

Info

Publication number
WO2022127384A1
WO2022127384A1 (PCT/CN2021/126164)
Authority
WO
WIPO (PCT)
Prior art keywords
text
image
binary mask
straight
recognized
Prior art date
Application number
PCT/CN2021/126164
Other languages
English (en)
French (fr)
Inventor
吕燕
童俊文
王佳
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2022127384A1

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/60 - Type of objects
    • G06V 20/62 - Text, e.g. of license plates, overlay texts or captions on TV images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/14 - Image acquisition
    • G06V 30/148 - Segmentation of character regions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/16 - Image preprocessing
    • G06V 30/162 - Quantising the image signal
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 - Document-oriented image-based pattern recognition
    • G06V 30/41 - Analysis of document content
    • G06V 30/413 - Classification of content, e.g. text, photographs or tables

Definitions

  • Embodiments of the present application relate to the field of text detection and recognition, and in particular to a character recognition method, an electronic device, and a computer-readable storage medium.
  • An embodiment of the present application provides a character recognition method. The method includes: acquiring a binary mask of an image to be recognized, wherein the binary mask is used to distinguish text areas from non-text areas in the image to be recognized; performing connected component analysis on the binary mask to obtain connected component labels; obtaining a straightened-text image according to the connected component labels; and recognizing text according to the straightened-text image.
  • An embodiment of the present application further provides an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to execute the above character recognition method.
  • An embodiment of the present application further provides a readable storage medium storing a computer program which, when executed by a processor, implements the above character recognition method.
  • FIG. 1 is a flowchart of a character recognition method according to a first embodiment of the present application;
  • FIG. 2 is an image to be recognized provided according to the first embodiment of the present application;
  • FIG. 3 is a binary mask of an image to be recognized provided according to the first embodiment of the present application;
  • FIG. 4 is a straightened-text image provided according to the first embodiment of the present application;
  • FIG. 5 is a flowchart of obtaining a straightened-text image according to connected component labels in the first embodiment of the present application;
  • FIG. 6 is another image to be recognized provided according to the first embodiment of the present application;
  • FIG. 7 is a binary mask of another image to be recognized provided according to the first embodiment of the present application;
  • FIG. 8 is a schematic diagram of target text areas of the binary mask of another image to be recognized provided according to the first embodiment of the present application;
  • FIG. 9 is another straightened-text image provided according to the first embodiment of the present application;
  • FIG. 10 is a schematic diagram of a character recognition apparatus according to the first embodiment of the present application;
  • FIG. 11 is a flowchart of a character recognition method according to a second embodiment of the present application;
  • FIG. 12 is a flowchart of determining M interpolation coordinate points in a target text area in the second embodiment of the present application;
  • FIG. 13 is a schematic diagram of axis points and start and end edge points provided according to the second embodiment of the present application;
  • FIG. 14 is a flowchart of a character recognition method according to a third embodiment of the present application;
  • FIG. 15 is an image to be recognized provided according to the third embodiment of the present application;
  • FIG. 16 is a binary mask of an image to be recognized provided according to the third embodiment of the present application;
  • FIG. 17 is a horizontal target text area of the binary mask of an image to be recognized provided according to the third embodiment of the present application;
  • FIG. 18 is a straightened-text image provided according to the third embodiment of the present application;
  • FIG. 19 is a polygon fitting diagram provided according to the third embodiment of the present application;
  • FIG. 20 is a flowchart of a character recognition method according to a fourth embodiment of the present application;
  • FIG. 21 is a flowchart of a detection model training method provided according to the fourth embodiment of the present application;
  • FIG. 22 is a flowchart of a recognition model training method provided according to the fourth embodiment of the present application;
  • FIG. 23 is a flowchart of recognizing text on steel coils provided according to the fourth embodiment of the present application;
  • FIG. 24 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
  • The main purpose of the embodiments of the present application is to provide a character recognition method, an electronic device and a computer-readable storage medium that can effectively improve the recognition accuracy and speed for curved text while strengthening the noise resistance of the curved-text recognition process, thereby greatly improving the user experience.
  • The first embodiment of the present application relates to a character recognition method applied to an electronic device, where the electronic device may be a terminal or a server. In this embodiment and the following embodiments, the electronic device is described by taking a server as an example.
  • The implementation details of the character recognition method of this embodiment are described below. The following content is provided only for ease of understanding and is not necessary for implementing this solution.
  • Application scenarios of the embodiments of the present application may include, but are not limited to: automatic acquisition of product information when a factory purchases goods; inspection of printed product information, such as production date and place of origin, when a factory ships goods; entry of merchants' trademark information by the administration for industry and commerce; scanning paper documents into electronic documents using Optical Character Recognition (OCR) technology; recognition of ID cards, passports, driving licenses, bank cards and other documents; recognition and tracking of license plates by plate number, color, type and the like; and recognition of bills such as value-added-tax invoices.
  • The specific flow of the character recognition method of this embodiment may be as shown in FIG. 1, and includes:
  • Step 101: acquire a binary mask of the image to be recognized.
  • Specifically, the server may acquire a binary mask of the image to be recognized, where the binary mask is used to distinguish text areas from non-text areas in the image to be recognized.
  • In a specific implementation, a binary image is an image in which every pixel is either black or white; that is, the gray value of any pixel takes only one of the two values 0 and 255, where 0 denotes black and 255 denotes white. The binary mask is obtained by recomputing the value of each pixel through a mask operator, converting the image to be recognized into a binary image with only the two pixel values 0 and 255: a pixel value of 0 indicates that no text is detected at that position, and a pixel value of 255 indicates that text is detected there.
  • In one example, the image to be recognized is the trademark shown in FIG. 2. After acquiring the image, the server can detect which positions contain text and which do not, recompute the value of each pixel through the mask operator, and obtain the binary mask of the image shown in FIG. 3.
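  • The 0/255 mask convention can be illustrated with a minimal Python/OpenCV sketch; note that the patent derives the mask from a detection model (see the fourth embodiment), whereas a plain global threshold and an illustrative file name stand in for it here:

```python
import cv2

# Minimal sketch of the 0/255 convention: a global threshold stands in for
# the detection model that produces the mask in the patent (file name
# illustrative).
image = cv2.imread("trademark.png", cv2.IMREAD_GRAYSCALE)
_, mask = cv2.threshold(image, 127, 255, cv2.THRESH_BINARY)
# mask now holds only 0 (no text detected) and 255 (text detected here).
```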
  • Step 102: perform connected component analysis on the binary mask to obtain connected component labels.
  • Specifically, after acquiring the binary mask of the image to be recognized, the server may perform connected component analysis on it to obtain connected component labels.
  • In a specific implementation, a connected component is a set of adjacent pixels in an image that share the same pixel value, and connected component analysis finds and labels the connected components in the image, i.e. obtains the connected component labels. Since the binary mask has only the two pixel values 0 and 255, where 0 indicates no text and 255 indicates text, connected component analysis turns the image to be recognized into data, making the distinction between text areas and non-text areas clearer and more intuitive.
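  • A hedged sketch of the connected component analysis with OpenCV (the mask file name is illustrative):

```python
import cv2

# Hedged sketch of connected component analysis on a 0/255 mask.
mask = cv2.imread("binary_mask.png", cv2.IMREAD_GRAYSCALE)  # illustrative
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(
    mask, connectivity=8)
# labels[y, x] == 0 marks background; 1..num_labels-1 label text components.
for i in range(1, num_labels):
    x, y, w, h, area = stats[i]
    print(f"component {i}: bbox=({x},{y},{w},{h}), area={area}")
```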
  • Step 103: obtain a straightened-text image according to the connected component labels.
  • Specifically, after performing connected component analysis on the binary mask and obtaining the connected component labels, the server may obtain a straightened-text image according to those labels.
  • In one example, the server may interpolate the binary mask according to the connected component labels to obtain the straightened-text image. For example, after obtaining the labels, the server applies an interpolation method such as Lagrange interpolation or thin-plate-spline (TPS) interpolation to the binary mask shown in FIG. 3, thereby changing the structure of the image to be recognized shown in FIG. 2; on the basis of the binary mask, the text is pulled straight and the straightened-text image, which may be as shown in FIG. 4, is obtained.
  • In one example, obtaining the straightened-text image according to the connected component labels can be achieved by the sub-steps shown in FIG. 5, as follows:
  • Sub-step 1031: perform minimum bounding box fitting on the binary mask according to the connected component labels to obtain the target text area.
  • Specifically, after performing connected component analysis on the binary mask and obtaining the connected component labels, the server can perform minimum bounding box fitting on the binary mask to obtain the target text area.
  • In a specific implementation, the server can use minimum bounding box fitting to fit a minimal box around the text in the binary mask; the area inside the box is the area where the text is located, i.e. the target text area.
  • Considering that the image to be recognized may contain multiple regions with text, minimum bounding box fitting on the binary mask can extract multiple text regions, from which the target text region to be recognized can be selected.
  • In one example, the image to be recognized is as shown in FIG. 6 and contains two pieces of text; its binary mask may be as shown in FIG. 7. The server performs minimum bounding box fitting on the binary mask, extracts the target text areas, and obtains the two target text areas S1 and S2 shown in FIG. 8.
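  • A hedged sketch of the minimum bounding box fitting with OpenCV, fitting one rotated rectangle per labelled component (file name illustrative):

```python
import cv2
import numpy as np

# Hedged sketch of minimum bounding box fitting: one rotated rectangle per
# labelled connected component (file name illustrative).
mask = cv2.imread("binary_mask.png", cv2.IMREAD_GRAYSCALE)
num_labels, labels = cv2.connectedComponents(mask, connectivity=8)

boxes = []
for i in range(1, num_labels):                 # label 0 is the background
    ys, xs = np.where(labels == i)
    points = np.column_stack((xs, ys)).astype(np.float32)
    rect = cv2.minAreaRect(points)             # (center, (w, h), angle)
    boxes.append(cv2.boxPoints(rect))          # four corners, e.g. S1, S2
```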
  • Sub-step 1032: interpolate the target text area of the binary mask to obtain the straightened-text image.
  • Specifically, after obtaining the target text area, the server may interpolate the target text area of the binary mask to obtain the straightened-text image. Interpolating only the target text area makes the interpolation process simpler and more efficient.
  • In one example, the target text areas obtained by the server are as shown in FIG. 8. The server interpolates the target text areas according to the image to be recognized and obtains the straightened-text image, which may be as shown in FIG. 9.
  • In one example, Lagrange interpolation can be used to interpolate the target text area of the binary mask.
  • In another example, thin-plate-spline interpolation can be used instead; thin-plate-spline interpolation can effectively improve the robustness of the curved-text recognition process.
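  • A hedged sketch of thin-plate-spline straightening with OpenCV's shape module (assumed to be present in the installed opencv-python build); the point sets are illustrative stand-ins for the interpolation coordinate points derived in the second embodiment:

```python
import cv2
import numpy as np

# Hedged sketch of TPS straightening. src_pts lie on the curved text, dst_pts
# are their straightened positions on two horizontal lines (both illustrative).
mask = cv2.imread("binary_mask.png", cv2.IMREAD_GRAYSCALE)  # illustrative
src_pts = np.float32([[10, 40], [60, 20], [110, 40],
                      [10, 90], [60, 70], [110, 90]])
dst_pts = np.float32([[10, 30], [60, 30], [110, 30],
                      [10, 80], [60, 80], [110, 80]])
matches = [cv2.DMatch(i, i, 0) for i in range(len(src_pts))]

tps = cv2.createThinPlateSplineShapeTransformer()
# Point sets are reshaped to (1, N, 2); note that warpImage expects the
# destination set to be passed first when estimating the transformation.
tps.estimateTransformation(dst_pts.reshape(1, -1, 2),
                           src_pts.reshape(1, -1, 2), matches)
straight = tps.warpImage(mask)  # straightened-text image
```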
  • Step 104: recognize the text according to the straightened-text image.
  • Specifically, after obtaining the straightened-text image, the server may recognize the text according to it.
  • In a specific implementation, the server may input the straightened-text image into a text recognition model, perform text recognition, and output the recognized text. The text recognition model may be a Tesseract model, an AdvancedEAST model, or the like.
  • In one example, the straightened-text image obtained by the server may be as shown in FIG. 4. The server inputs it into the Tesseract model for text recognition, and the output recognized text is: 子丑寅卯辰巳午未 (the first eight Earthly Branches).
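  • A hedged sketch of the recognition step using the pytesseract wrapper (a local Tesseract install with the chi_sim language data is assumed for Chinese text such as the FIG. 4 example; the file name is illustrative):

```python
import pytesseract
from PIL import Image

# Hedged sketch: feed the straightened-text image to Tesseract through the
# pytesseract wrapper (file name illustrative; chi_sim traineddata assumed).
straight_img = Image.open("straight_text.png")
text = pytesseract.image_to_string(straight_img, lang="chi_sim")
print(text)
```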
  • In one example, the character recognition method of the present application can be implemented by the modules shown in FIG. 10, which specifically include:
  • a detection module 201, configured to acquire the binary mask of the image to be recognized;
  • a correction module 202, configured to perform connected component analysis on the binary mask, obtain connected component labels, and obtain the straightened-text image according to the connected component labels; and
  • a recognition module 203, configured to recognize text according to the straightened-text image.
  • It is worth mentioning that the modules involved in this embodiment are logical modules; in practical applications, a logical unit may be one physical unit, part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present application, this embodiment does not introduce units that are not closely related to solving the technical problem raised by the present application, but this does not mean that no other units exist in this embodiment.
  • In the first embodiment of the present application, a binary mask of the image to be recognized is acquired, where the binary mask is used to distinguish text areas from non-text areas in the image to be recognized; using a binary mask makes it possible to determine accurately and quickly which positions in the image contain text, preventing part of the text from being lost during recognition. Connected component analysis is performed on the binary mask to obtain connected component labels; since the binary mask has only the two pixel values 0 and 255, this analysis turns the image to be recognized into data, which is convenient for analysis and recognition.
  • The straightened-text image is then obtained according to the connected component labels. Related curved-text recognition techniques need to compute the angle of each character against an existing dictionary, separate the curved characters from one another, and perform complex calculations on each character itself, which makes the recognition process complicated and time-consuming. The embodiments of the present application instead process the binary mask directly, without considering each character individually, so curved text can be converted into straight text simply and quickly.
  • Recognizing text according to the straightened-text image can effectively improve the recognition accuracy and speed for curved text while strengthening the noise resistance of the curved-text recognition process, thereby greatly improving the user experience.
  • FIG. 11 shows the character recognition method described in the second embodiment of the present application, which includes:
  • Step 301: acquire a binary mask of the image to be recognized.
  • Step 302: perform connected component analysis on the binary mask to obtain connected component labels.
  • Step 303: perform minimum bounding box fitting on the binary mask according to the connected component labels to obtain the target text area.
  • Steps 301 to 303 have been similarly described in the first embodiment and are not repeated here.
  • Step 304: determine M interpolation coordinate points in the target text area.
  • Specifically, after obtaining the target text area, the server may determine M interpolation coordinate points in it, where M is an integer greater than 1. The interpolation coordinate points serve as the reference for interpolation.
  • In one example, the server may select all or some of the coordinate points on the boundary of the target text area as the interpolation coordinate points.
  • In one example, the server may determine a horizontal line through the center point of the target text area and use the coordinate points on that line as the interpolation coordinate points.
  • In another example, determining the M interpolation coordinate points in the target text area can be implemented by the sub-steps shown in FIG. 12, as follows:
  • Sub-step 3041: obtain the width of the target text area.
  • Specifically, after obtaining the target text area, the server can obtain its width.
  • In a specific implementation, after minimum bounding box fitting, the server can retain the box, so that the target text area is surrounded by the minimum bounding box. The server can determine the abscissa of each point on the box; the maximum abscissa minus the minimum abscissa is the width of the target text area.
  • Sub-step 3042: determine N axis points and start and end edge points in the target text area according to its width.
  • Specifically, after obtaining the width of the target text area, the server may determine N axis points and start and end edge points in the target text area according to that width.
  • In a specific implementation, the server may traverse the entire target text area according to its width, determine N axis points inside it, and, based on these N axis points, obtain the start and end edge points of the target text area through gradient computation.
  • In one example, the server may traverse the entire target text area according to its width, determine 5 equidistant axis points, and, based on these 5 axis points, obtain one start edge point and one end edge point of the target text area through gradient computation; the 5 axis points and the start and end edge points may be as shown in FIG. 13.
  • Sub-step 3043: determine the M interpolation coordinate points in the target text area according to the N axis points and the start and end edge points.
  • Specifically, after determining the N axis points and the start and end edge points, the server may determine M interpolation coordinate points in the target text area according to them, where M is determined by N (M = 2N + 4 in the example below). Determining the interpolation coordinate points from the axis points and the start and end edge points makes the determined points more reasonable, and hence makes the curved-text recognition process more reasonable.
  • In one example, after determining the N axis points and the start and end edge points, the server can draw a vertical line through each of them; each vertical line intersects the minimum bounding box of the target text area at two points (one on the upper boundary and one on the lower boundary). The server selects these 2(N + 2) = 2N + 4 points as the interpolation coordinate points, i.e. M = 2N + 4. For example, the server determines 5 axis points and one start and one end edge point in the target text area, and determines 14 interpolation coordinate points from these 7 points.
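  • Sub-steps 3041 to 3043 can be sketched as follows (a simplified stand-in: the axis points are taken as equidistant x-positions, the box edges replace the gradient-based edge search, and the vertical lines are intersected with the text mask rather than the minimum bounding box):

```python
import numpy as np

def interpolation_points(box, mask, N=5):
    """Hedged sketch of sub-steps 3041-3043 (box: 4x2 corner array)."""
    xs = box[:, 0]
    width = xs.max() - xs.min()                 # sub-step 3041: region width
    # Sub-step 3042: N equidistant axis x-positions inside the region, plus
    # the start and end edges as a stand-in for the gradient-based search.
    axis_x = np.linspace(xs.min(), xs.max(), N + 2)[1:-1]
    all_x = np.concatenate(([xs.min()], axis_x, [xs.max()]))
    # Sub-step 3043: a vertical line through each of the N + 2 points meets
    # the text region's top and bottom, giving up to M = 2N + 4 points.
    pts = []
    for x in all_x:
        xi = min(int(round(x)), mask.shape[1] - 1)
        col = np.where(mask[:, xi] == 255)[0]
        if col.size:
            pts.append((x, col.min()))          # upper boundary point
            pts.append((x, col.max()))          # lower boundary point
    return width, np.float32(pts)
```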
  • Step 305: interpolate the target text area of the binary mask according to the M interpolation coordinate points to obtain the straightened-text image.
  • Specifically, after determining the M interpolation coordinate points in the target text area, the server may interpolate the target text area of the binary mask according to them to obtain the straightened-text image. Interpolating according to the interpolation coordinate points can further speed up the interpolation process, and hence curved-text recognition.
  • Step 306: recognize the text according to the straightened-text image.
  • Step 306 has been described in the first embodiment and is not repeated here.
  • In the second embodiment of the present application, interpolating the target text area of the binary mask to obtain the straightened-text image includes: determining M interpolation coordinate points in the target text area, where M is an integer greater than 1; and interpolating the target text area of the binary mask according to the M interpolation coordinate points to obtain the straightened-text image. Interpolating according to the interpolation coordinate points can further speed up the interpolation process, and hence curved-text recognition.
  • Determining the M interpolation coordinate points in the target text area includes: obtaining the width of the target text area; determining N axis points and start and end edge points in the target text area according to the width, where N is an integer greater than 0; and determining the M interpolation coordinate points in the target text area according to the N axis points and the start and end edge points, which makes the determined points more reasonable and hence makes the curved-text recognition process more reasonable.
  • FIG. 14 shows the character recognition method described in the third embodiment of the present application, which includes:
  • Step 401: acquire a binary mask of the image to be recognized.
  • Step 402: perform connected component analysis on the binary mask to obtain connected component labels.
  • Steps 401 to 402 have been described in the first embodiment and are not repeated here.
  • Step 403: perform perspective transformation and minimum bounding box fitting on the binary mask according to the connected component labels to obtain a horizontal target text area.
  • Specifically, after obtaining the connected component labels, the server can perform perspective transformation and minimum bounding box fitting on the binary mask according to them to obtain the horizontal target text area. Considering that in some scenarios the text may not lie horizontally in the picture, bringing the target text area to the horizontal position through perspective transformation can further improve the accuracy of curved-text recognition.
  • In a specific implementation, the server may first perform perspective transformation on the binary mask and then minimum bounding box fitting, or first perform minimum bounding box fitting and then perspective transformation; the embodiments of the present application do not specifically limit this.
  • In one example, the image to be recognized may be as shown in FIG. 15, and the binary mask obtained by the server may be as shown in FIG. 16. The server first performs perspective transformation on the binary mask to place it horizontally, and then performs minimum bounding box fitting on the horizontal binary mask to obtain the horizontal target text area shown in FIG. 17.
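  • A hedged sketch of the perspective normalisation and its inverse (step 406 below) with OpenCV; the corner coordinates are illustrative:

```python
import cv2
import numpy as np

# Hedged sketch of step 403 (perspective normalisation) and step 406 (its
# inverse). The fitted quadrilateral `corners` is illustrative; it is mapped
# onto a horizontal w x h rectangle.
mask = cv2.imread("binary_mask.png", cv2.IMREAD_GRAYSCALE)  # illustrative
w, h = 300, 80
corners = np.float32([[40, 60], [320, 20], [340, 90], [60, 130]])
target = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

M = cv2.getPerspectiveTransform(corners, target)
horizontal = cv2.warpPerspective(mask, M, (w, h))   # horizontal target area
# Step 406: the inverse transform restores the straightened region to its
# original pose, yielding the polygon-fitting view.
restored = cv2.warpPerspective(horizontal, M, mask.shape[1::-1],
                               flags=cv2.WARP_INVERSE_MAP)
```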
  • Step 404: interpolate the horizontal target text area of the binary mask to obtain the straightened-text image.
  • Specifically, after obtaining the horizontal target text area, the server may interpolate the horizontal target text area of the binary mask to obtain the straightened-text image.
  • Step 405: recognize the text according to the straightened-text image.
  • Step 405 has been described in the first embodiment and is not repeated here.
  • Step 406: perform inverse perspective transformation on the straightened-text image to obtain a polygon fitting diagram.
  • Specifically, after recognizing the text according to the straightened-text image, the server can perform inverse perspective transformation on the straightened-text image to obtain a polygon fitting diagram. The inverse perspective transformation is the inverse of the perspective transformation: it restores the horizontally placed image to its original position, yielding the polygon fitting diagram. Since the server has previously performed minimum bounding box fitting, the target text area in the polygon fitting diagram is wrapped by the minimum bounding box, which may be a rectangle or any polygon.
  • Obtaining the polygon fitting diagram visualizes the target text area and associates it with the recognized text, which is convenient for staff to view, verify, enter data, and so on.
  • In one example, the image to be recognized may be as shown in FIG. 15 and the straightened-text image as shown in FIG. 18; the text output recognized by the server is "Oklahoma". The server performs inverse perspective transformation on the straightened-text image, and the resulting polygon fitting diagram may be as shown in FIG. 19.
  • In the third embodiment of the present application, performing minimum bounding box fitting on the binary mask according to the connected component labels to obtain the target text area includes: performing perspective transformation and minimum bounding box fitting on the binary mask according to the connected component labels to obtain a horizontal target text area. Interpolating the target text area of the binary mask to obtain the straightened-text image includes: interpolating the horizontal target text area of the binary mask to obtain the straightened-text image. Considering that in some scenarios the text may not lie horizontally in the picture, bringing the target text area to the horizontal position through perspective transformation can further improve the accuracy of curved-text recognition.
  • After recognizing the text according to the straightened-text image, the method further includes: performing inverse perspective transformation on the straightened-text image to obtain a polygon fitting diagram, which visualizes the target text area and associates it with the recognized text, making it convenient for staff to view, verify, enter data, and so on.
  • FIG. 20 shows the character recognition method described in the fourth embodiment of the present application, which includes:
  • Step 501: acquire a binary mask of the image to be recognized according to a preset detection model.
  • Specifically, the server may acquire a binary mask of the image to be recognized according to a preset detection model, which can be constructed and trained by those skilled in the art.
  • In a specific implementation, the process of acquiring the binary mask of the image to be recognized is performed online: the server loads the preset detection model, receives the image to be recognized for analysis, and obtains its binary mask.
  • In one example, the preset detection model can be obtained by training through the sub-steps shown in FIG. 21, as follows:
  • Step 601: obtain a detection model training set.
  • Specifically, before training the detection model, the server may obtain a detection model training set that includes a number of training images for training the detection model, the training images being annotated with the extension direction of the curved text.
  • In a specific implementation, the server can obtain a large number of images containing text from the Internet or by real-time capture, clean these images to remove obvious noise samples, tightly mark the text on each image with a polygon or an ellipse, and build the detection model training set from all the annotated images.
  • Step 602: iteratively train the initial instance segmentation network Mask-RCNN according to the detection model training set to obtain the detection model.
  • Specifically, after obtaining the detection model training set, the server can iteratively train the initial instance segmentation network Mask-RCNN on it to obtain the detection model. The Mask-RCNN network includes convolutional layers and a region-of-interest layer; using a Mask-RCNN-based detection model can improve the accuracy and speed of detection.
  • In one example, the training process of the detection model can be performed offline, with the server obtaining the detection model through supervised learning.
  • In a specific implementation, the server can instantiate the initial Mask-RCNN network, configure the initial parameters, feed in the training images and their annotations from the detection model training set, and train iteratively, updating the parameters until the detection model meets the accuracy requirement, thereby obtaining a trained detection model.
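  • A hedged sketch of this training step using torchvision's Mask R-CNN implementation; `train_loader`, which yields (images, targets) in the torchvision detection format, is an assumed helper and not part of the patent:

```python
import torch
import torchvision

# Hedged sketch of step 602: fine-tuning torchvision's Mask R-CNN on the
# detection training set. `train_loader` is an assumed DataLoader yielding
# (images, targets), with targets holding boxes, labels and masks per image.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

model.train()
for images, targets in train_loader:
    loss_dict = model(images, targets)   # training mode returns a loss dict
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```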
  • Step 502: perform connected component analysis on the binary mask to obtain connected component labels.
  • Step 503: obtain the straightened-text image according to the connected component labels.
  • Steps 502 to 503 have been described in the first embodiment and are not repeated here.
  • Step 504: recognize the text according to the straightened-text image and a preset recognition model.
  • Specifically, after obtaining the straightened-text image, the server may call a preset recognition model and recognize the text according to the straightened-text image and that model; the preset recognition model can be constructed and trained by those skilled in the art.
  • In a specific implementation, the text recognition process is performed online: the server loads the preset recognition model, receives the straightened-text image for recognition, and outputs the recognized text.
  • In one example, the preset recognition model can be obtained by training through the sub-steps shown in FIG. 22, as follows:
  • Step 701: obtain a recognition model training set.
  • Specifically, before training the recognition model, the server may obtain a recognition model training set that includes a number of training images for training the recognition model, the training images being annotated with the extension direction of the curved text and the content of the curved text.
  • In a specific implementation, the server can obtain a large number of images containing text from the Internet or by real-time capture, clean these images to remove obvious noise samples, tightly mark the text on each image with a polygon or an ellipse, annotate the content of the curved text, and build the recognition model training set from all the annotated images.
  • In one example, the recognition model training set can be obtained by further annotating the detection model training set, i.e. by annotating the content of the curved text on the detection model's training images.
  • Step 702: iteratively train the initial convolutional recurrent neural network with connectionist temporal classification (CRNN+CTC) according to the recognition model training set to obtain the recognition model.
  • Specifically, after obtaining the recognition model training set, the server can iteratively train the CRNN+CTC network on it to obtain the recognition model. The CRNN+CTC network consists of convolutional layers, recurrent layers and a transcription layer. The convolutional layers first scale the input images to the same size, then extract features with a deep convolutional neural network, and finally extract a feature sequence of uniform width from the feature map from left to right. The recurrent layers predict the label distribution of each feature sequence output by the convolutional layers through a deep bidirectional long short-term memory network. The transcription layer links the label distributions output by the recurrent layers to the CTC (Connectionist Temporal Classification) model to align the input data with the label data, and finally outputs sequence recognition results of indeterminate length.
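  • A hedged sketch of the CRNN+CTC structure described above in PyTorch; layer sizes are illustrative and the blank index is an assumption:

```python
import torch
import torch.nn as nn

# Hedged sketch of a CRNN with CTC: convolutional feature extractor,
# bidirectional LSTM recurrent layers, and CTC loss as the transcription
# layer. Sizes are illustrative only.
class CRNN(nn.Module):
    def __init__(self, num_classes, img_h=32):
        super().__init__()
        self.conv = nn.Sequential(                 # convolutional layers
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2))
        self.rnn = nn.LSTM(128 * (img_h // 4), 256,
                           bidirectional=True, batch_first=False)
        self.fc = nn.Linear(512, num_classes)      # includes the CTC blank

    def forward(self, x):                          # x: (B, 1, H, W)
        f = self.conv(x)                           # (B, C, H/4, W/4)
        b, c, h, w = f.shape
        f = f.permute(3, 0, 1, 2).reshape(w, b, c * h)  # left-to-right seq
        out, _ = self.rnn(f)                       # recurrent layers
        return self.fc(out).log_softmax(2)         # (T, B, num_classes)

ctc = nn.CTCLoss(blank=0)  # transcription layer: aligns inputs with labels
```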
  • In the fourth embodiment of the present application, acquiring the binary mask of the image to be recognized includes: acquiring the binary mask of the image to be recognized according to a preset detection model, the detection model being trained through the following steps: obtaining a detection model training set, which includes a number of training images for training the detection model, annotated with the extension direction of the curved text; and iteratively training the initial instance segmentation network Mask-RCNN according to the detection model training set to obtain the detection model. Using a Mask-RCNN-based detection model can improve the accuracy and speed of detection.
  • Obtaining the recognized text according to the straightened-text image includes: obtaining the recognized text according to the straightened-text image and a preset recognition model, the recognition model being trained through the following steps: obtaining a recognition model training set, which includes a number of training images for training the recognition model, annotated with the extension direction and the content of the curved text; and iteratively training the initial CRNN+CTC network according to the recognition model training set to obtain the recognition model. A recognition model based on the CRNN+CTC network is free of dictionary constraints and can recognize Chinese, English, special characters and other text, with good versatility and faster, more accurate recognition.
  • In one example, a steel plant needs to recognize the text on steel coils. The coils are cylindrical and stacked on the warehouse floor; the text on them is in Chinese and English, is curved in shape, and its direction rotates around the center of the coil. The steel plant uses the character recognition method provided by the embodiments of the present application to recognize the text on the coils, so as to learn information such as the batch and model of each coil. Recognizing the text on the steel coils can proceed through the steps shown in FIG. 23, specifically including:
  • Step 801: collect steel coil pictures for training.
  • Specifically, the server can collect pictures of steel coils for training in real time. One or more rotatable cameras are deployed in the warehouse where the factory stacks the coils; these cameras normally serve as surveillance cameras to keep the warehouse secure. When the detection model and the recognition model need to be trained, the server can change the camera parameters, point the cameras at the coils, periodically collect coil pictures for training, and store them in the server's internal database.
  • Step 802: perform data cleaning on the collected training pictures.
  • Specifically, after collecting the coil pictures for training, the server can clean them to remove pictures that are highly similar, extremely blurry, or in a wrong format.
  • Step 803: annotate the cleaned pictures.
  • Specifically, after cleaning the collected training pictures, the server can annotate them: the text on each picture is tightly marked with a polygon or elliptical box, and the content of the curved text is annotated, forming the training set.
  • Step 804: train and obtain the detection model and the recognition model.
  • Specifically, the server can train and obtain the detection model and the recognition model from the annotated pictures. The detection model can be iteratively trained based on the instance segmentation network Mask-RCNN, and the recognition model based on the CRNN+CTC network.
  • Step 805: start the character recognition service and load the detection model and the recognition model.
  • In a specific implementation, the server can expose a character recognition service, for example over the HyperText Transfer Protocol (HTTP), and load the detection model and the recognition model.
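  • A hedged sketch of such a service using Flask; Tesseract stands in here for the trained detection and recognition models, which would likewise be loaded once at startup:

```python
import io

import pytesseract
from flask import Flask, jsonify, request
from PIL import Image

# Hedged sketch of step 805: a minimal HTTP text-recognition service.
# Tesseract stands in for the trained detection + recognition models.
app = Flask(__name__)

@app.route("/recognize", methods=["POST"])
def recognize_coil():
    image = Image.open(io.BytesIO(request.files["image"].read()))
    return jsonify(text=pytesseract.image_to_string(image))

if __name__ == "__main__":
    app.run(port=8080)
```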
  • Step 806: acquire the picture of the steel coil to be recognized and perform text area detection on it.
  • In a specific implementation, the server can call the camera interface to photograph newly stored coils, acquire the picture of the coil to be recognized, and detect text areas on it using the detection model.
  • Step 807: correct the picture of the steel coil to be recognized.
  • Specifically, after detecting the text areas on the picture of the coil to be recognized, the server may correct the picture.
  • In a specific implementation, the server can correct each text area in turn: it performs connected component analysis on the binary mask obtained by the detection module, performs minimum bounding box fitting and perspective transformation on each connected component in turn to obtain the horizontal target text area, determines the axis points and the start and end edge points in the horizontal target text area so as to determine the interpolation coordinate points, and uses thin-plate-spline interpolation based on those points to flatten the curved text and obtain the straightened-text image.
  • Step 808: perform text recognition on the picture of the steel coil to be recognized.
  • Specifically, the server can recognize the text on the picture of the coil according to the straightened-text image, perform inverse perspective transformation on the straightened-text image to obtain the polygon fitting diagram, and associate each target text area with its recognized text.
  • Step 809: determine whether the last text area has been recognized; if so, end directly, otherwise return to step 807.
  • Specifically, the server can determine whether the last text area on the picture of the coil has been recognized, i.e. ensure that all the text on the picture has been recognized. If the last text area has been recognized, the recognition flow ends; otherwise recognition continues.
  • Compared with related curved-text recognition techniques, the character recognition method of this embodiment has the following advantages:
  • (1) Both the detection model and the recognition model use industry-leading deep neural network models, which improves noise resistance and versatility compared with curved-text recognition based on traditional image processing; the correction process neatly straightens the curved text, greatly improving text recognition accuracy.
  • (2) The character recognition method provided by the embodiments of the present application can recognize curved text of many character sets, with no limit on size, color or font, with very high recognition accuracy.
  • (3) The detection model and the recognition model use pruned lightweight networks, which increases inference speed without sacrificing recognition accuracy, reduces time consumption, and eases deployment.
  • (4) The method has broad market application scenarios and considerable research and economic value. In natural scenes, machine vision is used to recognize text, which can be applied to many scenarios: recognition of documents such as ID cards, passports, driving licenses and bank cards enables automated office work and speeds up efficiency; license plate recognition (plate number, color, type and so on) can record road violations, count road vehicle types, and automatically track fugitives by plate; bills such as value-added-tax invoices and receipts can be recognized; and text recognition for books, newspapers, magazines and other documents allows paper documents to be scanned electronically or read aloud in real time.
  • The fifth embodiment of the present application relates to an electronic device, as shown in FIG. 24, comprising: at least one processor 901; and a memory 902 communicatively connected to the at least one processor 901, wherein the memory 902 stores instructions executable by the at least one processor 901, the instructions being executed by the at least one processor 901 so that the at least one processor 901 can execute the character recognition method of each of the foregoing embodiments.
  • The memory and the processor are connected by a bus, which may include any number of interconnected buses and bridges linking one or more processors and the various circuits of the memory. The bus may also link various other circuits such as peripherals, voltage regulators and power management circuits, which are well known in the art and therefore are not described further herein. A bus interface provides the interface between the bus and the transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing a means for communicating with various other apparatuses over a transmission medium. Data processed by the processor is transmitted over the wireless medium through an antenna, and the antenna also receives data and passes it to the processor.
  • The processor is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management and other control functions, while the memory may be used to store data used by the processor when performing operations.
  • The sixth embodiment of the present application relates to a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method embodiments.
  • That is, those skilled in the art can understand that all or some of the steps of the methods in the above embodiments can be completed by instructing the relevant hardware through a program stored in a storage medium, the program including a number of instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Character Discrimination (AREA)

Abstract

A character recognition method, an electronic device and a computer-readable storage medium. The character recognition method includes: acquiring a binary mask of an image to be recognized (101), wherein the binary mask is used to distinguish text areas from non-text areas in the image to be recognized; performing connected component analysis on the binary mask to obtain connected component labels (102); obtaining a straightened-text image according to the connected component labels (103); and recognizing text according to the straightened-text image (104).

Description

Character recognition method, electronic device and computer-readable storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on, and claims priority to, Chinese patent application No. 202011480273.2 filed on December 15, 2020, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
Embodiments of the present application relate to the field of text detection and recognition, and in particular to a character recognition method, an electronic device, and a computer-readable storage medium.
BACKGROUND
With the rapid development of machine learning technology, humanity has entered the era of machine learning, which shines in fields such as machine vision, natural language processing and speech recognition. Text is ubiquitous in human production and daily life, and typically involves many characters, many languages, diverse fonts, uneven layout, varying sizes and different colors; that is, text in natural scenes often appears in a curved state.
This phenomenon is especially common in industrial scenes, where the demand for text recognition is great. Through text recognition, a factory can automatically obtain product information such as batch number, place of origin and quality grade when purchasing goods, and can check the accuracy of printed product information when shipping goods, guaranteeing product quality. Text recognition can help factories automate production and transform intelligently; in today's digitalization-themed world, this is an indispensable capability for enterprises and factories. In a factory environment, however, the shape of text usually bends and deforms with the surface curvature of industrial products, which brings great difficulty to text detection and recognition.
However, related curved-text recognition algorithms have low recognition accuracy and weak noise resistance, and cannot meet users' growing needs for curved-text recognition.
SUMMARY
An embodiment of the present application provides a character recognition method. The method includes: acquiring a binary mask of an image to be recognized, wherein the binary mask is used to distinguish text areas from non-text areas in the image to be recognized; performing connected component analysis on the binary mask to obtain connected component labels; obtaining a straightened-text image according to the connected component labels; and recognizing text according to the straightened-text image.
An embodiment of the present application further provides an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to execute the above character recognition method.
An embodiment of the present application further provides a readable storage medium storing a computer program which, when executed by a processor, implements the above character recognition method.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flowchart of a character recognition method according to a first embodiment of the present application;
FIG. 2 is an image to be recognized provided according to the first embodiment of the present application;
FIG. 3 is a binary mask of an image to be recognized provided according to the first embodiment of the present application;
FIG. 4 is a straightened-text image provided according to the first embodiment of the present application;
FIG. 5 is a flowchart of obtaining a straightened-text image according to connected component labels in the first embodiment of the present application;
FIG. 6 is another image to be recognized provided according to the first embodiment of the present application;
FIG. 7 is a binary mask of another image to be recognized provided according to the first embodiment of the present application;
FIG. 8 is a schematic diagram of target text areas of the binary mask of another image to be recognized provided according to the first embodiment of the present application;
FIG. 9 is another straightened-text image provided according to the first embodiment of the present application;
FIG. 10 is a schematic diagram of a character recognition apparatus according to the first embodiment of the present application;
FIG. 11 is a flowchart of a character recognition method according to a second embodiment of the present application;
FIG. 12 is a flowchart of determining M interpolation coordinate points in a target text area in the second embodiment of the present application;
FIG. 13 is a schematic diagram of axis points and start and end edge points provided according to the second embodiment of the present application;
FIG. 14 is a flowchart of a character recognition method according to a third embodiment of the present application;
FIG. 15 is an image to be recognized provided according to the third embodiment of the present application;
FIG. 16 is a binary mask of an image to be recognized provided according to the third embodiment of the present application;
FIG. 17 is a horizontal target text area of the binary mask of an image to be recognized provided according to the third embodiment of the present application;
FIG. 18 is a straightened-text image provided according to the third embodiment of the present application;
FIG. 19 is a polygon fitting diagram provided according to the third embodiment of the present application;
FIG. 20 is a flowchart of a character recognition method according to a fourth embodiment of the present application;
FIG. 21 is a flowchart of a detection model training method provided according to the fourth embodiment of the present application;
FIG. 22 is a flowchart of a recognition model training method provided according to the fourth embodiment of the present application;
FIG. 23 is a flowchart of recognizing text on steel coils provided according to the fourth embodiment of the present application;
FIG. 24 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
DETAILED DESCRIPTION
The main purpose of the embodiments of the present application is to provide a character recognition method, an electronic device and a computer-readable storage medium that can effectively improve the recognition accuracy and speed for curved text while strengthening the noise resistance of the curved-text recognition process, thereby greatly improving the user experience.
To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the embodiments of the present application are described in detail below with reference to the accompanying drawings. However, those of ordinary skill in the art can understand that many technical details are set forth in the embodiments to help the reader better understand the present application; the technical solutions claimed in the present application can nevertheless be implemented without these technical details and with various changes and modifications based on the following embodiments. The division into the following embodiments is for convenience of description and shall not constitute any limitation on specific implementations of the present application, and the embodiments may be combined with and refer to one another where no contradiction arises.
The first embodiment of the present application relates to a character recognition method applied to an electronic device, where the electronic device may be a terminal or a server; in this embodiment and the following embodiments, the electronic device is described by taking a server as an example. The implementation details of the character recognition method of this embodiment are described below; the following content is provided only for ease of understanding and is not necessary for implementing this solution.
Application scenarios of the embodiments of the present application may include, but are not limited to: automatic acquisition of product information when a factory purchases goods; inspection of printed product information, such as production date and place of origin, when a factory ships goods; entry of merchants' trademark information by the administration for industry and commerce; scanning paper documents into electronic documents using Optical Character Recognition (OCR) technology; recognition of ID cards, passports, driving licenses, bank cards and other documents; recognition and tracking of license plates by plate number, color, type and the like; and recognition of bills such as value-added-tax invoices.
The specific flow of the character recognition method of this embodiment may be as shown in FIG. 1, and includes:
Step 101: acquire a binary mask of the image to be recognized.
Specifically, the server may acquire a binary mask of the image to be recognized, where the binary mask is used to distinguish text areas from non-text areas in the image to be recognized.
In a specific implementation, a binary image is an image in which every pixel is either black or white; that is, the gray value of any pixel takes only one of the two values 0 and 255, where 0 denotes black and 255 denotes white. The binary mask is obtained by recomputing the value of each pixel through a mask operator, converting the image to be recognized into a binary image with only the two pixel values 0 and 255: a pixel value of 0 indicates that no text is detected at that position, and a pixel value of 255 indicates that text is detected there.
In one example, the image to be recognized is the trademark shown in FIG. 2. After acquiring the image, the server can detect which positions contain text and which do not, recompute the value of each pixel through the mask operator, and obtain the binary mask of the image shown in FIG. 3.
Step 102: perform connected component analysis on the binary mask to obtain connected component labels.
Specifically, after acquiring the binary mask of the image to be recognized, the server may perform connected component analysis on it to obtain connected component labels.
In a specific implementation, a connected component is a set of adjacent pixels in an image that share the same pixel value, and connected component analysis finds and labels the connected components in the image, i.e. obtains the connected component labels. Since the binary mask has only the two pixel values 0 and 255, where 0 indicates no text and 255 indicates text, connected component analysis turns the image to be recognized into data, making the distinction between text areas and non-text areas clearer and more intuitive.
Step 103: obtain a straightened-text image according to the connected component labels.
Specifically, after performing connected component analysis on the binary mask and obtaining the connected component labels, the server may obtain a straightened-text image according to those labels.
In one example, the server may interpolate the binary mask according to the connected component labels to obtain the straightened-text image. For example, after obtaining the labels, the server applies an interpolation method such as Lagrange interpolation or thin-plate-spline (TPS) interpolation to the binary mask shown in FIG. 3, thereby changing the structure of the image to be recognized shown in FIG. 2; on the basis of the binary mask, the text is pulled straight and the straightened-text image, which may be as shown in FIG. 4, is obtained.
In one example, obtaining the straightened-text image according to the connected component labels can be achieved by the sub-steps shown in FIG. 5, as follows:
Sub-step 1031: perform minimum bounding box fitting on the binary mask according to the connected component labels to obtain the target text area.
Specifically, after performing connected component analysis on the binary mask and obtaining the connected component labels, the server can perform minimum bounding box fitting on the binary mask to obtain the target text area.
In a specific implementation, the server can use minimum bounding box fitting to fit a minimal box around the text in the binary mask; the area inside the box is the area where the text is located, i.e. the target text area. Considering that the image to be recognized may contain multiple regions with text, minimum bounding box fitting on the binary mask can extract multiple text regions, from which the target text region to be recognized can be selected.
In one example, the image to be recognized is as shown in FIG. 6 and contains two pieces of text; its binary mask may be as shown in FIG. 7. The server performs minimum bounding box fitting on the binary mask, extracts the target text areas, and obtains the two target text areas S1 and S2 shown in FIG. 8.
Sub-step 1032: interpolate the target text area of the binary mask to obtain the straightened-text image.
Specifically, after obtaining the target text area, the server may interpolate the target text area of the binary mask to obtain the straightened-text image. Interpolating only the target text area makes the interpolation process simpler and more efficient.
In one example, the target text areas obtained by the server are as shown in FIG. 8. The server interpolates the target text areas according to the image to be recognized and obtains the straightened-text image, which may be as shown in FIG. 9.
In one example, Lagrange interpolation can be used to interpolate the target text area of the binary mask.
In another example, thin-plate-spline interpolation can be used instead; thin-plate-spline interpolation can effectively improve the robustness of the curved-text recognition process.
Step 104: recognize the text according to the straightened-text image.
Specifically, after obtaining the straightened-text image, the server may recognize the text according to it.
In a specific implementation, the server may input the straightened-text image into a text recognition model, perform text recognition, and output the recognized text. The text recognition model may be a Tesseract model, an AdvancedEAST model, or the like.
[Corrected under Rule 9.2, 19.11.2021]
In one example, the straightened-text image obtained by the server may be as shown in FIG. 4. The server inputs it into the Tesseract model for text recognition, and the output recognized text is: 子丑寅卯辰巳午未.
In one example, the character recognition method of the present application can be implemented by the modules shown in FIG. 10, which specifically include:
a detection module 201, configured to acquire the binary mask of the image to be recognized;
a correction module 202, configured to perform connected component analysis on the binary mask, obtain connected component labels, and obtain the straightened-text image according to the connected component labels; and
a recognition module 203, configured to recognize text according to the straightened-text image.
It is worth mentioning that the modules involved in this embodiment are logical modules; in practical applications, a logical unit may be one physical unit, part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present application, this embodiment does not introduce units that are not closely related to solving the technical problem raised by the present application, but this does not mean that no other units exist in this embodiment.
In the first embodiment of the present application, a binary mask of the image to be recognized is acquired, where the binary mask is used to distinguish text areas from non-text areas in the image to be recognized; using a binary mask makes it possible to determine accurately and quickly which positions in the image contain text, preventing part of the text from being lost during recognition. Connected component analysis is performed on the binary mask to obtain connected component labels; since the binary mask has only the two pixel values 0 and 255, this turns the image to be recognized into data, which is convenient for analysis and recognition. The straightened-text image is then obtained according to the connected component labels; related curved-text recognition techniques need to compute the angle of each character against an existing dictionary, separate the curved characters from one another, and perform complex calculations on each character itself, which makes the recognition process complicated and time-consuming, whereas the embodiments of the present application process the binary mask directly without considering each character individually, so curved text can be converted into straight text simply and quickly. Recognizing text according to the straightened-text image can effectively improve the recognition accuracy and speed for curved text while strengthening the noise resistance of the curved-text recognition process, thereby greatly improving the user experience.
The second embodiment of the present application relates to a character recognition method. The implementation details of the character recognition method of this embodiment are described below; the following content is provided only for ease of understanding and is not necessary for implementing this solution. FIG. 11 shows the character recognition method described in the second embodiment of the present application, which includes:
Step 301: acquire a binary mask of the image to be recognized.
Step 302: perform connected component analysis on the binary mask to obtain connected component labels.
Step 303: perform minimum bounding box fitting on the binary mask according to the connected component labels to obtain the target text area.
Steps 301 to 303 have been similarly described in the first embodiment and are not repeated here.
Step 304: determine M interpolation coordinate points in the target text area.
Specifically, after obtaining the target text area, the server may determine M interpolation coordinate points in it, where M is an integer greater than 1. The interpolation coordinate points serve as the reference for interpolation.
In one example, the server may select all or some of the coordinate points on the boundary of the target text area as the interpolation coordinate points.
In one example, the server may determine a horizontal line through the center point of the target text area and use the coordinate points on that line as the interpolation coordinate points.
In another example, determining the M interpolation coordinate points in the target text area can be implemented by the sub-steps shown in FIG. 12, as follows:
Sub-step 3041: obtain the width of the target text area.
Specifically, after obtaining the target text area, the server can obtain its width.
In a specific implementation, after performing minimum bounding box fitting on the binary mask, the server can retain the box, so that the target text area is surrounded by the minimum bounding box. The server can determine the abscissa of each point on the box; the maximum abscissa minus the minimum abscissa is the width of the target text area.
Sub-step 3042: determine N axis points and start and end edge points in the target text area according to its width.
Specifically, after obtaining the width of the target text area, the server may determine N axis points and start and end edge points in the target text area according to that width.
In a specific implementation, the server may traverse the entire target text area according to its width, determine N axis points inside it, and, based on these N axis points, obtain the start and end edge points of the target text area through gradient computation.
In one example, the server may traverse the entire target text area according to its width, determine 5 equidistant axis points, and, based on these 5 axis points, obtain one start edge point and one end edge point of the target text area through gradient computation; the 5 axis points and the start and end edge points may be as shown in FIG. 13.
Sub-step 3043: determine the M interpolation coordinate points in the target text area according to the N axis points and the start and end edge points.
Specifically, after determining the N axis points and the start and end edge points, the server may determine M interpolation coordinate points in the target text area according to them, where M is determined by N (M = 2N + 4 below). Determining the interpolation coordinate points from the axis points and the start and end edge points makes the determined points more reasonable, and hence makes the curved-text recognition process more reasonable.
In one example, after determining the N axis points and the start and end edge points, the server can draw a vertical line through each of them; each vertical line intersects the minimum bounding box of the target text area at two points (one on the upper boundary and one on the lower boundary). The server selects these 2(N + 2) = 2N + 4 points as the interpolation coordinate points, i.e. M = 2N + 4.
For example, the server determines 5 axis points and one start and one end edge point in the target text area, and determines 14 interpolation coordinate points from these 7 points.
Step 305: interpolate the target text area of the binary mask according to the M interpolation coordinate points to obtain the straightened-text image.
Specifically, after determining the M interpolation coordinate points in the target text area, the server may interpolate the target text area of the binary mask according to them to obtain the straightened-text image. Interpolating according to the interpolation coordinate points can further speed up the interpolation process, and hence curved-text recognition.
Step 306: recognize the text according to the straightened-text image.
Step 306 has been described in the first embodiment and is not repeated here.
In the second embodiment of the present application, interpolating the target text area of the binary mask to obtain the straightened-text image includes: determining M interpolation coordinate points in the target text area, where M is an integer greater than 1; and interpolating the target text area of the binary mask according to the M interpolation coordinate points to obtain the straightened-text image; interpolating according to the interpolation coordinate points can further speed up the interpolation process, and hence curved-text recognition. Determining the M interpolation coordinate points in the target text area includes: obtaining the width of the target text area; determining N axis points and start and end edge points in the target text area according to the width, where N is an integer greater than 0; and determining the M interpolation coordinate points in the target text area according to the N axis points and the start and end edge points, which makes the determined points more reasonable and hence makes the curved-text recognition process more reasonable.
The third embodiment of the present application relates to a character recognition method. The implementation details of the character recognition method of this embodiment are described below; the following content is provided only for ease of understanding and is not necessary for implementing this solution. FIG. 14 shows the character recognition method described in the third embodiment of the present application, which includes:
Step 401: acquire a binary mask of the image to be recognized.
Step 402: perform connected component analysis on the binary mask to obtain connected component labels.
Steps 401 to 402 have been described in the first embodiment and are not repeated here.
Step 403: perform perspective transformation and minimum bounding box fitting on the binary mask according to the connected component labels to obtain a horizontal target text area.
Specifically, after obtaining the connected component labels, the server can perform perspective transformation and minimum bounding box fitting on the binary mask according to them to obtain the horizontal target text area. Considering that in some scenarios the text may not lie horizontally in the picture, bringing the target text area to the horizontal position through perspective transformation can further improve the accuracy of curved-text recognition.
In a specific implementation, the server may first perform perspective transformation on the binary mask and then minimum bounding box fitting, or first perform minimum bounding box fitting and then perspective transformation; the embodiments of the present application do not specifically limit this.
In one example, the image to be recognized may be as shown in FIG. 15, and the binary mask obtained by the server may be as shown in FIG. 16. The server first performs perspective transformation on the binary mask to place it horizontally, and then performs minimum bounding box fitting on the horizontal binary mask to obtain the horizontal target text area shown in FIG. 17.
Step 404: interpolate the horizontal target text area of the binary mask to obtain the straightened-text image.
Specifically, after obtaining the horizontal target text area, the server may interpolate the horizontal target text area of the binary mask to obtain the straightened-text image.
Step 405: recognize the text according to the straightened-text image.
Step 405 has been described in the first embodiment and is not repeated here.
Step 406: perform inverse perspective transformation on the straightened-text image to obtain a polygon fitting diagram.
Specifically, after recognizing the text according to the straightened-text image, the server can perform inverse perspective transformation on the straightened-text image to obtain a polygon fitting diagram. The inverse perspective transformation is the inverse of the perspective transformation: it restores the horizontally placed image to its original position, yielding the polygon fitting diagram. Since the server has previously performed minimum bounding box fitting, the target text area in the polygon fitting diagram is wrapped by the minimum bounding box, which may be a rectangle or any polygon. Obtaining the polygon fitting diagram visualizes the target text area and associates it with the recognized text, which is convenient for staff to view, verify, enter data, and so on.
In one example, the image to be recognized may be as shown in FIG. 15 and the straightened-text image as shown in FIG. 18; the text output recognized by the server is "Oklahoma". The server performs inverse perspective transformation on the straightened-text image, and the resulting polygon fitting diagram may be as shown in FIG. 19.
In the third embodiment of the present application, performing minimum bounding box fitting on the binary mask according to the connected component labels to obtain the target text area includes: performing perspective transformation and minimum bounding box fitting on the binary mask according to the connected component labels to obtain a horizontal target text area; interpolating the target text area of the binary mask to obtain the straightened-text image includes: interpolating the horizontal target text area of the binary mask to obtain the straightened-text image. Considering that in some scenarios the text may not lie horizontally in the picture, bringing the target text area to the horizontal position through perspective transformation can further improve the accuracy of curved-text recognition. After recognizing the text according to the straightened-text image, the method further includes: performing inverse perspective transformation on the straightened-text image to obtain a polygon fitting diagram, which visualizes the target text area and associates it with the recognized text, making it convenient for staff to view, verify, enter data, and so on.
The fourth embodiment of the present application relates to a character recognition method. The implementation details of the character recognition method of this embodiment are described below; the following content is provided only for ease of understanding and is not necessary for implementing this solution. FIG. 20 shows the character recognition method described in the fourth embodiment of the present application, which includes:
Step 501: acquire a binary mask of the image to be recognized according to a preset detection model.
Specifically, the server may acquire a binary mask of the image to be recognized according to a preset detection model, which can be constructed and trained by those skilled in the art.
In a specific implementation, the process of acquiring the binary mask of the image to be recognized is performed online: the server loads the preset detection model, receives the image to be recognized for analysis, and obtains its binary mask.
In one example, the preset detection model can be obtained by training through the sub-steps shown in FIG. 21, as follows:
Step 601: obtain a detection model training set.
Specifically, before training the detection model, the server may obtain a detection model training set that includes a number of training images for training the detection model, the training images being annotated with the extension direction of the curved text.
In a specific implementation, the server can obtain a large number of images containing text from the Internet or by real-time capture, clean these images to remove obvious noise samples, tightly mark the text on each image with a polygon or an ellipse, and build the detection model training set from all the annotated images.
Step 602: iteratively train the initial instance segmentation network Mask-RCNN according to the detection model training set to obtain the detection model.
Specifically, after obtaining the detection model training set, the server can iteratively train the initial instance segmentation network Mask-RCNN on it to obtain the detection model. The Mask-RCNN network includes convolutional layers and a region-of-interest layer; using a Mask-RCNN-based detection model can improve the accuracy and speed of detection.
In one example, the training process of the detection model can be performed offline, with the server obtaining the detection model through supervised learning.
In a specific implementation, the server can instantiate the initial Mask-RCNN network, configure the initial parameters, feed in the training images and their annotations from the detection model training set, and train iteratively, updating the parameters until the detection model meets the accuracy requirement, thereby obtaining a trained detection model.
Step 502: perform connected component analysis on the binary mask to obtain connected component labels.
Step 503: obtain the straightened-text image according to the connected component labels.
Steps 502 to 503 have been described in the first embodiment and are not repeated here.
Step 504: recognize the text according to the straightened-text image and a preset recognition model.
Specifically, after obtaining the straightened-text image, the server may call a preset recognition model and recognize the text according to the straightened-text image and that model; the preset recognition model can be constructed and trained by those skilled in the art.
In a specific implementation, the text recognition process is performed online: the server loads the preset recognition model, receives the straightened-text image for recognition, and outputs the recognized text.
In one example, the preset recognition model can be obtained by training through the sub-steps shown in FIG. 22, as follows:
Step 701: obtain a recognition model training set.
Specifically, before training the recognition model, the server may obtain a recognition model training set that includes a number of training images for training the recognition model, the training images being annotated with the extension direction of the curved text and the content of the curved text.
In a specific implementation, the server can obtain a large number of images containing text from the Internet or by real-time capture, clean these images to remove obvious noise samples, tightly mark the text on each image with a polygon or an ellipse, annotate the content of the curved text, and build the recognition model training set from all the annotated images.
In one example, the recognition model training set can be obtained by further annotating the detection model training set, i.e. by annotating the content of the curved text on the detection model's training images.
Step 702: iteratively train the initial convolutional recurrent neural network with connectionist temporal classification (CRNN+CTC) according to the recognition model training set to obtain the recognition model.
Specifically, after obtaining the recognition model training set, the server can iteratively train the CRNN+CTC network on it to obtain the recognition model. The CRNN+CTC network consists of convolutional layers, recurrent layers and a transcription layer: the convolutional layers first scale the input images to the same size, then extract features with a deep convolutional neural network, and finally extract a feature sequence of uniform width from the feature map from left to right; the recurrent layers predict the label distribution of each feature sequence output by the convolutional layers through a deep bidirectional long short-term memory network; the transcription layer links the label distributions output by the recurrent layers to the CTC (Connectionist Temporal Classification) model to align the input data with the label data, and finally outputs sequence recognition results of indeterminate length. A recognition model based on the CRNN+CTC network is free of dictionary constraints and can recognize Chinese, English, special characters and other text, with good versatility and faster, more accurate recognition.
In the fourth embodiment of the present application, acquiring the binary mask of the image to be recognized includes: acquiring the binary mask of the image to be recognized according to a preset detection model, the detection model being trained through the following steps: obtaining a detection model training set, which includes a number of training images for training the detection model, annotated with the extension direction of the curved text; and iteratively training the initial instance segmentation network Mask-RCNN according to the detection model training set to obtain the detection model; using a Mask-RCNN-based detection model can improve the accuracy and speed of detection. Obtaining the recognized text according to the straightened-text image includes: obtaining the recognized text according to the straightened-text image and a preset recognition model, the recognition model being trained through the following steps: obtaining a recognition model training set, which includes a number of training images for training the recognition model, annotated with the extension direction and the content of the curved text; and iteratively training the initial CRNN+CTC network according to the recognition model training set to obtain the recognition model; a recognition model based on the CRNN+CTC network is free of dictionary constraints and can recognize Chinese, English, special characters and other text, with good versatility and faster, more accurate recognition.
In one example, a steel plant needs to recognize the text on steel coils. The coils are cylindrical and stacked on the warehouse floor; the text on them is in Chinese and English, is curved in shape, and its direction rotates around the center of the coil. The steel plant uses the character recognition method provided by the embodiments of the present application to recognize the text on the coils, so as to learn information such as the batch and model of each coil.
Recognizing the text on the steel coils can proceed through the steps shown in FIG. 23, specifically including:
Step 801: collect steel coil pictures for training.
Specifically, the server can collect pictures of steel coils for training in real time. One or more rotatable cameras are deployed in the warehouse where the factory stacks the coils; these cameras normally serve as surveillance cameras to keep the warehouse secure. When the detection model and the recognition model need to be trained, the server can change the camera parameters, point the cameras at the coils, periodically collect coil pictures for training, and store them in the server's internal database.
Step 802: perform data cleaning on the collected training pictures.
Specifically, after collecting the coil pictures for training, the server can clean them to remove pictures that are highly similar, extremely blurry, or in a wrong format.
Step 803: annotate the cleaned pictures.
Specifically, after cleaning the collected training pictures, the server can annotate them: the text on each picture is tightly marked with a polygon or elliptical box, and the content of the curved text is annotated, forming the training set.
Step 804: train and obtain the detection model and the recognition model.
Specifically, the server can train and obtain the detection model and the recognition model from the annotated pictures. The detection model can be iteratively trained based on the instance segmentation network Mask-RCNN, and the recognition model based on the CRNN+CTC network.
Step 805: start the character recognition service and load the detection model and the recognition model.
In a specific implementation, the server can expose a character recognition service, for example over the HyperText Transfer Protocol (HTTP), and load the detection model and the recognition model.
Step 806: acquire the picture of the steel coil to be recognized and perform text area detection on it.
In a specific implementation, the server can call the camera interface to photograph newly stored coils, acquire the picture of the coil to be recognized, and detect text areas on it using the detection model.
Step 807: correct the picture of the steel coil to be recognized.
Specifically, after detecting the text areas on the picture of the coil to be recognized, the server may correct the picture.
In a specific implementation, the server can correct each text area in turn: it performs connected component analysis on the binary mask obtained by the detection module, performs minimum bounding box fitting and perspective transformation on each connected component in turn to obtain the horizontal target text area, determines the axis points and the start and end edge points in the horizontal target text area so as to determine the interpolation coordinate points, and uses thin-plate-spline interpolation based on those points to flatten the curved text and obtain the straightened-text image.
Step 808: perform text recognition on the picture of the steel coil to be recognized.
Specifically, the server can recognize the text on the picture of the coil according to the straightened-text image, perform inverse perspective transformation on the straightened-text image to obtain the polygon fitting diagram, and associate each target text area with its recognized text.
Step 809: determine whether the last text area has been recognized; if so, end directly, otherwise return to step 807.
Specifically, the server can determine whether the last text area on the picture of the coil has been recognized, i.e. ensure that all the text on the picture has been recognized. If the last text area has been recognized, the recognition flow ends; otherwise recognition continues.
Compared with related curved-text recognition techniques, the character recognition method of this embodiment has the following advantages:
(1) Both the detection model and the recognition model use industry-leading deep neural network models, which improves noise resistance and versatility compared with curved-text recognition based on traditional image processing; the correction process neatly straightens the curved text, greatly improving text recognition accuracy.
(2) The character recognition method provided by the embodiments of the present application can recognize curved text of many character sets, with no limit on size, color or font, with very high recognition accuracy.
(3) The detection model and the recognition model use pruned lightweight networks, which increases inference speed without sacrificing recognition accuracy, reduces time consumption, and eases deployment.
(4) The method has broad market application scenarios and considerable research and economic value. In industrial scenes, curved text abounds at production sites, for example on imported raw materials, exported products and production equipment; recording the text imprinted on materials, products and equipment manually is prone to errors, omissions and loss, wasting labor and material costs and possibly affecting the factory's normal production, which is detrimental to factory transformation and sustainable development. Algorithmic curved-text recognition is fast and accurate, and a pipelined production mode improves factory efficiency, translating directly into economic benefit. In natural scenes, combined with the current wave of 5G and multimedia, machine-vision text recognition can be applied to many scenarios: recognition of documents such as ID cards, passports, driving licenses and bank cards enables automated office work and speeds up efficiency; license plate recognition (plate number, color, type and so on) can record road violations, count road vehicle types, and automatically track fugitives by plate; bills such as value-added-tax invoices and receipts can be recognized; and text recognition for books, newspapers, magazines and other documents allows paper documents to be scanned electronically or read aloud in real time.
The division of steps in the above methods is only for clarity of description; during implementation, steps may be merged into one step or a step may be split into multiple steps, and as long as the same logical relationship is included, they all fall within the protection scope of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, also falls within the protection scope of this patent.
The fifth embodiment of the present application relates to an electronic device, as shown in FIG. 24, comprising: at least one processor 901; and a memory 902 communicatively connected to the at least one processor 901, wherein the memory 902 stores instructions executable by the at least one processor 901, the instructions being executed by the at least one processor 901 so that the at least one processor 901 can execute the character recognition method of each of the above embodiments.
The memory and the processor are connected by a bus, which may include any number of interconnected buses and bridges linking one or more processors and the various circuits of the memory. The bus may also link various other circuits such as peripherals, voltage regulators and power management circuits, which are well known in the art and therefore are not described further herein. A bus interface provides the interface between the bus and the transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing a means for communicating with various other apparatuses over a transmission medium. Data processed by the processor is transmitted over the wireless medium through an antenna, and the antenna also receives data and passes it to the processor.
The processor is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management and other control functions, while the memory may be used to store data used by the processor when performing operations.
The sixth embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program implements the above method embodiments when executed by a processor.
That is, those skilled in the art can understand that all or some of the steps of the methods in the above embodiments can be completed by instructing the relevant hardware through a program stored in a storage medium, the program including a number of instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Those of ordinary skill in the art can understand that the above embodiments are specific embodiments for implementing the present application, and in practical applications various changes may be made to them in form and detail without departing from the spirit and scope of the present application.

Claims (12)

  1. A character recognition method, comprising:
    acquiring a binary mask of an image to be recognized, wherein the binary mask is used to distinguish text areas from non-text areas in the image to be recognized;
    performing connected component analysis on the binary mask to obtain connected component labels;
    obtaining a straightened-text image according to the connected component labels; and
    recognizing text according to the straightened-text image.
  2. The character recognition method according to claim 1, wherein obtaining the straightened-text image according to the connected component labels comprises:
    interpolating the binary mask according to the connected component labels to obtain the straightened-text image.
  3. The character recognition method according to claim 2, wherein interpolating the binary mask according to the connected component labels to obtain the straightened-text image comprises:
    performing minimum bounding box fitting on the binary mask according to the connected component labels to obtain a target text area; and
    interpolating the target text area of the binary mask to obtain the straightened-text image.
  4. The character recognition method according to claim 3, wherein interpolating the target text area of the binary mask to obtain the straightened-text image comprises:
    determining M interpolation coordinate points in the target text area, wherein M is an integer greater than 1; and
    interpolating the target text area of the binary mask according to the M interpolation coordinate points to obtain the straightened-text image.
  5. The character recognition method according to claim 4, wherein determining the M interpolation coordinate points in the target text area comprises:
    obtaining a width of the target text area;
    determining N axis points and start and end edge points in the target text area according to the width, wherein N is an integer greater than 0; and
    determining the M interpolation coordinate points in the target text area according to the N axis points and the start and end edge points.
  6. The character recognition method according to claim 3, wherein performing minimum bounding box fitting on the binary mask according to the connected component labels to obtain the target text area comprises:
    performing perspective transformation and minimum bounding box fitting on the binary mask according to the connected component labels to obtain a horizontal target text area;
    and interpolating the target text area of the binary mask to obtain the straightened-text image comprises:
    interpolating the horizontal target text area of the binary mask to obtain the straightened-text image.
  7. The character recognition method according to any one of claims 1-6, further comprising, after recognizing the text according to the straightened-text image:
    performing inverse perspective transformation on the straightened-text image to obtain a polygon fitting diagram.
  8. The character recognition method according to any one of claims 2-7, wherein interpolating the binary mask according to the connected component labels to obtain the straightened-text image comprises:
    performing thin-plate-spline (TPS) interpolation on the binary mask according to the connected component labels to obtain the straightened-text image.
  9. The character recognition method according to any one of claims 1-8, wherein acquiring the binary mask of the image to be recognized comprises:
    acquiring the binary mask of the image to be recognized according to a preset detection model;
    the detection model being trained through the following steps:
    obtaining a detection model training set, wherein the detection model training set includes a number of training images for training the detection model, the training images being annotated with an extension direction of curved text; and
    iteratively training an initial instance segmentation network Mask-RCNN according to the detection model training set to obtain the detection model.
  10. The character recognition method according to any one of claims 1-9, wherein obtaining the recognized text according to the straightened-text image comprises:
    obtaining the recognized text according to the straightened-text image and a preset recognition model;
    the recognition model being trained through the following steps:
    obtaining a recognition model training set, wherein the recognition model training set includes a number of training images for training the recognition model, the training images being annotated with an extension direction of curved text and content of the curved text; and
    iteratively training an initial CRNN+CTC network according to the recognition model training set to obtain the recognition model.
  11. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor, wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to execute the character recognition method according to any one of claims 1 to 10.
  12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the character recognition method according to any one of claims 1 to 10.
PCT/CN2021/126164 2020-12-15 2021-10-25 Character recognition method, electronic device and computer-readable storage medium WO2022127384A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011480273.2A CN114648771A (zh) 2020-12-15 2020-12-15 Character recognition method, electronic device and computer-readable storage medium
CN202011480273.2 2020-12-15

Publications (1)

Publication Number Publication Date
WO2022127384A1 (zh)

Family

ID=81991479

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/126164 WO2022127384A1 (zh) 2020-12-15 2021-10-25 Character recognition method, electronic device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN114648771A (zh)
WO (1) WO2022127384A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4350539A1 * 2022-10-04 2024-04-10 Primetals Technologies Germany GmbH Method and system for automatic image-based recognition of identification information on an object

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984865B * 2022-12-23 2024-02-27 成方金融科技有限公司 Text recognition method and apparatus, electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522900A * 2018-10-30 2019-03-26 北京陌上花科技有限公司 Natural scene text recognition method and apparatus
CN110287960A * 2019-07-02 2019-09-27 中国科学院信息工程研究所 Method for detecting and recognizing curved text in natural scene images
CN110969129A * 2019-12-03 2020-04-07 山东浪潮人工智能研究院有限公司 End-to-end tax invoice text detection and recognition method
US20200342172A1 * 2019-04-26 2020-10-29 Wangsu Science & Technology Co., Ltd. Method and apparatus for tagging text based on adversarial learning
CN112001383A * 2020-08-10 2020-11-27 长沙奇巧匠人软件有限公司 Intelligent water meter reading recognition method based on convolutional neural network technology


Also Published As

Publication number Publication date
CN114648771A (zh) 2022-06-21

Similar Documents

Publication Publication Date Title
CN110298338B (zh) Document image classification method and apparatus
CN111931664B (zh) Processing method and apparatus for mixed-pasted bill images, computer device and storage medium
CN110659574B (zh) Method and system for outputting text line content after recognizing checkbox states in document images
CN110008956B (zh) Invoice key information locating method and apparatus, computer device and storage medium
WO2022127384A1 (zh) Character recognition method, electronic device and computer-readable storage medium
CN110348439B (zh) Method, computer-readable medium and system for automatically recognizing price tags
CN110503100B (zh) Medical document recognition method and apparatus, computer apparatus and computer-readable storage medium
CN113160257A (zh) Image data annotation method and apparatus, electronic device and storage medium
CN113158895B (zh) Bill recognition method and apparatus, electronic device and storage medium
CN109271980A (zh) Method, system, terminal and medium for recognizing full vehicle nameplate information
US11023720B1 (en) Document parsing using multistage machine learning
CN112989921A (zh) Target image information recognition method and apparatus
CN111444912A (zh) Bill image character recognition method and apparatus
JP2008204184A (ja) Image processing apparatus, image processing method, program and recording medium
Akanksh et al. Automated invoice data extraction using image processing
CN111414889B (zh) Financial statement recognition method and apparatus based on character recognition
JP2022128202A (ja) Information processing apparatus, information processing system, and information processing program
CN113012075A (zh) Image correction method and apparatus, computer device and storage medium
CN111414917A (zh) Recognition method for low-pixel-density text
CN112364863A (zh) Character locating method and system for certificate documents
CN112529513A (zh) Intelligent seal verification method and system
US10963687B1 (en) Automatic correlation of items and adaptation of item attributes using object recognition
CN114155540B (zh) Character recognition method, apparatus, device and storage medium based on deep learning
JP2020095526A (ja) Image processing apparatus, method, and program
CN115187834A (zh) Bill recognition method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21905313

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.11.2023)