WO2022179471A1 - Card text recognition method, apparatus, and storage medium (卡证文本识别方法、装置和存储介质)

Info

Publication number: WO2022179471A1
Application number: PCT/CN2022/077038
Authority: WIPO (PCT)
Prior art keywords: text, image, card, area, text area
Other languages: English (en), French (fr)
Inventors: 洪芳宇, 施烈航
Applicant: 华为技术有限公司 (Huawei Technologies Co., Ltd.)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/16: Image preprocessing
    • G06V30/40: Document-oriented image-based pattern recognition
    • G06V30/41: Analysis of document content
    • G06V30/413: Classification of content, e.g. text, photographs or tables

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to a card text recognition method, device and storage medium.
  • Artificial intelligence (AI) is the theory and technique of using digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. That is to say, artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.
  • Optical character recognition (OCR) refers to technology that detects text in images and converts it into machine-encoded text.
  • an embodiment of the present application provides a method for recognizing text on a card.
  • the method is used in a terminal device.
  • The method includes: acquiring a first image to be recognized of a card; detecting the first image to be recognized to obtain at least one first text area, where the first text area represents the area in which text in the first image to be recognized is located; performing rotation correction on the first image to be recognized according to the first text area to obtain a second image to be recognized; detecting the second image to be recognized to obtain at least one second text area, where the second text area represents the area in which text in the second image to be recognized is located; and recognizing the image in the second text area to obtain the first target text corresponding to the second text area.
  • In this way, by acquiring the first image to be recognized, detecting it to obtain at least one first text area, performing rotation correction on it according to the first text area to obtain the second image to be recognized, detecting the second image to obtain at least one second text area, and recognizing the image in each second text area to obtain the corresponding first target text, the method takes a picture as input and outputs the card's text content. Rotation correction adjusts the angle of the card text to a more favorable state, so that text on tilted card images can be recognized; the second detection pass avoids missed text areas, improving the detection accuracy for text areas of tilted card pictures as well as the accuracy of the recognized text. Performing recognition on the terminal also reduces power consumption, avoids the network disconnections and slow responses caused by invoking cloud-side methods, and improves the user experience.
  • In one implementation, performing rotation correction on the first image to be recognized according to the first text area to obtain the second image to be recognized includes: performing rotation correction on the first image according to the average inclination angle of the longest one or more text areas among the first text areas. In this way, the accuracy of correction can be improved, thereby improving the accuracy of detection.
  • In one implementation, detecting the second image to be recognized to obtain at least one second text area includes: determining the horizontal slope of the second text area; and correcting the left edge and the right edge of the second text area according to that horizontal slope, so that after correction the left edge and the right edge of the second text area are respectively perpendicular to the upper edge and/or the lower edge.
  • In this way, the left and right edges of the second text area are perpendicular to its upper and/or lower edges, which prevents the text from being deformed when perspective transformation is applied to an irregular text area, makes the text in the area easier to recognize, and thereby further improves the accuracy of card text recognition.
  • In one implementation, detecting the second image to be recognized to obtain at least one second text area includes: determining the horizontal slope and height of the second text area; and extending the upper edge and the lower edge of the second text area to both sides according to the horizontal slope, with the extension distance determined according to the height.
  • the upper edge and the lower edge of the second text area are respectively extended to both sides, so that the text in the text area is easier to recognize, thereby further improving the accuracy of card text recognition.
  • In one implementation, recognizing the image in the second text area to obtain the first target text corresponding to the second text area includes: recognizing the image in the second text area to obtain the second target text corresponding to the second text area; determining the attribute of the second target text; filtering the connectionist temporal classification (CTC) sequence corresponding to the second text area according to that attribute to obtain a filtered CTC sequence; and obtaining the first target text according to the categories and corresponding confidence levels in the filtered CTC sequence.
  • In this way, the second target text corresponding to the second text area is obtained, its attribute is determined, the CTC sequence corresponding to the second text area is filtered according to that attribute, and the first target text is obtained from the categories and confidences in the filtered sequence. This prevents the confusion and recognition errors caused by similar-looking characters during recognition, reducing the recognition error rate and further improving recognition accuracy.
  • In one implementation, the method further includes: training the detection model and the recognition model according to training samples to obtain the trained detection model and the trained recognition model. The training samples include positive samples and negative samples in one-to-one correspondence: each positive sample includes a card image sample containing a text area, and the corresponding negative sample includes the card image sample obtained by covering that text area. The trained detection model is used to detect the first text area and the second text area, and the trained recognition model is used to recognize the first target text and the second target text.
  • In this way, the trained detection model and the trained recognition model are obtained; the trained detection model detects the first and second text areas, and the trained recognition model recognizes the first and second target texts, which can reduce ROM occupation in the terminal device and prevent the terminal device from stalling.
  • Because the training samples include positive samples and negative samples in one-to-one correspondence, with each positive sample being a card image sample containing a text area and each negative sample being that card image sample with the text area covered, adversarial learning between positive and negative samples can be realized. This enhances the detection model's discrimination between text and non-text areas, improves recognition accuracy against complex backgrounds, and increases the robustness and accuracy of the model.
  • An embodiment of the present application provides a card text recognition device for use in a terminal device. The device includes: an acquisition module for acquiring a first image to be recognized of a card; a first detection module for detecting the first image to be recognized to obtain at least one first text area, where the first text area represents the area in which text in the first image is located; a correction module for performing rotation correction on the first image according to the first text area to obtain a second image to be recognized; a second detection module for detecting the second image to obtain at least one second text area, where the second text area represents the area in which text in the second image is located; and a recognition module for recognizing the image in the second text area to obtain the first target text corresponding to the second text area.
  • The correction module includes: a first correction sub-module configured to perform rotation correction on the first image to be recognized according to the average inclination angle of the longest one or more text areas among the first text areas, to obtain the second image to be recognized.
  • The second detection module includes: a first determination module configured to determine the horizontal slope of the second text area; and a second correction sub-module configured to correct the left edge and the right edge of the second text area according to that horizontal slope, so that after correction the left edge and the right edge are respectively perpendicular to the upper edge and/or the lower edge of the second text area.
  • The second detection module includes: a second determination module for determining the horizontal slope and height of the second text area; and an extension module for extending the upper edge and the lower edge of the second text area to both sides according to the horizontal slope, with the extension distance determined according to the height.
  • The recognition module includes: a recognition sub-module for recognizing the image in the second text area to obtain the second target text corresponding to the second text area; a third determination module for determining the attribute of the second target text; a filtering module for filtering the CTC sequence corresponding to the second text area according to the attribute of the second target text to obtain the filtered CTC sequence; and a fourth determination module for obtaining the first target text according to the categories and corresponding confidence levels in the filtered CTC sequence.
  • The device further includes: a training module configured to train the detection model and the recognition model according to training samples to obtain the trained detection model and the trained recognition model. The training samples include positive samples and negative samples in one-to-one correspondence: each positive sample includes a card image sample containing a text area, and the corresponding negative sample includes the card image sample obtained by covering that text area. The trained detection model detects the first and second text areas, and the trained recognition model recognizes the first and second target texts.
  • Embodiments of the present application provide a card text recognition device comprising: a processor; and a memory for storing instructions executable by the processor; where the processor is configured, when executing the instructions, to implement the method of the first aspect or one or more of its possible implementations.
  • Embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, the card text recognition method of the first aspect or one or more of its possible implementations is implemented.
  • An embodiment of the present application provides a terminal device that can execute the card text recognition method of the first aspect or one or more of its possible implementations.
  • Embodiments of the present application provide a computer program product comprising computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes the card text recognition method of the first aspect or one or more of its possible implementations.
  • FIG. 1 shows a schematic diagram of an application scenario according to an embodiment of the present application.
  • FIG. 2 shows a flowchart of a card text recognition method according to an embodiment of the present application.
  • FIG. 3 shows a flowchart of generating a negative sample according to an embodiment of the present application.
  • FIG. 4 shows a flowchart of a card text recognition method according to an embodiment of the present application.
  • FIG. 5 shows a flowchart of performing rotation correction on a picture according to an embodiment of the present application.
  • FIG. 6 shows a schematic diagram of the effect of performing rotation correction on a picture according to an embodiment of the present application.
  • Figure 7a shows a schematic diagram of a quadrilateral text box obtained through secondary detection.
  • FIG. 7b shows a schematic diagram of the effect of directly performing perspective transformation on the quadrilateral text box obtained through the secondary detection.
  • FIG. 7c shows a schematic diagram of performing edge correction on a quadrilateral text box obtained by secondary detection.
  • FIG. 7d shows a schematic diagram of the effect of performing perspective transformation after performing edge correction on the quadrilateral text box obtained by the secondary detection.
  • Figure 7e shows a schematic diagram of a quadrilateral text box that is too tight.
  • FIG. 7f shows a schematic diagram of performing edge expansion on a quadrilateral text box obtained through secondary detection.
  • FIG. 8 shows a flowchart of edge correction according to an embodiment of the present application.
  • FIG. 9 shows a flowchart of performing edge expansion according to an embodiment of the present application.
  • FIG. 10 shows a schematic diagram of confidence filtering based on a CTC sequence according to an embodiment of the present application.
  • FIG. 11 shows a flowchart of a card text recognition method according to an embodiment of the present application.
  • FIG. 12 shows a flowchart of a card text recognition method according to an embodiment of the present application.
  • FIG. 13 shows a flowchart of a card text recognition method according to an embodiment of the present application.
  • FIG. 14 shows a flowchart of a card text recognition method according to an embodiment of the present application.
  • FIG. 15 shows a structural diagram of a card text recognition device according to an embodiment of the present application.
  • FIG. 16 shows a schematic structural diagram of a terminal device according to an embodiment of the present application.
  • FIG. 17 shows a block diagram of a software structure of a terminal device according to an embodiment of the present application.
  • In the related art, when a user performs card text recognition on a terminal device, the terminal device needs to send the card image to a cloud-side server for recognition, after which the recognition result is returned to the user.
  • The card includes any type of certificate with a fixed shape and format, such as an ID card, bank card, employee badge, business card, or business license.
  • The picture of the card can include a picture stored by the user in the terminal device, a picture shot by the user on the spot, a picture obtained by the user holding the card and scanning it with the terminal device, and so on.
  • Because recognition relies on a cloud-side server, card text recognition directly on the terminal side, that is, on the terminal device itself, is not supported.
  • The problem is that recognition cannot be performed without a network connection, and with a network connection there is network latency and a slow response. Moreover, in this case the recognition accuracy for tilted card text is not high.
  • Text recognition accuracy is also very low for tilted cards, for embossed or stencil-like fonts on cards, for card text that is hard to distinguish from the background, and under poor illumination; some card text images cannot be recognized at all after previewing.
  • In addition, if a dedicated detection and recognition model needs to be trained on the terminal device, it will occupy a large amount of the terminal device's read-only memory (ROM) and may also cause the terminal device to stall.
  • In view of this, the present application provides a card text recognition method.
  • The card text recognition method in the embodiments of the present application detects the text areas on a card image and recognizes the text within them.
  • The method can be applied to terminal devices, thereby improving the accuracy of text recognition.
  • FIG. 1 shows a schematic diagram of an application scenario according to an embodiment of the present application.
  • the text recognition method provided in this embodiment of the present application may be applied to a scenario where text recognition on a card is performed on a terminal device, such as bank card number recognition or driver's license information recognition.
  • the user can upload or scan the non-horizontal bank card photo as shown in Figure 1(a) with the terminal device, and the recognized bank card number is "6214XXXX73469446";
  • the user can hold the driver's license as shown in Figure 1(b), and use the terminal device to scan or take pictures to upload, and the information of the recognized driver's license is as follows: "Name: Luo Xyan; Gender: Male; license number: 3408111992XXXX6319; permitted type: C1; license date: 2011-02-14; validity period: 2017-02-14 to 2027-02-14”.
  • The identified key information can also be processed further; for example, key information such as the driver's license number and name can be matched one-to-one to preset fields to form structured text information, greatly improving the efficiency of information processing.
  • The terminal device may refer to a device with wireless connectivity, meaning it can connect to other terminal devices through wireless methods such as Wi-Fi and Bluetooth.
  • The terminal device of this application may have a touch screen, a non-touch screen, or no screen at all. With a touch screen, the device can be controlled by tapping or sliding on the display with a finger, stylus, or the like.
  • A non-touch-screen device can be connected to input devices such as a mouse, keyboard, or touch panel and controlled through them.
  • A device without a screen may be, for example, a screenless Bluetooth speaker.
  • The terminal device of the present application may be a smartphone, netbook, tablet, notebook computer, wearable electronic device (such as a smart band or smart watch), TV, virtual reality device, audio system, e-ink device, and so on.
  • This application does not limit the type of terminal equipment, nor does it limit the types of cards that can be recognized by the terminal equipment.
  • The embodiments of this application can be applied to any scenario (including complex scenes such as natural scenes and printed scenes) in which the information contained on any card is recognized, and may also be applied to other scenarios.
  • FIG. 2 shows a flowchart of a card text recognition method according to an embodiment of the present application.
  • the flow of the card text recognition method according to an embodiment of the present application includes:
  • Step S101 training phase.
  • the detection model and the recognition model can be trained by using the training set to obtain the trained detection model and the recognition model.
  • the training set may include samples of card images and their corresponding annotations.
  • the detection model and the recognition model may be general OCR models, and the present application does not limit the categories of the detection model and the recognition model.
  • the training set used for training can include positive samples and negative samples.
  • the positive samples can represent card pictures that contain text content
  • the negative samples can represent card pictures that do not contain text content.
  • the positive samples can also be transformed, and the transformed positive samples can be included in the training set, so that the original model can enhance the adaptability to new scenes.
  • Methods of transforming the positive samples can include: random translation (simulating the lens being horizontally offset when the user is not facing the camera), random zoom (simulating different shooting distances), random rotation (simulating an in-plane tilt of the shooting angle), perspective transformation (simulating the shooting angle being tilted back and forth), blurring (simulating inaccurate focus and lens shake), and random aspect ratio (simulating pictures of different sizes and aspect ratios taken by different phones); see the sketch after this item.
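  As a rough illustration (not part of the original disclosure), the following Python/OpenCV sketch applies a subset of these transforms (translation, zoom, rotation, blur, aspect ratio); the function name augment_positive_sample and all parameter ranges are illustrative assumptions:

```python
import random
import cv2
import numpy as np

def augment_positive_sample(img: np.ndarray) -> np.ndarray:
    """Apply random transforms to one positive sample.
    Parameter ranges are illustrative, not values from the patent."""
    h, w = img.shape[:2]

    # Random translation: simulates a horizontally offset lens.
    tx, ty = random.uniform(-0.05, 0.05) * w, random.uniform(-0.05, 0.05) * h
    img = cv2.warpAffine(img, np.float32([[1, 0, tx], [0, 1, ty]]), (w, h),
                         borderMode=cv2.BORDER_REPLICATE)

    # Random zoom: simulates varying shooting distance.
    s = random.uniform(0.9, 1.1)
    img = cv2.resize(img, None, fx=s, fy=s)

    # Random in-plane rotation: simulates a tilted shooting angle.
    hh, ww = img.shape[:2]
    M = cv2.getRotationMatrix2D((ww / 2, hh / 2), random.uniform(-10, 10), 1.0)
    img = cv2.warpAffine(img, M, (ww, hh), borderMode=cv2.BORDER_REPLICATE)

    # Blur: simulates inaccurate focus and lens shake.
    if random.random() < 0.5:
        img = cv2.GaussianBlur(img, (5, 5), 0)

    # Random aspect ratio: simulates different phone cameras.
    return cv2.resize(img, (int(ww * random.uniform(0.9, 1.1)), hh))
```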
  • FIG. 3 shows a flowchart of generating a negative sample according to an embodiment of the present application. As shown in Figure 3, the process of generating negative samples includes steps S201-S205:
  • step S201 the label and the positive sample are read.
  • the label may be the labeling label of the labeled positive sample in the training set, for example, it may represent the coordinates of the text area and the non-text area in the positive sample.
  • Step S202 generating a mask map of the text area and the non-text area.
  • the corresponding mask map can be generated according to the text area and the non-text area determined by the coordinates marked in the label.
  • the mask map can be a black and white image, for example, the text area in the negative sample is white, and the non-text area is black.
  • Step S203 selecting a non-text area in the neighborhood of the text area.
  • non-text areas can be the black parts of the mask, representing what needs to be preserved.
  • Step S204 covering the text area.
  • Pixels from the non-text area can be cropped and filled into the text area, thereby covering the text area while the non-text area of the positive sample is displayed normally, forming the processed negative-sample image.
  • Step S205 save the negative sample.
  • After the negative samples are generated, they can be associated with the corresponding positive samples and saved in the training set; a sketch of the whole generation process follows.
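  A minimal sketch of steps S202-S204, assuming labels are given as 4x2 corner arrays of quadrilateral text boxes; the neighborhood-selection strategy (taking a patch directly above each box) is one simple choice among many:

```python
import cv2
import numpy as np

def make_negative_sample(img: np.ndarray, text_boxes: list[np.ndarray]) -> np.ndarray:
    """Cover every labeled text area with pixels taken from its neighborhood.
    text_boxes holds 4x2 arrays of quadrilateral corners (assumed label format)."""
    # S202: build a mask map: text areas white (255), non-text areas black (0).
    mask = np.zeros(img.shape[:2], dtype=np.uint8)
    for box in text_boxes:
        cv2.fillPoly(mask, [box.astype(np.int32)], 255)

    neg = img.copy()
    for box in text_boxes:
        x, y, w, h = cv2.boundingRect(box.astype(np.int32))
        # S203: pick a non-text strip in the neighborhood (here: directly above, if free).
        y_src = max(0, y - h)
        patch = img[y_src:y_src + h, x:x + w]
        if patch.shape[:2] == (h, w) and not mask[y_src:y_src + h, x:x + w].any():
            # S204: cover the text area with the non-text patch.
            neg[y:y + h, x:x + w] = patch
    return neg
```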
  • step S102 the preprocessing phase is entered.
  • The original input picture can be preprocessed to obtain a processed picture, which can then serve as the input of the detection model.
  • images can be normalized to fit the original input size of the detection model.
  • The image input at the preprocessing stage can be a card picture uploaded by the user. The upload method can include a card picture captured directly by the user, a photo stored in the terminal device and uploaded by the user, a card picture scanned by the user with the terminal device, and so on; any other upload method is also possible, and this application does not limit how the user uploads the card picture.
  • Step S103 the text detection stage.
  • the trained detection model can be used to perform text detection on the preprocessed image twice, and the detected quadrilateral text box can be obtained during the first text detection.
  • Rotation correction can make the text lines in the picture tend to be horizontal.
  • the picture after rotation correction can be input to obtain the detected new quadrilateral text box.
  • the text boxes can correspond to the detected text areas, and the text areas may contain relevant card information that needs to be identified, such as bank card numbers.
  • Step S104 the text recognition stage.
  • edge expansion and correction and perspective transformation can be performed on the quadrilateral text box to obtain a rectangular picture block, and the trained recognition model can be used to identify the picture block and output the corresponding text content.
  • the attribute of the obtained text content can be further determined, the text content can be further accurately identified according to the attribute, and the identified text content can be checked and corrected.
  • FIG. 4 shows a flowchart of a card text recognition method according to an embodiment of the present application.
  • the process of card text recognition specifically includes:
  • Step S301 input the card picture.
  • the input card picture may be a card picture that is processed in the preprocessing stage and conforms to the input size of the detection model.
  • Step S302 using the detection model to detect the input picture to obtain the detected candidate text area.
  • Multiple candidate text regions may be detected; each candidate region can include its coordinates and a corresponding confidence level. Candidate regions with similar coordinates may correspond to the same piece of text, and the confidence level represents the probability that the candidate region is the most suitable text area for the text it points to.
  • For example, for the text representing the card number in a bank card picture, there may be multiple candidate text areas with corresponding confidence levels: candidate areas that do not fully contain the card number text have relatively low confidence, while candidate areas that fully contain it have relatively high confidence.
  • the detection model may be a detection model obtained after fine-tuning training using targeted positive and negative samples, and the training process may refer to step S101 in FIG. 2 .
  • The quadrilateral text box corresponding to each piece of text in the picture can then be determined by fusing and filtering the redundant candidate text regions, for example with non-maximum suppression (NMS); a minimal sketch follows.
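  The sketch below uses plain axis-aligned NMS; the patent does not specify the exact variant used for quadrilateral boxes, so this simplification is an assumption:

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list[int]:
    """Axis-aligned NMS over candidate boxes given as (x1, y1, x2, y2)."""
    order = scores.argsort()[::-1]          # highest confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the kept box with the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # Drop candidates that overlap the kept box too much.
        order = order[1:][iou <= iou_thresh]
    return keep
```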
  • Step S303: rotate and correct the picture according to the m longest quadrilateral text boxes, so that the text lines in the picture tend toward horizontal.
  • By step S303, quadrilateral text boxes corresponding to multiple text regions may have been determined in the picture.
  • Step S304: after rotation correction, the picture is input into the detection model again, and secondary detection is performed to obtain new quadrilateral text boxes.
  • FIG. 5 shows a flowchart of performing rotation correction on a picture according to an embodiment of the present application, which can be used as an example of steps S301-S304.
  • step S401 can refer to step S301 in FIG. 4
  • step S402 and step S403 can refer to step S302 in FIG. 4
  • Post-processing can include the fusion and filtering of redundant candidate text regions described above.
  • the process of performing rotation correction also includes:
  • Step S404: acquire the m longest text boxes among the plurality of quadrilateral text boxes.
  • Step S405: calculate the average inclination angle θ of the m text boxes.
  • Step S406: rotate the picture by the angle θ to obtain the rotation-corrected picture.
  • the lines of text in the picture can tend to be horizontal.
  • The warpAffine function in OpenCV can be used to rotate the image and fill the border background of the rotated image (for example, by replicating edge pixels).
  • the inclination angle may be relative to the horizontal direction or relative to the vertical direction.
  • Step S407 Adjust the size of the rotated and corrected picture to obtain a picture suitable for the input size of the detection model.
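  The following sketch puts steps S404-S407 together; the choice m = 3, the corner order of the boxes, and the 640x640 model input size are illustrative assumptions:

```python
import cv2
import numpy as np

def rotate_correct(img: np.ndarray, boxes: list[np.ndarray], m: int = 3) -> np.ndarray:
    """Average the inclination of the m longest quadrilateral boxes (4x2 corner
    arrays, assumed order TL, TR, BR, BL) and rotate the picture by that angle."""
    # S404: keep the m longest boxes, measured by the length of the top edge.
    boxes = sorted(boxes, key=lambda b: np.linalg.norm(b[1] - b[0]), reverse=True)[:m]

    # S405: average inclination angle theta relative to the horizontal.
    angles = [np.degrees(np.arctan2(b[1][1] - b[0][1], b[1][0] - b[0][0])) for b in boxes]
    theta = float(np.mean(angles))

    # S406: rotate by theta; replicate edge pixels to fill the border background.
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), theta, 1.0)
    rotated = cv2.warpAffine(img, M, (w, h), borderMode=cv2.BORDER_REPLICATE)

    # S407: resize to the detection model's input size (assumed 640x640 here).
    return cv2.resize(rotated, (640, 640))
```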
  • FIG. 6 shows a schematic diagram of the effect of performing rotation correction on a picture according to an embodiment of the present application.
  • Figure 6(a) can represent the bank card picture input for the first detection.
  • The text boxes 1, 2, 3, and 4 in Figure 6(b) can respectively represent the quadrilateral text boxes determined in step S303.
  • Figure 6(c) can represent the bank card picture obtained after rotation correction.
  • After the quadrilateral text boxes 1, 2, 3, and 4 shown in Figure 6(b) are obtained, the warpAffine function can be used to rotate the image within the white box shown in Figure 6(b).
  • step S408 the rotated and corrected picture is input into the detection model, and secondary detection is performed to obtain a new quadrilateral text box.
  • The NMS algorithm can also be used to fuse and filter the candidate text regions contained in the output results; for multiple candidate regions corresponding to the same text, the most suitable candidate region is finally determined as the new quadrilateral text box.
  • a new quadrilateral text box is obtained. Compared with the quadrilateral text box obtained by the first detection, the new quadrilateral text box is more accurate, and the probability of missed detection is lower.
  • step S305 edge correction and expansion are performed on the quadrilateral text box obtained by the secondary detection, and perspective transformation is performed into a rectangular picture block.
  • FIG. 7a shows a schematic diagram of a quadrilateral text box obtained through secondary detection.
  • The quadrilateral text box obtained after secondary detection may have the problem that its left/right edges are not perpendicular to its upper/lower edges.
  • FIG. 7b shows a schematic diagram of the effect of directly performing perspective transformation on the quadrilateral text box obtained after secondary detection. If perspective transformation is applied directly to the text box shown in Figure 7a, an obliquely deformed text line such as that in Figure 7b may be obtained; feeding such a picture block into the recognition model degrades the recognition result.
  • FIG. 8 shows a flowchart of performing edge correction according to an embodiment of the present application. As shown in Figure 8, the process of edge correction includes:
  • Step S501 determining the quadrilateral text box detected by the detection model.
  • the quadrilateral text box can be, for example, shown as the white quadrilateral text box in FIG. 7a.
  • Step S502 calculate the horizontal slope k of the quadrilateral.
  • the horizontal slope can be obtained according to the inclination of the upper and lower sides of the quadrilateral text box on the horizontal line.
  • the horizontal slope can be represented by the tangent of the angle between the line segment AD (or line segment BC) and the horizontal line in FIG. 7c.
  • Step S503 making a vertical line through the midpoints of the left and right sides of the quadrilateral frame.
  • Step S504 Calculate the intersection of the vertical line and the upper and lower sides of the quadrilateral frame to determine a new quadrilateral frame.
  • FIG. 7c shows a schematic diagram of performing edge correction on a quadrilateral text box obtained through secondary detection.
  • As shown in Figure 7c, the horizontal slope k can be obtained from the inclination of sides AD and BC. Vertical lines to the upper and lower sides are drawn through the midpoints of the left and right sides (vertical lines a and b in Figure 7c), and their intersections with the upper and lower sides (points A, B, C, and D in Figure 7c) are computed; the quadrilateral formed by these intersection points is the edge-corrected text box (quadrilateral ABCD in Figure 7c).
  • FIG. 7d shows a schematic diagram of the effect of performing perspective transformation after performing edge correction on a quadrilateral text box obtained through secondary detection. As shown in Figure 7d, after edge correction, a better text effect can be obtained after the perspective transformation of the quadrilateral text box.
  • The method of performing edge correction on the quadrilateral text box is not limited to the above. For example, the vertical lines can also be drawn through non-midpoints of the left and right sides, as long as in the corrected quadrilateral text box the left and right sides are perpendicular to the upper and lower sides; a sketch of steps S501-S504 follows.
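  A minimal sketch of steps S501-S504, under the assumption that box corners arrive in the order top-left, top-right, bottom-right, bottom-left:

```python
import numpy as np

def correct_edges(box: np.ndarray) -> np.ndarray:
    """Make the left/right edges perpendicular to the top/bottom edges."""
    tl, tr, br, bl = box.astype(float)

    # S502: horizontal slope k from the inclination of the top edge.
    k = (tr[1] - tl[1]) / (tr[0] - tl[0])

    def vertical_through(p):
        """S503/S504: a line through p perpendicular to the top/bottom edges,
        intersected with both; returns (top point, bottom point)."""
        def hit(edge_p, edge_q):
            # Solve edge_p + t*(edge_q - edge_p) = p + s*(-k, 1) for t.
            d = edge_q - edge_p
            n = np.array([-k, 1.0])        # direction perpendicular to slope k
            A = np.column_stack([d, -n])
            t, _ = np.linalg.solve(A, p - edge_p)
            return edge_p + t * d
        return hit(tl, tr), hit(bl, br)

    mid_left = (tl + bl) / 2
    mid_right = (tr + br) / 2
    A, D = vertical_through(mid_left)      # new top-left / bottom-left
    B, C = vertical_through(mid_right)     # new top-right / bottom-right
    return np.stack([A, B, C, D])
```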
  • Figure 7e shows a schematic diagram of a quadrilateral text box that is too tight. As shown in Figure 7e, the first digit 6 and the last digit 0 of the bank card number are not completely enclosed by the text box. Feeding such a text box into the recognition model may leave the model unable to identify the incompletely enclosed digits 6 and 0.
  • FIG. 9 shows a flowchart of performing edge expansion according to an embodiment of the present application. As shown in Figure 9, the process of edge expansion includes:
  • Step S601 determining the quadrilateral text box after edge correction is performed.
  • Step S602 the height h of the quadrilateral text box is calculated.
  • Step S603 Extend each of the upper and lower sides of the quadrilateral text box by h/2 (or other multiples of height) according to the horizontal slope k.
  • Step S604: check the validity of the edge correction and edge expansion.
  • FIG. 7f shows a schematic diagram of performing edge expansion on a quadrilateral text box obtained through secondary detection.
  • As shown in Fig. 7f, the height h and horizontal slope k of the text box can be calculated, and the upper and lower sides of the quadrilateral text box ABCD can each be extended by h/2 along the horizontal slope k to obtain the quadrilateral text box A1B1C1D1 (the extended part is shown by the dotted box in Fig. 7f). The box A1B1C1D1 contains more of the picture than the box ABCD; a sketch of steps S601-S603 follows.
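  A minimal sketch of steps S601-S603 under the same corner-order assumption; ratio = 0.5 corresponds to extending by h/2:

```python
import numpy as np

def expand_edges(box: np.ndarray, ratio: float = 0.5) -> np.ndarray:
    """Extend the top and bottom edges outward on both sides by ratio*height
    along the box's own slope. box is 4x2 (TL, TR, BR, BL)."""
    tl, tr, br, bl = box.astype(float)

    # S602: height h of the text box (left-edge length as an approximation).
    h = np.linalg.norm(bl - tl)

    # Unit direction of the top edge: this encodes the horizontal slope k.
    d = (tr - tl) / np.linalg.norm(tr - tl)
    ext = ratio * h * d

    # S603: push the left corners left and the right corners right along d.
    return np.stack([tl - ext, tr + ext, br + ext, bl - ext])
```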
  • the text box obtained after the secondary detection can be subjected to edge correction first, and then the edge is expanded to obtain a corrected quadrilateral text box, and the validity of the corrected quadrilateral text box can also be checked.
  • After edge correction and edge expansion are performed on the text box, perspective transformation can be applied to it, transforming the quadrilateral corresponding to the text box into a rectangle to obtain a rectangular picture block.
  • the quadrilateral corresponding to the text box may include any quadrilateral such as a parallelogram and a trapezoid.
  • A perspective transformation projects the quadrilateral text box in the picture onto a new viewing plane, resulting in a rectangular picture block, for example as sketched below.
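  A minimal sketch of this rectification with OpenCV; the corner order is again an assumption:

```python
import cv2
import numpy as np

def rectify(img: np.ndarray, box: np.ndarray) -> np.ndarray:
    """Perspective-transform a quadrilateral text box (TL, TR, BR, BL)
    into a rectangular picture block."""
    tl, tr, br, bl = box.astype(np.float32)
    w = int(max(np.linalg.norm(tr - tl), np.linalg.norm(br - bl)))
    h = int(max(np.linalg.norm(bl - tl), np.linalg.norm(br - tr)))
    dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    M = cv2.getPerspectiveTransform(np.float32([tl, tr, br, bl]), dst)
    return cv2.warpPerspective(img, M, (w, h))
```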
  • step S306 the image block is input into the recognition model to obtain the recognized text content.
  • the recognition model may be a recognition model obtained after fine-tuning training using targeted positive and negative samples, and the training process may refer to step S101 in FIG. 2 .
  • Step S307 Determine the attributes of the text content according to the identified text content and the corresponding coordinates.
  • In a key-value match, the 'key' can represent the attribute of the text content and the 'value' can represent the text content itself.
  • For example, if the text content "Zhang San" is recognized and its coordinates are confirmed to lie in a predetermined specific area of the driver's license (for example, the area indicating the name), then according to the preset correspondence between that specific area and an attribute, the attribute of the text content can be determined to be "name"; the attribute can be a preset custom attribute.
  • Step S308 Perform confidence filtering and re-identification according to the attributes of the text content to obtain a re-identification result.
  • Confidence filtering and re-recognition can be performed, according to the attribute of the text content, on the basis of the connectionist temporal classification (CTC) sequence output in the middle of the recognition model, where the CTC sequence represents the intermediate sequence formed when the CTC algorithm is used to solve the character alignment problem.
  • FIG. 10 shows a schematic diagram of confidence filtering based on a CTC sequence according to an embodiment of the present application.
  • As shown in the leftmost part of Figure 10, a picture block is obtained and input into the recognition model.
  • The CTC sequence shown in the figure, containing 7357 categories and corresponding confidence levels, can be filtered and screened by category, where 7357 indicates the total number of output categories and each confidence level indicates the probability that the text content is the corresponding category. According to step S307, the attribute of the text content corresponding to the picture block can be determined to be "bank card number". Without filtering the interference items, the figure shows that a 'D' with confidence 0.9 would be recognized as the final text content, causing a misrecognition of the card text.
  • Therefore, the 7357 categories can be filtered according to the attribute of the text content (bank card number): 7346 non-numeric interference categories are filtered out and the remaining 10 numeric categories of concern are retained, forming a new CTC sequence. The new CTC sequence of the remaining 10 categories is then re-recognized; for example, among these 10 categories, the one with the highest confidence is taken as the re-recognition result. As shown in the figure, '0' with confidence 0.8 can be output as the final recognition result, further improving recognition accuracy.
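  A minimal sketch of attribute-based CTC filtering followed by greedy decoding; the (T, C) probability layout and blank index 0 are assumed conventions, not specified by the patent:

```python
import numpy as np

def filter_ctc(probs: np.ndarray, keep_idx: list[int], charset: list[str]) -> str:
    """probs is (T, C): per-timestep confidences over C categories (blank = 0).
    keep_idx lists the category indices allowed by the attribute, e.g. the
    ten digits when the attribute is 'bank card number'."""
    mask = np.zeros(probs.shape[1], dtype=bool)
    mask[0] = True                      # always keep the CTC blank
    mask[keep_idx] = True

    filtered = probs.copy()
    filtered[:, ~mask] = 0.0            # drop interference categories

    # Greedy CTC decoding on the filtered sequence: best category per step,
    # collapse repeats, remove blanks.
    best = filtered.argmax(axis=1)
    out, prev = [], 0
    for c in best:
        if c != 0 and c != prev:
            out.append(charset[c])
        prev = c
    return "".join(out)
```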
  • step S309 the re-identification result is checked and corrected according to the attributes of the text content and the check rules to obtain the final text content.
  • The verification rule can be, for example, the coding rule of the bank card. When identifying a bank card number, it is confirmed that the attribute of the text content is the card number, and the re-recognized content is checked to be numeric; according to the coding rules, it is confirmed, for example, whether the starting digits of the card number correspond to the card's issuing bank. If the starting digits do not match the issuing bank, for example one digit differs, that digit can be corrected according to the starting digits corresponding to the issuing bank.
  • The Luhn algorithm may also be used as a verification rule for checking the bank card number, as sketched below.
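  A sketch of the Luhn check; this is the standard algorithm, independent of the patent:

```python
def luhn_valid(card_number: str) -> bool:
    """Luhn check: double every second digit from the right, subtract 9 from
    results above 9, and require the total to be a multiple of 10."""
    digits = [int(c) for c in card_number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:                  # every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# e.g. luhn_valid("4539148803436467") -> True for a well-formed number
```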
  • FIG. 11 shows a flowchart of a card text recognition method according to an embodiment of the present application.
  • the method is used for terminal equipment, as shown in Figure 11, the method includes:
  • Step S1101 obtaining the first image to be recognized of the card
  • Step S1102 Detecting the first image to be recognized to obtain at least one first text area, where the first text area represents the area where the text in the first image to be recognized is located;
  • Step S1103 performing rotation correction on the first to-be-recognized image according to the first text area to obtain a second to-be-recognized image;
  • Step S1104 detecting the second to-be-recognized image to obtain at least one second text area, where the second text area represents the area where the text in the second to-be-recognized image is located;
  • Step S1105 Identify the image in the second text area to obtain the first target text corresponding to the second text area.
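  Putting steps S1101-S1105 together, one possible end-to-end flow looks like the following sketch; detector and recognizer stand in for the trained detection and recognition models, and rotate_correct, correct_edges, expand_edges, and rectify are the earlier sketches:

```python
import numpy as np

def recognize_card_text(first_image: np.ndarray, detector, recognizer) -> list[str]:
    """detector returns quadrilateral text boxes; recognizer maps a rectified
    picture block to a string. Both are assumed interfaces."""
    # S1102: first detection pass on the first image to be recognized.
    first_boxes = detector(first_image)

    # S1103: rotation correction according to the first text areas.
    second_image = rotate_correct(first_image, first_boxes)

    # S1104: second detection pass on the rotation-corrected image.
    second_boxes = detector(second_image)

    # S1105: correct, expand, rectify, and recognize each second text area.
    texts = []
    for box in second_boxes:
        box = expand_edges(correct_edges(box))
        texts.append(recognizer(rectify(second_image, box)))
    return texts
```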
  • In this way, by acquiring the first image to be recognized, detecting it to obtain at least one first text area, performing rotation correction on it according to the first text area to obtain the second image to be recognized, detecting the second image to obtain at least one second text area, and recognizing the image in each second text area to obtain the corresponding first target text, the method takes a picture as input and outputs the card's text content. Rotation correction adjusts the angle of the card text to a more favorable state, so that text on tilted card images can be recognized; the second detection pass avoids missed text areas, improving the detection accuracy for text areas of tilted card pictures as well as the accuracy of the recognized text. Performing recognition on the terminal also reduces power consumption, avoids the network disconnections and slow responses caused by invoking cloud-side methods, and improves the user experience.
  • The first image to be recognized may include a card picture uploaded by the user to the terminal device; the card picture may be obtained directly by the user, be a photo stored in the terminal device and uploaded by the user, or be scanned by the user with the terminal device, and so on. This application does not limit the way users obtain card pictures.
  • the first text area and the second text area can refer to the quadrilateral text box described above, which can represent the area where any text in the image to be recognized is located, and the number of the second text areas can be greater than or equal to the number of the first text areas.
  • the first target text may include the text on the card, which may be determined according to the purpose of card identification. For example, to identify the card number of the bank card, the first target text may include the card number of the bank card, such as "6214XXXX73469446".
  • Performing rotation correction on the first image to be recognized according to the first text area may be: performing rotation correction on the first image (for example, rotating it by the average inclination angle) according to the average inclination angle of the longest one or more text areas among the first text areas, to obtain the second image to be recognized.
  • the accuracy of correction can be improved, thereby improving the accuracy of detection.
  • the number of at least one text area can be selected as required, which is not limited in this application.
  • Step S1103 may refer to steps S404 to S407 shown in FIG. 5 .
  • FIG. 12 shows a flowchart of a card text recognition method according to an embodiment of the present application.
  • the second to-be-recognized image is detected to obtain at least one second text area, including:
  • Step S1201 determining the horizontal slope of the second text area
  • Step S1202 correcting the left edge and the right edge of the second text area according to the horizontal slope of the second text area, wherein, after the correction, the left edge and the right edge of the second text area are respectively perpendicular to the upper edge and/or the lower edge of the second text area.
  • In this way, the left and right edges of the second text area are perpendicular to its upper and/or lower edges, which prevents the text from being deformed when perspective transformation is applied to an irregular text area, makes the text in the area easier to recognize, and thereby further improves the accuracy of card text recognition.
  • The horizontal slope may indicate the degree of inclination of the second text area. The left and right edges of the second text area may refer to the left and right sides of the quadrilateral text box described above with respect to Figure 7c, and the upper and lower edges may refer to its upper and lower sides. After correction, the second text area may represent a new text area in the second image to be recognized.
  • Step S1201 may refer to step S502 shown in FIG. 8, and step S1202 may refer to steps S503 to S504 shown in FIG. 8.
  • FIG. 13 shows a flowchart of a card text recognition method according to an embodiment of the present application.
  • the second to-be-recognized image is detected to obtain at least one second text area, including:
  • Step S1301 determining the horizontal slope and height of the second text area
  • Step S1302 according to the horizontal slope of the second text region, the upper edge and the lower edge of the second text region are respectively extended to both sides, and the extended distance is determined according to the height.
  • the upper edge and the lower edge of the second text area are respectively extended to both sides, so that the text in the text area is easier to recognize, thereby further improving the accuracy of card text recognition.
  • The extension expands the range of the second image to be recognized that is covered by the second text area, so the extended second text area may include text that was not originally included. The extension distance may be a preset value, such as 1/2 of the height or another multiple of it, which is not limited in this application.
  • The method for calculating the height in step S1301 may refer to step S602 in FIG. 9, and step S1302 may refer to step S603 in FIG. 9.
  • FIG. 14 shows a flowchart of a card text recognition method according to an embodiment of the present application. As shown in Figure 14, the image in the second text area is identified to obtain the first target text corresponding to the second text area, including:
  • Step S1401 identifying the image in the second text area to obtain a second target text corresponding to the second text area;
  • Step S1402 determining the attribute of the second target text
  • Step S1403 according to the attribute of the second target text, filter the connection sequence classification CTC sequence corresponding to the second text area to obtain the filtered CTC sequence;
  • Step S1404 Obtain the first target text according to the categories in the filtered CTC sequence and the corresponding confidence.
  • In this way, the second target text corresponding to the second text area is obtained, its attribute is determined, the CTC sequence corresponding to the second text area is filtered according to that attribute to obtain a filtered CTC sequence, and the first target text is obtained according to the categories and confidences in the filtered sequence. This prevents the confusion and recognition errors caused by similar-looking characters during recognition, reducing the recognition error rate and further improving recognition accuracy.
  • the first target text may represent the target text after filtering and re-recognizing the CTC sequence on the basis of the second target text.
  • the attribute of the second target text may be user-defined, or obtained through the second target text (eg, obtained according to the correspondence between the position on the card where the second target text is located and the attribute).
  • the confidence level may represent the probability that the category in the corresponding CTC sequence is the first target text.
  • the filtered CTC sequence may only contain categories corresponding to the attributes of the second target text and their confidence levels.
  • For example, when the attribute of the second target text is "bank card number", non-numeric items in the CTC sequence can be filtered out and only numeric items retained, reducing the possibility of misrecognition.
  • Step S1402 can refer to step S307 in FIG. 4, and an example of the CTC sequence is the sequence shown in FIG. 10 containing 7357 categories and corresponding confidence levels.
  • In one implementation, the method further includes: training the detection model and the recognition model according to training samples to obtain the trained detection model and the trained recognition model. The training samples include positive samples and negative samples in one-to-one correspondence: each positive sample includes a card image sample containing a text area, and the corresponding negative sample includes the card image sample obtained by covering that text area. The trained detection model is used to detect the first text area and the second text area, and the trained recognition model is used to recognize the first target text and the second target text.
  • In this way, the trained detection model and the trained recognition model are obtained; the trained detection model detects the first and second text areas, and the trained recognition model recognizes the first and second target texts, which can reduce ROM occupation in the terminal device and prevent the terminal device from stalling.
  • Because the training samples include positive samples and negative samples in one-to-one correspondence, with each positive sample being a card image sample containing a text area and each negative sample being that card image sample with the text area covered, adversarial learning between positive and negative samples can be realized. This enhances the detection model's discrimination between text and non-text areas, improves recognition accuracy against complex backgrounds, and increases the robustness and accuracy of the model.
  • the recognition model and the detection model may include a general OCR model.
  • the application does not limit the type of the model.
  • the training method may include fine-tuning training.
  • the method of training the model please refer to step S101 in FIG. 2 .
  • The method of covering the text area in a card image sample may include filling the text area with pixels taken from the non-text area of the card image.
  • For the manner of generating negative samples, reference may be made to steps S201 to S205 in FIG. 3.
  • The steps of training the detection model and the recognition model according to the training samples to obtain the trained models can be performed on the terminal device or on a server, and the terminal device can download at least one of the trained detection model and the trained recognition model from the server.
  • FIG. 15 shows a structural diagram of a card text recognition device according to an embodiment of the present application. As shown in Figure 15, the device is used for terminal equipment, and the device includes:
  • Obtaining module 1501 used to obtain the first image to be recognized of the card
  • a first detection module 1502 configured to detect the first to-be-recognized image to obtain at least one first text area, where the first text area represents the area where the text in the first to-be-recognized image is located;
  • Correction module 1503 configured to perform rotation correction on the first to-be-recognized image according to the first text area to obtain a second to-be-recognized image
  • the second detection module 1504 is configured to detect the second to-be-recognized image to obtain at least one second text area, where the second text area represents the area where the text in the second to-be-recognized image is located;
  • the recognition module 1505 is configured to recognize the image in the second text area to obtain the first target text corresponding to the second text area.
  • In this way, by acquiring the first image to be recognized, detecting it to obtain at least one first text area, performing rotation correction on it according to the first text area to obtain the second image to be recognized, detecting the second image to obtain at least one second text area, and recognizing the image in each second text area to obtain the corresponding first target text, the device takes a picture as input and outputs the card's text content. Rotation correction adjusts the angle of the card text to a more favorable state, so that text on tilted card images can be recognized; the second detection pass avoids missed text areas, improving the detection accuracy for text areas of tilted card pictures as well as the accuracy of the recognized text. Performing recognition on the terminal also reduces power consumption, avoids the network disconnections and slow responses caused by invoking cloud-side methods, and improves the user experience.
  • The correction module includes: a first correction sub-module, configured to perform rotation correction on the first to-be-recognized image according to the average inclination angle of at least one longest text area among the first text areas, to obtain the second to-be-recognized image.
  • the accuracy of correction can be improved, thereby improving the accuracy of detection.
  • The second detection module includes: a first determination module, configured to determine the horizontal slope of the second text area; and a second correction sub-module, configured to correct the left edge and right edge of the second text area according to the horizontal slope of the second text area, where, after correction, the left edge and right edge of the second text area are respectively perpendicular to the upper edge and/or lower edge of the second text area.
  • Making the left edge and right edge of the second text area respectively perpendicular to the upper edge and/or lower edge prevents the text from being deformed by perspective transformation of an irregular text area, making the text in the text area easier to recognize and further improving the accuracy of card text recognition.
  • The second detection module includes: a second determination module, configured to determine the horizontal slope and height of the second text area; and an extension module, configured to extend the upper edge and lower edge of the second text area to both sides respectively according to the horizontal slope, the extension distance being determined according to the height.
  • Extending the upper edge and lower edge of the second text area to both sides makes the text in the text area easier to recognize, further improving the accuracy of card text recognition.
  • The recognition module includes: a recognition sub-module, configured to recognize the image in the second text area to obtain second target text corresponding to the second text area; a third determination module, configured to determine the attribute of the second target text; a filtering module, configured to filter the connectionist temporal classification (CTC) sequence corresponding to the second text area according to the attribute of the second target text, to obtain a filtered CTC sequence; and a fourth determination module, configured to obtain the first target text according to the classes and corresponding confidences in the filtered CTC sequence.
  • By recognizing the image in the second text area to obtain the second target text, determining the attribute of the second target text, filtering the CTC sequence corresponding to the second text area according to that attribute to obtain a filtered CTC sequence, and obtaining the first target text according to the classes and corresponding confidences in the filtered CTC sequence, confusion and recognition errors caused by similar-looking characters can be prevented, reducing the recognition error rate and further improving recognition accuracy.
  • The apparatus further includes: a training module, configured to train the detection model and the recognition model according to the training samples to obtain the trained detection model and the trained recognition model; the training samples include positive samples and negative samples in one-to-one correspondence, a positive sample includes a card image sample containing a text area, and the corresponding negative sample includes the card image sample obtained after covering the text area. The trained detection model is used to detect the first text area and the second text area, and the trained recognition model is used to recognize the first target text and the second target text.
  • Because the trained detection model is used to detect the first text area and the second text area, and the trained recognition model is used to recognize the first target text and the second target text, the ROM occupation in the terminal device can be reduced and stuttering of the terminal device prevented.
  • Because the training samples include positive samples and negative samples in one-to-one correspondence, where a positive sample includes a card image sample containing a text area and the corresponding negative sample includes the card image sample obtained by covering the text area, adversarial learning between positive and negative samples can be achieved, enhancing the detection model's discrimination between text areas and non-text areas, improving recognition accuracy in complex backgrounds, and increasing the robustness and accuracy of the model.
  • FIG. 16 shows a schematic structural diagram of a terminal device according to an embodiment of the present application. Taking the terminal device as a mobile phone as an example, FIG. 16 shows a schematic structural diagram of the mobile phone 200 .
  • the mobile phone 200 may include a processor 210, an external memory interface 220, an internal memory 221, a USB interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 251, a wireless communication module 252, Audio module 270, speaker 270A, receiver 270B, microphone 270C, headphone jack 270D, sensor module 280, buttons 290, motor 291, indicator 292, camera 293, display screen 294, SIM card interface 295, etc.
  • The sensor module 280 may include a gyroscope sensor 280A, an acceleration sensor 280B, a proximity light sensor 280G, a fingerprint sensor 280H, and a touch sensor 280K (of course, the mobile phone 200 may also include other sensors, such as a temperature sensor, a pressure sensor, a distance sensor, a magnetic sensor, an ambient light sensor, an air pressure sensor, a bone conduction sensor, etc., which are not shown in the figure).
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the mobile phone 200 .
  • The mobile phone 200 may include more or fewer components than shown, or combine some components, or split some components, or have a different component arrangement.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • The processor 210 may include one or more processing units; for example, the processor 210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the mobile phone 200 . The controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 210 for storing instructions and data.
  • the memory in processor 210 is cache memory.
  • the memory may hold instructions or data that have just been used or recycled by the processor 210 . If the processor 210 needs to use the instruction or data again, it can be called directly from the memory. Repeated accesses are avoided, and the waiting time of the processor 210 is reduced, thereby improving the efficiency of the system.
  • The processor 210 may run the card text recognition method provided by the embodiments of the present application, so as to acquire the first to-be-recognized image of the card; detect the first to-be-recognized image to obtain at least one first text area, the first text area representing the area where text in the first to-be-recognized image is located; perform rotation correction on the first to-be-recognized image according to the first text area to obtain a second to-be-recognized image; detect the second to-be-recognized image to obtain at least one second text area, the second text area representing the area where text in the second to-be-recognized image is located; and recognize the image in the second text area to obtain the first target text corresponding to the second text area. This improves the recognition accuracy of the text content, makes detection and recognition respond quickly, reduces power consumption, and avoids the network disconnection and slow response caused by invoking methods on the cloud side.
  • The processor 210 may include different devices. For example, when a CPU and a GPU are integrated, the CPU and the GPU may cooperate to execute the card text recognition method provided by the embodiments of the present application; for example, some algorithms in the card text recognition method are executed by the CPU and others by the GPU, to obtain faster processing efficiency.
  • Display screen 294 is used to display images, videos, and the like.
  • Display screen 294 includes a display panel.
  • The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
  • cell phone 200 may include 1 or N display screens 294, where N is a positive integer greater than 1.
  • the display screen 294 may be used to display information entered by or provided to the user as well as various graphical user interfaces (GUIs).
  • display 294 may display photos, videos, web pages, or documents, and the like.
  • display 294 may display a graphical user interface.
  • the GUI includes a status bar, a hideable navigation bar, a time and weather widget, and an application icon, such as a browser icon.
  • the status bar includes operator name (eg China Mobile), mobile network (eg 4G), time and remaining battery.
  • the navigation bar includes a back button icon, a home button icon, and a forward button icon.
  • the status bar may further include a Bluetooth icon, a Wi-Fi icon, an external device icon, and the like.
  • the graphical user interface may further include a Dock bar, and the Dock bar may include commonly used application icons and the like.
  • the display screen 294 may be an integrated flexible display screen, or a spliced display screen composed of two rigid screens and a flexible screen located between the two rigid screens.
  • The terminal device can establish a connection with another terminal device through the antenna 1, the antenna 2, or the USB interface, and, according to the card text recognition method of the embodiments of the present application, control the display screen 294 to display a corresponding graphical user interface.
  • The camera 293 (a front camera or a rear camera, or a camera that can serve as both a front camera and a rear camera) is used to capture still images or video.
  • The camera 293 may include photosensitive elements such as a lens group and an image sensor, where the lens group includes a plurality of lenses (convex or concave) for collecting the light signal reflected by the object to be photographed and transmitting the collected light signal to the image sensor.
  • the image sensor generates an original image of the object to be photographed according to the light signal.
  • Internal memory 221 may be used to store computer executable program code, which includes instructions.
  • the processor 210 executes various functional applications and data processing of the mobile phone 200 by executing the instructions stored in the internal memory 221 .
  • the internal memory 221 may include a storage program area and a storage data area.
  • the storage program area may store operating system, code of application programs (such as camera application, WeChat application, etc.), and the like.
  • the storage data area may store data created during the use of the mobile phone 200 (such as images and videos collected by the camera application) and the like.
  • the internal memory 221 may also store one or more computer programs 1310 corresponding to the card text recognition method provided in the embodiment of the present application.
  • The one or more computer programs 1310 are stored in the aforementioned internal memory 221 and configured to be executed by the one or more processors 210; the one or more computer programs 1310 include instructions that may be used to perform the steps in the embodiments corresponding to FIG. 2 to FIG. 5, FIG. 8 to FIG. 9, and FIG. 11 to FIG. 14. The computer program 1310 may include an acquisition module 1501, a first detection module 1502, a correction module 1503, a second detection module 1504, and a recognition module 1505.
  • the acquisition module 1501 is used to acquire the first image to be recognized of the card; the first detection module 1502 is used to detect the first image to be recognized to obtain at least one first text area, the first text The region represents the region where the text in the first to-be-recognized image is located; the correction module 1503 is configured to perform rotation correction on the first to-be-recognized image according to the first text region to obtain a second to-be-recognized image; The second detection module 1504 is configured to detect the second to-be-recognized image to obtain at least one second text area, and the second text area represents the area where the text in the second to-be-recognized image is located; the recognition module 1505 , which is used to identify the image in the second text area to obtain the first target text corresponding to the second text area.
  • the processor 210 can control the display screen to display the recognition result.
  • the internal memory 221 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the code of the card text identification method provided by the embodiment of the present application may also be stored in an external memory.
  • The processor 210 may execute the code of the card text recognition method stored in the external memory through the external memory interface 220.
  • the function of the sensor module 280 is described below.
  • the gyro sensor 280A can be used to determine the movement posture of the mobile phone 200 .
  • In some embodiments, the angular velocities of the mobile phone 200 about three axes (i.e., the x, y, and z axes) may be determined by the gyroscope sensor 280A.
  • the gyro sensor 280A can be used to detect the current motion state of the mobile phone 200, such as shaking or still.
  • the gyro sensor 280A can be used to detect a folding or unfolding operation acting on the display screen 294 .
  • the gyroscope sensor 280A may report the detected folding operation or unfolding operation to the processor 210 as an event to determine the folding state or unfolding state of the display screen 294 .
  • The acceleration sensor 280B can detect the magnitude of the acceleration of the mobile phone 200 in various directions (generally along three axes), and can thus also be used to detect the current motion state of the mobile phone 200, such as shaking or stillness. When the display screen in the embodiments of the present application is a foldable screen, the acceleration sensor 280B can be used to detect a folding or unfolding operation acting on the display screen 294, and may report the detected folding or unfolding operation to the processor 210 as an event to determine the folded or unfolded state of the display screen 294.
  • Proximity light sensor 280G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the light emitting diodes may be infrared light emitting diodes.
  • the mobile phone emits infrared light outward through light-emitting diodes.
  • The phone uses the photodiode to detect reflected infrared light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the phone; when insufficient reflected light is detected, the phone can determine that there is no object near it.
  • the proximity light sensor 280G can be arranged on the first screen of the foldable display screen 294, and the proximity light sensor 280G can detect the first screen according to the optical path difference of the infrared signal.
  • the gyroscope sensor 280A (or the acceleration sensor 280B) may send the detected motion state information (such as angular velocity) to the processor 210 .
  • the processor 210 determines, based on the motion state information, whether the current state is the hand-held state or the tripod state (for example, when the angular velocity is not 0, it means that the mobile phone 200 is in the hand-held state).
  • the fingerprint sensor 280H is used to collect fingerprints.
  • the mobile phone 200 can use the collected fingerprint characteristics to realize fingerprint unlocking, accessing application locks, taking photos with fingerprints, answering incoming calls with fingerprints, and the like.
  • The touch sensor 280K is also called a "touch panel".
  • the touch sensor 280K may be disposed on the display screen 294, and the touch sensor 280K and the display screen 294 form a touch screen, also called a "touch screen”.
  • the touch sensor 280K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations may be provided through display screen 294 .
  • the touch sensor 280K may also be disposed on the surface of the mobile phone 200 , which is different from the location where the display screen 294 is located.
  • the display screen 294 of the mobile phone 200 displays a main interface, and the main interface includes icons of multiple applications (such as a camera application, a WeChat application, etc.).
  • Display screen 294 displays an interface of a camera application, such as a viewfinder interface.
  • the wireless communication function of the mobile phone 200 can be realized by the antenna 1, the antenna 2, the mobile communication module 251, the wireless communication module 252, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in handset 200 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 251 can provide a wireless communication solution including 2G/3G/4G/5G, etc. applied on the mobile phone 200 .
  • the mobile communication module 251 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • the mobile communication module 251 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 251 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 251 may be provided in the processor 210 .
  • At least part of the functional modules of the mobile communication module 251 may be provided in the same device as at least part of the modules of the processor 210 .
  • the mobile communication module 251 may also be used for information interaction with other terminal devices.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low frequency baseband signal is processed by the baseband processor and passed to the application processor.
  • the application processor outputs sound signals through audio devices (not limited to the speaker 270A, the receiver 270B, etc.), or displays images or videos through the display screen 294 .
  • the modem processor may be a stand-alone device.
  • the modulation and demodulation processor may be independent of the processor 210, and may be provided in the same device as the mobile communication module 251 or other functional modules.
  • the wireless communication module 252 can provide applications on the mobile phone 200 including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication technology (near field communication, NFC), infrared technology (infrared, IR) and other wireless communication solutions.
  • the wireless communication module 252 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 252 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 210 .
  • the wireless communication module 252 can also receive the signal to be sent from the processor 210 , perform frequency modulation on the signal, amplify the signal, and then convert it into an electromagnetic wave for radiation through the antenna 2 .
  • the wireless communication module 252 is configured to transmit data with other terminal devices under the control of the processor 210 .
  • The mobile phone 200 can implement audio functions, such as music playback and recording, through the audio module 270, the speaker 270A, the receiver 270B, the microphone 270C, the earphone interface 270D, and the application processor.
  • the cell phone 200 can receive key 290 input and generate key signal input related to user settings and function control of the cell phone 200 .
  • the mobile phone 200 can use the motor 291 to generate vibration alerts (eg, vibration alerts for incoming calls).
  • the indicator 292 in the mobile phone 200 may be an indicator light, which may be used to indicate a charging state, a change in power, and may also be used to indicate a message, a missed call, a notification, and the like.
  • the SIM card interface 295 in the mobile phone 200 is used to connect the SIM card. The SIM card can be contacted and separated from the mobile phone 200 by inserting into the SIM card interface 295 or pulling out from the SIM card interface 295 .
  • the mobile phone 200 may include more or less components than those shown in FIG. 16 , which are not limited in this embodiment of the present application.
  • the illustrated handset 200 is merely an example, and the handset 200 may have more or fewer components than those shown, two or more components may be combined, or may have different component configurations.
  • the various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
  • the software system of the terminal device can adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiments of the present application take an Android system with a layered architecture as an example to exemplarily describe the software structure of a terminal device.
  • FIG. 17 is a block diagram of a software structure of a terminal device according to an embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, an Android runtime (Android runtime) and a system library, and a kernel layer.
  • the application layer can include a series of application packages.
  • the application package may include applications such as phone, camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer may include window managers, content providers, view systems, telephony managers, resource managers, notification managers, and the like.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.
  • Content providers are used to store and retrieve data and make these data accessible to applications.
  • the data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
  • a display interface can consist of one or more views.
  • the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
  • the telephony manager is used to provide the communication function of the terminal device. For example, the management of call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
  • the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also display notifications in the status bar at the top of the system in the form of graphs or scroll bar text, such as notifications of applications running in the background, and notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is issued, the terminal device vibrates, and the indicator light flashes.
  • Android Runtime includes core libraries and a virtual machine. Android runtime is responsible for scheduling and management of the Android system.
  • The core library consists of two parts: one part is the function methods that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, safety and exception management, and garbage collection.
  • a system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
  • the Surface Manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
  • 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers.
  • An embodiment of the present application provides a card text recognition device, comprising: a processor and a memory for storing instructions executable by the processor; wherein the processor is configured to implement the above method when executing the instructions.
  • Embodiments of the present application provide a non-volatile computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, implement the above method.
  • Embodiments of the present application provide a computer program product, including computer-readable codes, or a non-volatile computer-readable storage medium carrying computer-readable codes, when the computer-readable codes are stored in a processor of an electronic device When running in the electronic device, the processor in the electronic device executes the above method.
  • a computer-readable storage medium may be a tangible device that can hold and store instructions for use by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital video discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in grooves on which instructions are stored, and any suitable combination of the foregoing.
  • Computer readable program instructions or code described herein may be downloaded to various computing/processing devices from a computer readable storage medium, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from a network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .
  • The computer program instructions used to perform the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • In a possible implementation, electronic circuits, such as programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), are personalized by utilizing state information of the computer-readable program instructions, and the electronic circuits can execute the computer-readable program instructions to implement various aspects of the present application.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions cause a computer, programmable data processing apparatus, and/or other equipment to operate in a specific manner, so that the computer-readable medium on which the instructions are stored includes an article of manufacture comprising instructions that implement various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • The computer-readable program instructions can also be loaded onto a computer, other programmable data processing apparatus, or other equipment, causing a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, so that the instructions executing on the computer, other programmable data processing apparatus, or other equipment implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • Each block in the flowcharts or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in hardware (for example, circuits or application-specific integrated circuits (ASICs)) that performs the corresponding functions or actions, or can be implemented by a combination of hardware and software, such as firmware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)

Abstract

The present application relates to the field of optical character recognition within the technical field of artificial intelligence, and in particular to a card text recognition method, apparatus, and storage medium. The method includes: acquiring a first to-be-recognized image of a card; detecting the first to-be-recognized image to obtain at least one first text area, the first text area representing the area where text in the first to-be-recognized image is located; performing rotation correction on the first to-be-recognized image according to the first text area to obtain a second to-be-recognized image; detecting the second to-be-recognized image to obtain at least one second text area, the second text area representing the area where text in the second to-be-recognized image is located; and recognizing the image in the second text area to obtain first target text corresponding to the second text area. According to the embodiments of the present application, the recognition accuracy of card text can be improved and the user experience enhanced.

Description

Card text recognition method, apparatus, and storage medium
This application claims priority to Chinese Patent Application No. 202110213987.5, filed with the China National Intellectual Property Administration on February 25, 2021 and entitled "Card Text Recognition Method, Apparatus and Storage Medium", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a card text recognition method, apparatus, and storage medium.
Background
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Optical character recognition (OCR) is an important direction in the field of artificial intelligence. Based on deep learning technology, OCR provides a variety of services, for example intelligently recognizing the text content in an image as structured text, and has broad application scenarios; for example, in current card OCR application scenarios, users urgently need OCR solutions with faster response, higher accuracy, and stronger generality.
Summary
In view of this, a card text recognition method, apparatus, and storage medium are proposed.
In a first aspect, an embodiment of the present application provides a card text recognition method. The method is used in a terminal device and includes: acquiring a first to-be-recognized image of a card; detecting the first to-be-recognized image to obtain at least one first text area, the first text area representing the area where text in the first to-be-recognized image is located; performing rotation correction on the first to-be-recognized image according to the first text area to obtain a second to-be-recognized image; detecting the second to-be-recognized image to obtain at least one second text area, the second text area representing the area where text in the second to-be-recognized image is located; and recognizing the image in the second text area to obtain first target text corresponding to the second text area.
According to this embodiment of the present application, by acquiring the first to-be-recognized image of the card, detecting it to obtain at least one first text area, performing rotation correction on the first to-be-recognized image according to the first text area to obtain the second to-be-recognized image, detecting the second to-be-recognized image to obtain at least one second text area, and recognizing the image in the second text area to obtain the first target text corresponding to the second text area, the input can be an image and the output the card text content. Rotation correction adjusts the angle of the card text to a better state, enabling recognition of the text content of tilted card images; secondary detection avoids missed detection of text areas, improving the detection accuracy for text areas of tilted card images as well as the recognition accuracy of the text content. The method is used in a terminal device, so detection and recognition respond quickly, power consumption can be reduced, the network disconnection and slow response caused by invoking methods on the cloud side are avoided, and the user experience is improved.
According to the first aspect, in a first possible implementation of the card text recognition method, performing rotation correction on the first to-be-recognized image according to the first text area to obtain the second to-be-recognized image includes: performing rotation correction on the first to-be-recognized image according to the average inclination angle of at least one longest text area among the first text areas, to obtain the second to-be-recognized image.
According to this embodiment of the present application, performing rotation correction on the first to-be-recognized image using the average inclination angle of the several longest first text areas can improve the accuracy of the correction and thereby the accuracy of detection.
According to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the card text recognition method, detecting the second to-be-recognized image to obtain at least one second text area includes: determining the horizontal slope of the second text area; and correcting the left edge and right edge of the second text area according to the horizontal slope of the second text area, where, after correction, the left edge and right edge of the second text area are respectively perpendicular to the upper edge and/or lower edge of the second text area.
According to this embodiment of the present application, by determining the horizontal slope of the second text area and correcting its left and right edges according to that slope so that, after correction, they are respectively perpendicular to the upper edge and/or lower edge, deformation of the text caused by perspective transformation of an irregular text area can be prevented, making the text in the text area easier to recognize and further improving the accuracy of card text recognition.
According to the first aspect or the first or second possible implementation of the first aspect, in a third possible implementation of the card text recognition method, detecting the second to-be-recognized image to obtain at least one second text area includes: determining the horizontal slope and height of the second text area; and extending the upper edge and lower edge of the second text area to both sides respectively according to the horizontal slope of the second text area, the extension distance being determined according to the height.
According to this embodiment of the present application, by determining the horizontal slope and height of the second text area and extending its upper and lower edges to both sides according to the horizontal slope, the problem of characters being cut off or missed because the text area fits the text too tightly can be prevented, making the text in the text area easier to recognize and further improving the accuracy of card text recognition.
According to the first aspect or the first, second, or third possible implementation of the first aspect, in a fourth possible implementation of the card text recognition method, recognizing the image in the second text area to obtain first target text corresponding to the second text area includes: recognizing the image in the second text area to obtain second target text corresponding to the second text area; determining the attribute of the second target text; filtering the connectionist temporal classification (CTC) sequence corresponding to the second text area according to the attribute of the second target text to obtain a filtered CTC sequence; and obtaining the first target text according to the classes and corresponding confidences in the filtered CTC sequence.
According to this embodiment of the present application, by recognizing the image in the second text area to obtain the second target text, determining the attribute of the second target text, filtering the CTC sequence corresponding to the second text area according to that attribute to obtain a filtered CTC sequence, and obtaining the first target text according to the classes and corresponding confidences in the filtered CTC sequence, confusion and recognition errors caused by similar characters can be prevented during recognition, reducing the recognition error rate and further improving recognition accuracy.
According to the first aspect or the first, second, third, or fourth possible implementation of the first aspect, in a fifth possible implementation of the card text recognition method, the method further includes: training the detection model and the recognition model according to training samples to obtain a trained detection model and a trained recognition model; where the training samples include positive samples and negative samples in one-to-one correspondence, a positive sample includes a card image sample containing a text area, and the corresponding negative sample includes the card image sample obtained after covering the text area; the trained detection model is used to detect the first text area and the second text area, and the trained recognition model is used to recognize the first target text and the second target text.
According to this embodiment of the present application, training the detection model and the recognition model according to the training samples to obtain the trained models, with the trained detection model used to detect the first and second text areas and the trained recognition model used to recognize the first and second target texts, can reduce ROM occupation in the terminal device and prevent stuttering. Because the training samples include one-to-one corresponding positive and negative samples, where a positive sample is a card image sample containing a text area and the negative sample is the card image sample with the text area covered, adversarial learning between positive and negative samples can be achieved, enhancing the detection model's discrimination between text areas and non-text areas, improving recognition accuracy in complex backgrounds, and increasing the robustness and accuracy of the model.
In a second aspect, an embodiment of the present application provides a card text recognition apparatus. The apparatus is used in a terminal device and includes: an obtaining module, configured to acquire a first to-be-recognized image of a card; a first detection module, configured to detect the first to-be-recognized image to obtain at least one first text area, the first text area representing the area where text in the first to-be-recognized image is located; a correction module, configured to perform rotation correction on the first to-be-recognized image according to the first text area to obtain a second to-be-recognized image; a second detection module, configured to detect the second to-be-recognized image to obtain at least one second text area, the second text area representing the area where text in the second to-be-recognized image is located; and a recognition module, configured to recognize the image in the second text area to obtain first target text corresponding to the second text area.
According to the second aspect, in a first possible implementation of the card text recognition apparatus, the correction module includes: a first correction sub-module, configured to perform rotation correction on the first to-be-recognized image according to the average inclination angle of at least one longest text area among the first text areas, to obtain the second to-be-recognized image.
According to the second aspect or the first possible implementation of the second aspect, in a second possible implementation of the card text recognition apparatus, the second detection module includes: a first determination module, configured to determine the horizontal slope of the second text area; and a second correction sub-module, configured to correct the left edge and right edge of the second text area according to the horizontal slope of the second text area, where, after correction, the left edge and right edge of the second text area are respectively perpendicular to the upper edge and/or lower edge of the second text area.
According to the second aspect or the first or second possible implementation of the second aspect, in a third possible implementation of the card text recognition apparatus, the second detection module includes: a second determination module, configured to determine the horizontal slope and height of the second text area; and an extension module, configured to extend the upper edge and lower edge of the second text area to both sides respectively according to the horizontal slope of the second text area, the extension distance being determined according to the height.
According to the second aspect or the first, second, or third possible implementation of the second aspect, in a fourth possible implementation of the card text recognition apparatus, the recognition module includes: a recognition sub-module, configured to recognize the image in the second text area to obtain second target text corresponding to the second text area; a third determination module, configured to determine the attribute of the second target text; a filtering module, configured to filter the connectionist temporal classification (CTC) sequence corresponding to the second text area according to the attribute of the second target text, to obtain a filtered CTC sequence; and a fourth determination module, configured to obtain the first target text according to the classes and corresponding confidences in the filtered CTC sequence.
According to the second aspect or the first, second, third, or fourth possible implementation of the second aspect, in a fifth possible implementation of the card text recognition apparatus, the apparatus further includes: a training module, configured to train the detection model and the recognition model according to training samples to obtain a trained detection model and a trained recognition model; where the training samples include positive samples and negative samples in one-to-one correspondence, a positive sample includes a card image sample containing a text area, and the corresponding negative sample includes the card image sample obtained after covering the text area; the trained detection model is used to detect the first text area and the second text area, and the trained recognition model is used to recognize the first target text and the second target text.
In a third aspect, an embodiment of the present application provides a card text recognition apparatus, including: a processor; and a memory for storing processor-executable instructions; where the processor is configured to, when executing the instructions, implement the card text recognition method of the first aspect or one or more of its possible implementations.
In a fourth aspect, an embodiment of the present application provides a non-volatile computer-readable storage medium having computer program instructions stored thereon, where the computer program instructions, when executed by a processor, implement the card text recognition method of the first aspect or one or more of its possible implementations.
In a fifth aspect, an embodiment of the present application provides a terminal device that can execute the card text recognition method of the first aspect or one or more of its possible implementations.
In a sixth aspect, an embodiment of the present application provides a computer program product, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code, where, when the computer-readable code runs in an electronic device, a processor in the electronic device executes the card text recognition method of the first aspect or one or more of its possible implementations.
These and other aspects of the present application will be more concise and comprehensible in the following description of the embodiments.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the present application together with the specification, and serve to explain the principles of the present application.
FIG. 1 shows a schematic diagram of an application scenario according to an embodiment of the present application.
FIG. 2 shows a flowchart of a card text recognition method according to an embodiment of the present application.
FIG. 3 shows a flowchart of generating a negative sample according to an embodiment of the present application.
FIG. 4 shows a flowchart of a card text recognition method according to an embodiment of the present application.
FIG. 5 shows a flowchart of performing rotation correction on an image according to an embodiment of the present application.
FIG. 6 shows a schematic diagram of the effect of performing rotation correction on an image according to an embodiment of the present application.
FIG. 7a shows a schematic diagram of a quadrilateral text box obtained by secondary detection.
FIG. 7b shows a schematic diagram of the effect of directly performing perspective transformation on a quadrilateral text box obtained by secondary detection.
FIG. 7c shows a schematic diagram of performing edge correction on a quadrilateral text box obtained by secondary detection.
FIG. 7d shows a schematic diagram of the effect of performing perspective transformation after edge correction on a quadrilateral text box obtained by secondary detection.
FIG. 7e shows a schematic diagram of an overly tight quadrilateral text box.
FIG. 7f shows a schematic diagram of performing edge extension on a quadrilateral text box obtained by secondary detection.
FIG. 8 shows a flowchart of performing edge correction according to an embodiment of the present application.
FIG. 9 shows a flowchart of performing edge extension according to an embodiment of the present application.
FIG. 10 shows a schematic diagram of performing confidence filtering based on a CTC sequence according to an embodiment of the present application.
FIG. 11 shows a flowchart of a card text recognition method according to an embodiment of the present application.
FIG. 12 shows a flowchart of a card text recognition method according to an embodiment of the present application.
FIG. 13 shows a flowchart of a card text recognition method according to an embodiment of the present application.
FIG. 14 shows a flowchart of a card text recognition method according to an embodiment of the present application.
FIG. 15 shows a structural diagram of a card text recognition apparatus according to an embodiment of the present application.
FIG. 16 shows a schematic structural diagram of a terminal device according to an embodiment of the present application.
FIG. 17 shows a block diagram of the software structure of a terminal device according to an embodiment of the present application.
Detailed Description
Various exemplary embodiments, features, and aspects of the present application will be described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise noted.
The word "exemplary" used herein means "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as superior to or better than other embodiments.
In addition, numerous specific details are given in the following detailed description to better illustrate the present application. Those skilled in the art should understand that the present application can also be practiced without certain specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, in order to highlight the gist of the present application.
At present, in one card text recognition scenario, when a user performs card text recognition on a terminal device, the terminal device needs to send the card image to a cloud-side server for recognition, and the result is returned to the user after recognition. A card includes any type of certificate with a certain shape and format, such as an identity card, a bank card, an employee card, a business card, or a business license; a card image may include an image stored by the user on the terminal device, an image taken by the user on the spot, an image scanned by the user holding the card with the terminal device, and so on. When a cloud-side server is used for recognition, card text recognition directly on the device side, that is, on the terminal device, is not supported; moreover, since the terminal device and the server need to transmit data over a network, recognition is impossible without a network connection, and even with a network there is latency and the response is slow. Meanwhile, in this case, the recognition accuracy for tilted card text is not high.
In another case, when card text recognition is performed on the terminal device, the accuracy of text recognition for a tilted card is very low; for card text recognition in complex scenarios such as embossed (letterpress) stamped fonts on cards, low contrast between card text and background, lighting interference, and blurred card images, the recognition accuracy is very low, and some card text images cannot be recognized after preview. At the same time, if a dedicated detection and recognition model needs to be trained on the terminal device, it will occupy a large amount of the terminal device's read-only memory (ROM) storage and will also cause the terminal device to stutter.
To solve the above technical problems, the present application provides a card text recognition method. The card text recognition method of the embodiments of the present application can detect the text areas on a card image and recognize the text in the text areas, and the method can be applied to a terminal device, thereby improving text recognition accuracy.
FIG. 1 shows a schematic diagram of an application scenario according to an embodiment of the present application. In a possible implementation, the text recognition method provided by the embodiments of the present application can be applied to scenarios of recognizing text on a card on a terminal device, for example recognizing a bank card number or driver's license information. In the bank card number recognition scenario, after the user uploads or scans a non-horizontal bank card photo as shown in FIG. 1(a) with the terminal device, the recognized bank card number "6214XXXX73469446" is obtained. In the driver's license information recognition scenario, the user can hold a driver's license as shown in FIG. 1(b) and scan it or photograph and upload it with the terminal device, obtaining the recognized driver's license information as follows: "Name: Luo X Yan; Gender: Male; License No.: 3408111992XXXX6319; Vehicle class: C1; Date of issue: 2011-02-14; Valid period: 2017-02-14 to 2027-02-14". After the relevant information is recognized, the recognized key information can also be processed; for example, the recognized license number, name, and other key information can be matched one-to-one with preset related fields to form structured text information, greatly improving information processing efficiency.
A terminal device may refer to a device with a wireless connection function; the wireless connection function means being able to connect with other terminal devices through wireless connection methods such as Wi-Fi and Bluetooth. The terminal device of the present application may also have the function of communicating through a wired connection. The terminal device of the present application may have a touch screen, no touch screen, or no screen at all. With a touch screen, the terminal device can be controlled by clicking and sliding on the display screen with a finger, a stylus, and the like; a non-touch-screen device can be connected to input devices such as a mouse, a keyboard, or a touch panel and be controlled through them; a device without a screen may be, for example, a Bluetooth speaker without a screen. For example, the terminal device of the present application may be a smartphone, a netbook, a tablet computer, a notebook computer, a wearable electronic device (such as a smart band or a smart watch), a TV, a virtual reality device, a speaker, electronic ink, and so on. The present application does not limit the type of the terminal device, nor the types of cards the terminal device can recognize; the embodiments of the present application can be applied to scenarios of recognizing the information contained in any card in any scene (including complex scenes such as natural scenes and printed scenes), and can also be applied to other scenarios.
FIG. 2 shows a flowchart of a card text recognition method according to an embodiment of the present application. As shown in FIG. 2, the flow of the card text recognition method according to an embodiment of the present application includes:
Step S101: training phase.
A training set may be used to train a detection model and a recognition model to obtain the trained detection model and recognition model. The training set may include card image samples and their corresponding annotations. The detection model and the recognition model may be general-purpose OCR models; the present application does not limit the types of the detection model and the recognition model.
For example, fine-tuning training may be performed on the basis of the structures and parameters of the original detection and recognition models. Since no new detection or recognition model is added and fine-tuning is performed only on the basis of general-purpose models, zero ROM increase can be achieved and stuttering of the terminal device reduced. The training set used for training may include positive samples and negative samples; a positive sample may represent a card image containing text content, and a negative sample may represent a card image containing no text content.
To increase the robustness of the model, the positive samples may also be transformed, and the transformed positive samples included in the training set, so that the original model's adaptability to new scenarios is enhanced. The ways of transforming a positive sample may include: random translation (simulating the user shooting off-center with the lens horizontally offset), random scaling (simulating varying shooting distances), random rotation (simulating in-plane tilt of the shooting angle), perspective transformation (simulating forward/backward tilt of the shooting angle), blurring (simulating inaccurate focus and lens shake), and random aspect ratio (simulating images of different sizes and aspect ratios taken by different mobile phones).
To enable the model to perform adversarial learning on positive and negative samples, so that in scenarios where the text area is deeply blended with the image background and hard to distinguish (for example, when the font is an embossed letterpress font, or the text is printed on a landscape picture) wrong, missed, extra, or insufficient boxes are avoided and the model achieves better detection and recognition results, one-to-one corresponding negative samples may be produced for the positive samples in the training set, and a positive sample and its corresponding negative sample are input in association during the training phase. FIG. 3 shows a flowchart of generating a negative sample according to an embodiment of the present application. As shown in FIG. 3, the flow of generating a negative sample includes steps S201 to S205:
Step S201: read in the label and the positive sample.
The label may be the annotation label of an annotated positive sample in the training set; for example, it may represent the coordinates of the text areas and non-text areas in the positive sample.
Step S202: generate a mask image of the text areas and non-text areas.
For example, a corresponding mask image may be generated from the text areas and non-text areas determined by the coordinates marked in the label. The mask image may be a black-and-white image; for example, in the negative sample the text areas are the white parts and the non-text areas are the black parts.
Step S203: select non-text areas in the neighborhood of the text areas.
For example, the non-text areas may be the black parts of the mask image, representing content to be preserved.
Step S204: cover the text areas.
For example, according to the mask image, pixels of the non-text areas may be cut out and filled into the text areas, thereby covering the text areas while displaying the non-text areas of the positive sample normally, forming the processed negative sample image.
Step S205: save the negative sample.
After the negative sample is generated, it can be associated with the corresponding positive sample and saved into the training set.
In this way, the detection model's ability to distinguish between text areas and non-text areas can be enhanced.
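As a concrete illustration of steps S201 to S205, the following is a minimal sketch in Python with OpenCV and NumPy. The function name, the label format (one quadrilateral of corner points per text area), and the choice of the strip directly above each box as the non-text neighborhood are assumptions made for illustration, not details given in this application.

```python
import cv2
import numpy as np

def make_negative_sample(image, text_quads):
    """Cover every labeled text area with pixels sampled from a nearby
    non-text strip, producing a negative sample (steps S201-S205)."""
    h, w = image.shape[:2]
    # S202: mask image - text areas white, non-text areas black.
    mask = np.zeros((h, w), dtype=np.uint8)
    for quad in text_quads:
        cv2.fillPoly(mask, [np.asarray(quad, dtype=np.int32)], 255)

    negative = image.copy()
    for quad in text_quads:
        x, y, bw, bh = cv2.boundingRect(np.asarray(quad, dtype=np.int32))
        # S203: take a same-sized strip directly above the box
        # (clamped to the image) as the non-text neighborhood.
        ny = max(0, y - bh)
        patch = image[ny:ny + bh, x:x + bw]
        # S204: fill the text pixels with the neighborhood pixels.
        region = mask[y:y + bh, x:x + bw] > 0
        negative[y:y + bh, x:x + bw][region] = patch[region]
    return negative  # S205: caller pairs it with the positive sample and saves
```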
Referring again to FIG. 2, after the training phase of step S101 is completed, in step S102, the preprocessing phase is entered.
The originally input image may be preprocessed to obtain a processed image adjusted to be the input of the detection model. For example, the image may be normalized to fit the original input size of the detection model.
The image input in the preprocessing phase may be a card image uploaded by the user. Upload methods may include a card image taken directly by the user, a photo stored in the terminal device and uploaded by the user, or a card image scanned by the user through the terminal device, and so on; the upload method may also be any other method, and the present application does not limit the way the user uploads the card image.
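A minimal sketch of such preprocessing is given below, assuming a fixed square detector input; the 640x640 size and the letterbox strategy are placeholders chosen for illustration, not values specified in this application.

```python
import cv2
import numpy as np

def preprocess(image, target=(640, 640)):
    """Letterbox-resize the uploaded card image to the detector's fixed
    input size and scale pixel values to [0, 1]."""
    h, w = image.shape[:2]
    scale = min(target[0] / h, target[1] / w)
    resized = cv2.resize(image, (int(w * scale), int(h * scale)))
    canvas = np.zeros((target[0], target[1], 3), dtype=np.uint8)
    canvas[:resized.shape[0], :resized.shape[1]] = resized
    return canvas.astype(np.float32) / 255.0, scale
```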
Step S103: text detection phase.
The trained detection model may be used to perform text detection twice on the preprocessed image. The first text detection yields detected quadrilateral text boxes; according to these boxes, rotation correction can be performed on the image so that the text lines in the image tend toward horizontal. For the second text detection, the rotation-corrected image is input, and new quadrilateral text boxes are detected.
For an input image, multiple text boxes may be detected. A text box may correspond to a detected text area, and a text area may contain relevant card information that needs to be recognized, such as a bank card number.
Step S104: text recognition phase.
Edge extension and correction as well as perspective transformation may be performed on the quadrilateral text boxes to obtain rectangular image blocks, and the trained recognition model is used to recognize the image blocks and output the corresponding text content.
The attribute of the obtained text content may be further determined, the text content may be recognized more precisely according to that attribute, and the recognized text content may be checked and corrected.
On the basis of FIG. 2, FIG. 4 shows a flowchart of a card text recognition method according to an embodiment of the present application. As shown in FIG. 4, the flow of card text recognition specifically includes:
Step S301: input a card image.
The input card image may be a card image that has been processed in the preprocessing phase and conforms to the input size of the detection model.
Step S302: detect the input image with the detection model to obtain detected candidate text areas.
For a given input image, the multiple candidate text areas it contains can be detected. A candidate text area may include its coordinates and a corresponding confidence; candidate text areas with similar coordinates may correspond to the same text, and the confidence represents the probability that the corresponding candidate text area is the best-fitting text area pointing to a given text.
For example, for the text representing the card number in a bank card image, there may be multiple corresponding candidate text areas with corresponding confidences; candidate text areas that fail to fully contain the card number text have relatively low confidences, while candidate text areas that fully contain the card number text have relatively high confidences.
The detection model may be one obtained after fine-tuning training with targeted positive and negative samples; for the training process, refer to step S101 in FIG. 2.
From the above, for a given text in an image (for example, the card number on a bank card), multiple corresponding candidate text areas may be detected, so there may be a large amount of redundancy. The redundant candidate text areas can be fused and filtered to determine the quadrilateral text box corresponding to each text in the image.
For example, the non-maximum suppression (NMS) algorithm may be used to fuse and filter the candidate text areas; for multiple candidate text areas corresponding to the same text, one best-fitting candidate text area can finally be determined to form the quadrilateral text box.
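For reference, a standard NMS sketch over axis-aligned boxes is shown below; quadrilateral candidate areas can first be reduced to their bounding rectangles, and the IoU threshold of 0.5 is an assumed value.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-confidence candidate per text and suppress
    overlapping duplicates; boxes are (x1, y1, x2, y2) rows."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the kept box with all remaining candidates.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        # Drop candidates that overlap the kept box too much.
        order = order[1:][iou <= iou_thresh]
    return keep
```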
Step S303: perform rotation correction on the image according to the first m longer quadrilateral text boxes, so that the text lines in the image tend toward horizontal.
Through the preceding detection, quadrilateral text boxes respectively corresponding to multiple text areas may have been determined in the image. To solve the problem of reduced detection accuracy caused by image tilt, after the image rotation of step S303 is completed, in step S304 the rotation-corrected image is input into the detection model for secondary detection to obtain new quadrilateral text boxes. FIG. 5 shows a flowchart of performing rotation correction on an image according to an embodiment of the present application, which can serve as an example of steps S301 to S304. Step S401 may refer to step S301 in FIG. 4, and steps S402 and S403 may refer to step S302 in FIG. 4; post-processing may include fusing and filtering the redundant candidate text areas. As shown in FIG. 5, the flow of performing rotation correction on the image further includes:
Step S404: obtain the first m longer text boxes among the multiple quadrilateral text boxes.
Step S405: calculate the average inclination angle α of these m text boxes.
Step S406: rotate the image by the angle α to obtain the rotation-corrected image.
In this way, the text lines in the image can tend toward horizontal.
For example, the five longest text boxes in a driver's license image may be selected, and the inclination angles of these five text boxes calculated as α1, α2, α3, α4, and α5 respectively; the average inclination angle is then α = (α1 + α2 + α3 + α4 + α5) / 5. In a possible implementation, the warpAffine function in OpenCV may be used to rotate the image and fill the boundary background of the rotated image (for example, by replicating edge pixels). The inclination angle may be measured relative to the horizontal direction or relative to the vertical direction.
Step S407: adjust the size of the rotation-corrected image to obtain an image fitting the input size of the detection model.
In one example, FIG. 6 shows a schematic diagram of the effect of performing rotation correction on an image according to an embodiment of the present application. As shown in FIG. 6, FIG. 6(a) may represent the bank card image input at the first detection; the text boxes 1, 2, 3, and 4 with white borders determined by the white dots on the bank card image in FIG. 6(b) may respectively represent the quadrilateral text boxes determined in step S303; and FIG. 6(c) may represent the bank card image obtained after rotation correction. Through the first detection of the bank card image, the quadrilateral text boxes 1, 2, 3, and 4 shown in FIG. 6(b) can be obtained; the three longest of them (for example, text boxes 1, 2, and 3) may be selected, and the warpAffine function used to rotate the image inside the white frame shown in FIG. 6(b). The area inside the white frame containing the bank card image in FIG. 6(c) may represent the rotated area of the image inside the white frame shown in FIG. 6(b); the color outside the rotated area in FIG. 6(c) may be produced by the warpAffine function through replicated-edge filling, finally forming the rotation-corrected bank card image. The size of the rotation-corrected bank card image shown in FIG. 6(c) may also be adjusted to fit the input size required by the detection model.
As shown in FIG. 6(b), when an image without rotation correction is detected, some text areas may be missed, whereas the text lines in the image shown in FIG. 6(c) already tend toward horizontal; detecting this rotation-corrected image can greatly reduce the difficulty of text detection, make the detected text areas more precise, and essentially eliminate missed detection.
Step S408: input the rotation-corrected image into the detection model for secondary detection to obtain new quadrilateral text boxes.
For the output of the secondary detection, the NMS algorithm may also be used to fuse and filter the candidate text areas contained in the output; for multiple candidate text areas corresponding to the same text, one best-fitting candidate text area corresponding to a new quadrilateral text box can finally be determined.
New quadrilateral text boxes are thus obtained; compared with the quadrilateral text boxes from the first detection, the new ones are more precise and the probability of missed detection is lower.
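The rotation correction of steps S404 to S407 can be sketched as follows, assuming each text box is given as four ordered corner points; the choice of m = 3 and measuring the tilt from the top edge are illustrative assumptions.

```python
import cv2
import numpy as np

def rotate_by_longest_boxes(image, quads, m=3):
    """Average the tilt of the m longest text boxes and rotate the image
    so text lines become horizontal (steps S404-S407). Each quad is a
    4x2 array ordered top-left, top-right, bottom-right, bottom-left."""
    def length(q):   # length of the top edge
        return np.linalg.norm(q[1] - q[0])

    def angle(q):    # tilt of the top edge relative to horizontal, degrees
        dx, dy = q[1] - q[0]
        return np.degrees(np.arctan2(dy, dx))

    longest = sorted(quads, key=length, reverse=True)[:m]
    alpha = float(np.mean([angle(q) for q in longest]))

    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), alpha, 1.0)
    # Replicate edge pixels to fill the border exposed by the rotation.
    return cv2.warpAffine(image, M, (w, h),
                          borderMode=cv2.BORDER_REPLICATE)
```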
Referring again to FIG. 4, after step S304, in step S305, edge correction and extension are performed on the quadrilateral text boxes obtained by secondary detection, and they are perspective-transformed into rectangular image blocks.
FIG. 7a shows a schematic diagram of a quadrilateral text box obtained by secondary detection. As shown by the white quadrilateral text box determined in FIG. 7a, a quadrilateral text box obtained by secondary detection may have the problem that the left and right edges are not perpendicular to the upper and lower edges. FIG. 7b shows a schematic diagram of the effect of directly performing perspective transformation on such a box; as shown in FIG. 7b, if perspective transformation is performed directly on the quadrilateral text box shown in FIG. 7a, a tilted and deformed text line as shown in FIG. 7b may be obtained, and inputting such an image block into the recognition model would degrade the recognition effect.
Therefore, edge correction may be performed on the quadrilateral text boxes obtained by secondary detection, so that the left and right edges of the text box are perpendicular to the upper and lower edges. FIG. 8 shows a flowchart of performing edge correction according to an embodiment of the present application. As shown in FIG. 8, the flow of edge correction includes:
Step S501: determine a quadrilateral text box detected by the detection model.
The quadrilateral text box may be, for example, the white quadrilateral text box shown in FIG. 7a.
Step S502: calculate the horizontal slope k of the quadrilateral.
The horizontal slope may be obtained from the degree of inclination of the upper and lower edges of the quadrilateral text box with respect to the horizontal; for example, the horizontal slope may be expressed as the tangent of the angle between line segment AD (or line segment BC) in FIG. 7c and the horizontal.
Step S503: draw perpendiculars through the midpoints of the left and right edges of the quadrilateral box.
Step S504: calculate the intersections of the perpendiculars with the upper and lower edges of the quadrilateral box to determine a new quadrilateral box.
In one example, FIG. 7c shows a schematic diagram of performing edge correction on a quadrilateral text box obtained by secondary detection. As shown in FIG. 7c, for a given quadrilateral text box, the horizontal slope k can be obtained from the degree of inclination of edges AD and BC; according to the horizontal slope k, perpendiculars to the upper and lower edges (perpendiculars a and b shown in FIG. 7c) are drawn through the midpoints (points E and F shown in FIG. 7c), and the intersections of perpendiculars a and b with the upper and lower edges (points A, B, C, and D shown in FIG. 7c) are obtained; the quadrilateral formed by the intersections is the edge-corrected quadrilateral text box (quadrilateral ABCD shown in FIG. 7c). FIG. 7d shows a schematic diagram of the effect of performing perspective transformation after edge correction; as shown in FIG. 7d, after edge correction, a better text effect can be obtained when perspective transformation is performed on the quadrilateral text box.
The way of performing edge correction on the quadrilateral text box is not limited to the above; for example, non-midpoints of the left and right edges (edges AB and CD) may also be taken to draw perpendiculars to the upper and lower edges, as long as, in the corrected quadrilateral text box, the left and right edges are perpendicular to the upper and lower edges.
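A minimal sketch of the midpoint-perpendicular construction of steps S501 to S504 follows, with the corners ordered as in FIG. 7c (AD the upper edge, BC the lower edge); averaging the two horizontal edges to obtain the direction is an illustrative choice.

```python
import numpy as np

def correct_edges(quad):
    """Replace the left and right edges of quad [A, B, C, D] (AD top,
    BC bottom) with perpendiculars to the averaged horizontal direction
    drawn through the midpoints of the original left and right edges."""
    A, B, C, D = [np.asarray(p, dtype=float) for p in quad]
    # S502: horizontal direction from the average of top and bottom edges.
    direction = (D - A) + (C - B)
    direction /= np.linalg.norm(direction)
    normal = np.array([-direction[1], direction[0]])  # perpendicular to it

    def intersect(p0, d0, p1, d1):
        # Solve p0 + t*d0 == p1 + s*d1 for the intersection point.
        M = np.array([d0, -d1]).T
        t, _ = np.linalg.solve(M, p1 - p0)
        return p0 + t * d0

    E = (A + B) / 2  # midpoint of the left edge
    F = (D + C) / 2  # midpoint of the right edge
    # S503-S504: perpendiculars through E and F, cut with top/bottom edges.
    A2 = intersect(E, normal, A, D - A)
    B2 = intersect(E, normal, B, C - B)
    D2 = intersect(F, normal, A, D - A)
    C2 = intersect(F, normal, B, C - B)
    return np.array([A2, B2, C2, D2])
```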
In one possible case, the text box detected by the detection model fits the text too tightly, causing characters to be missed or truncated. FIG. 7e shows a schematic diagram of an overly tight quadrilateral text box. As shown in FIG. 7e, neither the first digit 6 nor the last digit 0 of the bank card number is fully included in the quadrilateral text box; inputting such a text box into the recognition model may make the model insensitive to the text and unable to recognize the incompletely included digits 6 and 0.
Therefore, edge extension may be performed on the quadrilateral text box so that the edge-filled quadrilateral text box can fully contain the corresponding text. FIG. 9 shows a flowchart of performing edge extension according to an embodiment of the present application. As shown in FIG. 9, the flow of edge extension includes:
Step S601: determine the edge-corrected quadrilateral text box.
Step S602: calculate the height h of the quadrilateral text box.
Step S603: extend the upper and lower edges of the quadrilateral text box each by h/2 (or another multiple of the height) according to the horizontal slope k.
Step S604: check the legality after edge correction and edge extension.
Checking the legality of the quadrilateral box after edge correction and edge extension can determine whether the modified quadrilateral text box contains content outside the image.
In one example, FIG. 7f shows a schematic diagram of performing edge extension on a quadrilateral text box obtained by secondary detection. As shown in FIG. 7f, the height h and horizontal slope k of the text box can be calculated, and the upper and lower edges of the quadrilateral text box ABCD each extended by h/2 according to the horizontal slope k, yielding the quadrilateral text box A1B1C1D1 (the extended parts are shown by the dashed boxes in FIG. 7f); the quadrilateral text box A1B1C1D1 contains more of the image content than the quadrilateral text box ABCD.
Edge correction may first be performed on the text box obtained after secondary detection, followed by edge extension, to obtain the modified quadrilateral text box; a legality check may also be performed on the modified quadrilateral text box, for example by checking the vertex coordinates A1, B1, C1, D1 of the extended quadrilateral text box to determine whether any coordinate lies outside the image.
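Steps S601 to S604 can be sketched as follows; the extension factor of h/2 matches the example above, and rejecting any box whose corners leave the image is one assumed form of the legality check.

```python
import numpy as np

def extend_and_check(quad, image_shape, factor=0.5):
    """Lengthen the top and bottom edges of the corrected quad
    [A, B, C, D] (AD top, BC bottom) outward by factor * height on
    each side, then verify all corners still lie inside the image."""
    A, B, C, D = [np.asarray(p, dtype=float) for p in quad]
    h = (np.linalg.norm(B - A) + np.linalg.norm(C - D)) / 2      # S602
    direction = (D - A) / np.linalg.norm(D - A)  # along the text line
    shift = factor * h * direction
    # S603: push the left corners left and the right corners right.
    A1, B1 = A - shift, B - shift
    D1, C1 = D + shift, C + shift
    extended = np.array([A1, B1, C1, D1])
    # S604: legality check - reject if any corner leaves the image.
    H, W = image_shape[:2]
    ok = bool(np.all((extended[:, 0] >= 0) & (extended[:, 0] <= W - 1) &
                     (extended[:, 1] >= 0) & (extended[:, 1] <= H - 1)))
    return extended, ok
```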
After edge correction and edge extension of the text box, perspective transformation may be performed on it to transform the quadrilateral corresponding to the text box into a rectangle, yielding a rectangular image block.
The quadrilateral corresponding to the text box may be any quadrilateral, such as a parallelogram or a trapezoid.
For example, the perspective transformation may project the quadrilateral text box in the image onto a new view plane to obtain a rectangular image block.
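A minimal sketch of this perspective transformation with OpenCV follows; deriving the output rectangle's width and height from the average edge lengths of the quad is an illustrative choice.

```python
import cv2
import numpy as np

def rectify(image, quad):
    """Project the (possibly trapezoidal) text quad [A, B, C, D]
    (AD top, BC bottom) onto a new view plane as an upright rectangle,
    producing the rectangular image block fed to the recognition model."""
    A, B, C, D = [np.asarray(p, dtype=np.float32) for p in quad]
    w = int(round((np.linalg.norm(D - A) + np.linalg.norm(C - B)) / 2))
    h = int(round((np.linalg.norm(B - A) + np.linalg.norm(C - D)) / 2))
    src = np.array([A, D, C, B], dtype=np.float32)        # TL, TR, BR, BL
    dst = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]],
                   dtype=np.float32)
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, M, (w, h))
```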
Referring again to FIG. 4, after the edge extension and correction of step S305 is completed, in step S306, the image block is input into the recognition model to obtain the recognized text content.
The recognition model may be one obtained after fine-tuning training with targeted positive and negative samples; for the training process, refer to step S101 in FIG. 2.
Step S307: determine the attribute of the text content according to the recognized text content and the corresponding coordinates.
In key-value matching, the 'key' may represent the attribute of the text content and the 'value' may represent the text content itself.
For example, when recognizing the information on a driver's license, if the text content "Zhang San" is recognized and its coordinates are confirmed to lie in a predetermined specific region of the driver's license (for example, the region representing the name), the attribute of the text content can be determined to be "name" according to the preset correspondence between that specific region and the attribute; the attribute may be a preset custom attribute.
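A minimal sketch of such position-based attribute ('key') lookup follows; the layout table of named regions is an assumed structure for illustration.

```python
def attribute_of(box_center, layout):
    """Map a recognized text's position to its predefined attribute.
    `layout` is an assumed per-card-type table of named regions,
    e.g. {'name': (x1, y1, x2, y2), 'license_no': (x1, y1, x2, y2)}."""
    x, y = box_center
    for key, (x1, y1, x2, y2) in layout.items():
        if x1 <= x <= x2 and y1 <= y <= y2:
            return key
    return None  # the position matches no predefined field
```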
Step S308: perform confidence filtering and re-recognition according to the attribute of the text content to obtain a re-recognition result.
In complex scenarios, for example because of the fonts used for bank card numbers and identity card numbers, digit text and other types of text are extremely prone to the following confusions:
(0) and (o, O, D); (1) and (I, |, !); (5) and (S, s); (6) and (b); (8) and (B); (9) and (q, g); (7) and (T); (4) and (+, H); and so on.
In card text recognition scenarios, the text contained on a card usually does not cover all character classes (for example, a bank card number contains only digit-class text). Therefore, on the basis of the connectionist temporal classification (CTC) sequence output in the middle of the recognition model, confidence filtering and re-recognition can be performed according to the attribute of the text content, where the CTC sequence may represent the intermediate sequence formed on the basis of using the CTC algorithm to solve the character alignment problem. FIG. 10 shows a schematic diagram of confidence filtering based on a CTC sequence according to an embodiment of the present application. As shown in FIG. 10, when recognizing a bank card number, the image block shown on the far left of FIG. 10 can be obtained through step S305 and input into the recognition model. For a certain digit '0' in the middle of the card number, the CTC sequence including 7357 classes and corresponding confidences as shown in the figure can be filtered by class, where 7357 may represent that the total number of output classes is 7357, and the confidence may represent, for each of the 7357 classes, the probability that the text content is that class. According to step S307, the attribute of the text content corresponding to this image block can be determined to be "bank card number". Without filtering out the interfering items, as shown in the figure, the 'D' with confidence 0.9 would be confirmed as the final text content, causing misrecognition of the card text. Since the bank card number belongs to the digit class, the 7357 classes can be filtered according to the attribute of the text content (bank card number), screening out the 7346 non-digit interfering classes and keeping the remaining 10 digit classes of interest, forming a new CTC sequence. Re-recognition is then performed on the new CTC sequence of the remaining 10 classes of interest, for example by determining, according to the confidences of these 10 classes, the class with the highest confidence as the re-recognition result. As shown in the figure, the '0' with confidence 0.8 can be output as the final recognition result, further improving recognition accuracy.
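The attribute-based filtering and re-recognition over the CTC output can be sketched as follows, assuming the recognizer exposes a per-timestep class distribution; the attribute names and greedy CTC decoding are illustrative simplifications.

```python
import numpy as np

# Assumed: `probs` is the recognizer's per-timestep CTC distribution with
# shape (T, num_classes), and `charset` maps class index -> character.
DIGITS_ONLY = {"bank_card_number", "id_number"}  # assumed attribute names

def filter_and_redecode(probs, charset, attribute, blank=0):
    """Zero out classes incompatible with the text attribute (e.g. keep
    only the 10 digits for a bank card number), then greedy-decode the
    filtered CTC sequence: best class per step, collapse repeats,
    drop blanks."""
    probs = probs.copy()
    if attribute in DIGITS_ONLY:
        keep = [i for i, ch in enumerate(charset)
                if ch.isdigit() or i == blank]
        drop = np.setdiff1d(np.arange(probs.shape[1]), keep)
        probs[:, drop] = 0.0  # e.g. the 0.9-confidence 'D' is screened out
    best = probs.argmax(axis=1)
    out, prev = [], blank
    for cls in best:
        if cls != blank and cls != prev:
            out.append(charset[cls])
        prev = cls
    return "".join(out)
```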
Step S309: check and correct the re-recognition result according to the attribute of the text content and checking rules to obtain the final text content.
The checking rules may be, for example, the coding rules of bank cards. For example, when recognizing a bank card number, after confirming that the attribute of the text content is the card number, it is checked whether the re-recognized content consists of digits; according to the coding rules of bank cards, it can also be confirmed, for example, whether the leading digits of the card number correspond to the card's issuing bank. If the leading digits do not match the issuing bank, for example one digit differs, that digit can be corrected according to the leading digits corresponding to the issuing bank.
For example, the LUHN algorithm may be used as the checking rule for verifying a bank card number.
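A minimal sketch of the LUHN check follows.

```python
def luhn_valid(card_number: str) -> bool:
    """LUHN check for a recognized bank card number: double every second
    digit from the right, subtract 9 from results above 9, and require
    the total to be a multiple of 10."""
    digits = [int(c) for c in card_number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```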
FIG. 11 shows a flowchart of a card text recognition method according to an embodiment of the present application. The method is used in a terminal device; as shown in FIG. 11, the method includes:
Step S1101: acquire a first to-be-recognized image of a card;
Step S1102: detect the first to-be-recognized image to obtain at least one first text area, the first text area representing the area where text in the first to-be-recognized image is located;
Step S1103: perform rotation correction on the first to-be-recognized image according to the first text area to obtain a second to-be-recognized image;
Step S1104: detect the second to-be-recognized image to obtain at least one second text area, the second text area representing the area where text in the second to-be-recognized image is located;
Step S1105: recognize the image in the second text area to obtain first target text corresponding to the second text area.
According to this embodiment of the present application, by acquiring the first to-be-recognized image of the card, detecting it to obtain at least one first text area, performing rotation correction on the first to-be-recognized image according to the first text area to obtain the second to-be-recognized image, detecting the second to-be-recognized image to obtain at least one second text area, and recognizing the image in the second text area to obtain the first target text corresponding to the second text area, the input is an image and the output is the card text content. Rotation correction adjusts the angle of the card text to a better state, enabling recognition of the text content of tilted card images; secondary detection avoids missed detection of text areas, improving the detection accuracy for text areas of tilted card images as well as the recognition accuracy of the text content. The method is used in a terminal device, so detection and recognition respond quickly, power consumption can be reduced, the network disconnection and slow response caused by invoking methods on the cloud side are avoided, and the user experience is improved.
The first to-be-recognized image may include a card image uploaded by the user to the terminal device; the card image may include a card image taken directly by the user, a photo stored in the terminal device and uploaded by the user, a card image scanned by the user through the terminal device, and so on, and the present application does not limit the way the user obtains the card image. For the first text area and the second text area, refer to the quadrilateral text boxes described above; they may represent the area where any text in a to-be-recognized image is located, and the number of second text areas may be greater than or equal to the number of first text areas. The first target text may include text on the card and may be determined according to the purpose of recognizing the card; for example, if the card number of a bank card is to be recognized, the first target text may include the bank card number, for example "6214XXXX73469446".
In a possible implementation, performing rotation correction on the first to-be-recognized image according to the first text area may be performing rotation correction on the first to-be-recognized image (for example, rotating it by the average inclination angle) according to the average inclination angle of at least one longest text area among the first text areas, to obtain the second to-be-recognized image.
According to this embodiment of the present application, performing rotation correction on the first to-be-recognized image using the average inclination angle of the several longest first text areas can improve the accuracy of the correction and thereby the accuracy of detection.
The number of the at least one text area may be selected as needed; the present application does not limit this.
For step S1103, refer to steps S404 to S407 shown in FIG. 5.
FIG. 12 shows a flowchart of a card text recognition method according to an embodiment of the present application. As shown in FIG. 12, detecting the second to-be-recognized image to obtain at least one second text area includes:
Step S1201: determine the horizontal slope of the second text area;
Step S1202: correct the left edge and right edge of the second text area according to the horizontal slope of the second text area, where, after correction, the left edge and right edge of the second text area are respectively perpendicular to the upper edge and/or lower edge of the second text area.
According to this embodiment of the present application, by determining the horizontal slope of the second text area and correcting its left and right edges according to that slope so that, after correction, they are respectively perpendicular to the upper edge and/or lower edge, deformation of the text caused by perspective transformation of an irregular text area can be prevented, making the text in the text area easier to recognize and further improving the accuracy of card text recognition.
The horizontal slope may represent the degree of inclination of the second text area. For the left and right edges of the second text area, refer to the left and right edges of the quadrilateral text box described above with reference to FIG. 7c; for the upper and lower edges, refer to the upper and lower edges of the quadrilateral text box described above. After correction, the second text area may represent a new text area in the second to-be-recognized image.
For step S1201, refer to step S502 shown in FIG. 8; for step S1202, refer to steps S503 to S504 shown in FIG. 8.
FIG. 13 shows a flowchart of a card text recognition method according to an embodiment of the present application. As shown in FIG. 13, detecting the second to-be-recognized image to obtain at least one second text area includes:
Step S1301: determine the horizontal slope and height of the second text area;
Step S1302: extend the upper edge and lower edge of the second text area to both sides respectively according to the horizontal slope of the second text area, the extension distance being determined according to the height.
According to this embodiment of the present application, by determining the horizontal slope and height of the second text area and extending its upper and lower edges to both sides according to the horizontal slope, the problem of characters being cut off or missed because the text area fits the text too tightly can be prevented, making the text in the text area easier to recognize and further improving the accuracy of card text recognition.
Extension may enlarge the range of the second to-be-recognized image that the second text area represents; for example, the extended second text area may include text that was not originally included in the text area. The extension distance may be preset, for example 1/2 of the height or another multiple; the present application does not limit this.
For the method of calculating the height in step S1301, refer to step S602 in FIG. 9; for step S1302, refer to step S603 in FIG. 9.
FIG. 14 shows a flowchart of a card text recognition method according to an embodiment of the present application. As shown in FIG. 14, recognizing the image in the second text area to obtain first target text corresponding to the second text area includes:
Step S1401: recognize the image in the second text area to obtain second target text corresponding to the second text area;
Step S1402: determine the attribute of the second target text;
Step S1403: filter the connectionist temporal classification (CTC) sequence corresponding to the second text area according to the attribute of the second target text to obtain a filtered CTC sequence;
Step S1404: obtain the first target text according to the classes and corresponding confidences in the filtered CTC sequence.
According to this embodiment of the present application, by recognizing the image in the second text area to obtain the second target text, determining the attribute of the second target text, filtering the CTC sequence corresponding to the second text area according to that attribute to obtain a filtered CTC sequence, and obtaining the first target text according to the classes and corresponding confidences in the filtered CTC sequence, confusion and recognition errors caused by similar characters can be prevented during recognition, reducing the recognition error rate and further improving recognition accuracy.
The first target text may represent the target text obtained after filtering the CTC sequence and re-recognizing on the basis of the second target text. The attribute of the second target text may be custom-defined, or may be obtained through the second target text (for example, obtained according to the correspondence between the position of the second target text on the card and the attribute). The confidence may represent the probability that a class in the corresponding CTC sequence is the first target text. The filtered CTC sequence may contain only the classes corresponding to the attribute of the second target text and their confidences.
For example, when the attribute of the second target text is "bank card number", non-digit items in the CTC sequence can be filtered out, keeping only digit items and reducing the possibility of misrecognition.
For step S1402, refer to step S307 in FIG. 4; for an example of the CTC sequence, refer to the CTC sequence containing 7357 classes and corresponding confidences shown in FIG. 10.
In a possible implementation, the method further includes: training the detection model and the recognition model according to training samples to obtain a trained detection model and a trained recognition model; where the training samples include positive samples and negative samples in one-to-one correspondence, a positive sample includes a card image sample containing a text area, and the corresponding negative sample includes the card image sample obtained after covering the text area; the trained detection model is used to detect the first text area and the second text area, and the trained recognition model is used to recognize the first target text and the second target text.
According to this embodiment of the present application, training the detection model and the recognition model according to the training samples to obtain the trained models, with the trained detection model used to detect the first and second text areas and the trained recognition model used to recognize the first and second target texts, can reduce ROM occupation in the terminal device and prevent stuttering. Because the training samples include one-to-one corresponding positive and negative samples, where a positive sample is a card image sample containing a text area and the negative sample is the card image sample with the text area covered, adversarial learning between positive and negative samples can be achieved, enhancing the detection model's discrimination between text areas and non-text areas, improving recognition accuracy in complex backgrounds, and increasing the robustness and accuracy of the model.
The recognition model and the detection model may include general-purpose OCR models; the present application does not limit the model type. The training method may include fine-tuning training; for the way of training the models, refer to step S101 in FIG. 2. Card image samples may include card image samples processed by random translation, random scaling, random rotation, perspective transformation, blurring, and random aspect ratio; the way of covering a card image sample may include filling pixels of the non-text areas of the card image into the text area.
For the way of generating negative samples, refer to steps S201 to S205 in FIG. 3.
The step of training the detection model and the recognition model according to the training samples to obtain the trained detection model and the trained recognition model may be performed on the terminal device or on a server; the terminal device may download at least one of the trained detection model and the trained recognition model from the server.
图15示出根据本申请一实施例的卡证文本识别装置的结构图。如图15所示,该装置用于终端设备,该装置包括:
获取模块1501,用于获取卡证的第一待识别图像;
第一检测模块1502,用于对所述第一待识别图像进行检测,得到至少一个第一文本区域,所述第一文本区域表示所述第一待识别图像中的文本所在的区域;
矫正模块1503,用于根据所述第一文本区域,对所述第一待识别图像进行旋转矫正,得到第二待识别图像;
第二检测模块1504,用于对所述第二待识别图像进行检测,得到至少一个第二文本区域,所述第二文本区域表示所述第二待识别图像中的文本所在的区域;
识别模块1505,用于对所述第二文本区域中的图像进行识别,得到第二文本区域对应的第一目标文本。
根据本申请实施例,通过获取卡证的第一待识别图像,对第一待识别图像进行检测,得到至少一个第一文本区域,根据第一文本区域对第一待识别图像进行旋转矫正,得到第二待识别图像,对所述第二待识别图像进行检测,得到至少一个第二文本区域,对所述第二文本区域中的图像进行识别,得到第二文本区域对应的第一目标文本,可以实现输入为图片,输出为卡证文本内容,经过旋转矫正可以将卡证文本的角度调整到较佳的状态,可以实现对于倾斜的卡证图片的文本内容识别,经过二次检测,可以避免文本区域的漏检,提高对于倾斜的卡证图片的文本区域的检测的准确度,同时还可以提高文本内容的识别准确度,装置用于终端设备,使得检测与识别时的响应快,还可以降低功耗,避免了云侧调用方法而导致的断网和响应慢的问题,提升了用户使用时的体验。
In a possible implementation, the correction module includes a first correction sub-module, configured to perform rotation correction on the first image to be recognized according to the average tilt angle of the at least one longest text area among the first text areas to obtain the second image to be recognized.
According to an embodiment of the present application, performing rotation correction on the first image to be recognized using the average tilt angle of the several longest first text areas can improve the accuracy of the correction and, in turn, the accuracy of detection.
In a possible implementation, the second detection module includes: a first determination module, configured to determine the horizontal slope of the second text area; and a second correction sub-module, configured to correct the left edge and the right edge of the second text area according to the horizontal slope, where, after correction, the left edge and the right edge of the second text area are respectively perpendicular to the upper edge and/or the lower edge of the second text area.
According to an embodiment of the present application, correcting the left and right edges of the second text area according to its horizontal slope so that they become perpendicular to its upper and/or lower edge can avoid text distortion caused by applying a perspective transform to an irregular text area, making the text easier to recognize and further improving the accuracy of card text recognition.
In a possible implementation, the second detection module includes: a second determination module, configured to determine the horizontal slope and the height of the second text area; and an extension module, configured to extend the upper edge and the lower edge of the second text area to both sides according to the horizontal slope, the extension distance being determined according to the height.
According to an embodiment of the present application, extending the upper and lower edges of the second text area to both sides according to its horizontal slope and height can avoid characters being cut off or missed because the text area fits the text too tightly, making the text easier to recognize and further improving the accuracy of card text recognition.
In a possible implementation, the recognition module includes: a recognition sub-module, configured to recognize the image in the second text area to obtain a second target text corresponding to the second text area; a third determination module, configured to determine an attribute of the second target text; a filtering module, configured to filter the connectionist temporal classification (CTC) sequence corresponding to the second text area according to the attribute of the second target text to obtain a filtered CTC sequence; and a fourth determination module, configured to obtain the first target text according to the classes in the filtered CTC sequence and their corresponding confidences.
According to an embodiment of the present application, filtering the CTC sequence according to the attribute of the second target text and obtaining the first target text from the classes and confidences in the filtered sequence can prevent confusion and misrecognition caused by similar-looking characters, reducing the recognition error rate and further improving recognition precision.
In a possible implementation, the apparatus further includes a training module, configured to train a detection model and a recognition model according to training samples to obtain a trained detection model and a trained recognition model, where the training samples include positive samples and negative samples in one-to-one correspondence, the positive samples include card image samples containing text areas, the negative samples include card image samples obtained by covering the text areas, the trained detection model is used to detect the first text area and the second text area, and the trained recognition model is used to recognize the first target text and the second target text.
According to an embodiment of the present application, this training reduces the ROM occupied in the terminal device and prevents stuttering; the one-to-one correspondence of positive and negative samples enables adversarial learning between them, strengthening the detection model's ability to distinguish text areas from non-text areas, improving recognition accuracy against complex backgrounds, and increasing the robustness and precision of the model.
Figure 16 shows a schematic structural diagram of a terminal device according to an embodiment of the present application. Taking a mobile phone as the terminal device, Figure 16 shows a schematic structural diagram of a mobile phone 200.
The mobile phone 200 may include a processor 210, an external memory interface 220, an internal memory 221, a USB interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 251, a wireless communication module 252, an audio module 270, a speaker 270A, a receiver 270B, a microphone 270C, a headset jack 270D, a sensor module 280, buttons 290, a motor 291, an indicator 292, a camera 293, a display screen 294, a SIM card interface 295, and the like. The sensor module 280 may include a gyroscope sensor 280A, an acceleration sensor 280B, a proximity light sensor 280G, a fingerprint sensor 280H, and a touch sensor 280K (of course, the mobile phone 200 may also include other sensors, such as a temperature sensor, a pressure sensor, a distance sensor, a magnetic sensor, an ambient light sensor, a barometric pressure sensor, and a bone conduction sensor, not shown in the figure).
It can be understood that the structure illustrated in this embodiment does not constitute a specific limitation on the mobile phone 200. In other embodiments of the present application, the mobile phone 200 may include more or fewer components than shown, combine some components, split some components, or use a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 210 may include one or more processing units. For example, the processor 210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated in one or more processors. The controller may be the nerve center and command center of the mobile phone 200; it can generate operation control signals according to instruction opcodes and timing signals to control instruction fetching and execution.
A memory may also be provided in the processor 210 for storing instructions and data. In some embodiments, the memory in the processor 210 is a cache. This memory may hold instructions or data that the processor 210 has just used or uses cyclically. If the processor 210 needs to use the instructions or data again, it can call them directly from this memory, avoiding repeated accesses, reducing the waiting time of the processor 210, and thus improving the efficiency of the system.
The processor 210 may run the card text recognition method provided by the embodiments of the present application, so as to acquire a first image to be recognized of a card; detect the first image to be recognized to obtain at least one first text area, the first text area representing the area where the text in the first image to be recognized is located; perform rotation correction on the first image to be recognized according to the first text area to obtain a second image to be recognized; detect the second image to be recognized to obtain at least one second text area, the second text area representing the area where the text in the second image to be recognized is located; and recognize the image in the second text area to obtain a first target text corresponding to the second text area. This improves the accuracy of text recognition, makes detection and recognition respond quickly, reduces power consumption, avoids the disconnection and slow-response problems of invoking the method on the cloud side, and improves the user experience. The processor 210 may include different devices; for example, when a CPU and a GPU are integrated, the CPU and the GPU may cooperate to execute the card text recognition method provided by the embodiments of the present application, with some algorithms of the method executed by the CPU and others by the GPU, to achieve faster processing.
The display screen 294 is used to display images, videos, and the like, and includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), and the like. In some embodiments, the mobile phone 200 may include 1 or N display screens 294, where N is a positive integer greater than 1. The display screen 294 may be used to display information entered by or provided to the user as well as various graphical user interfaces (GUI). For example, the display 294 may show photos, videos, web pages, or files, or display a graphical user interface that includes a status bar, a hideable navigation bar, a time and weather widget, and application icons such as a browser icon. The status bar includes the operator name (for example, China Mobile), the mobile network (for example, 4G), the time, and the remaining battery level; the navigation bar includes a back key icon, a home key icon, and a forward key icon. Furthermore, in some embodiments the status bar may also include a Bluetooth icon, a Wi-Fi icon, an external-device icon, and the like; in other embodiments the graphical user interface may also include a Dock bar containing icons of frequently used applications. When the processor 210 detects a touch event by the user's finger (or a stylus or the like) on an application icon, in response to the touch event, it opens the user interface of the application corresponding to that icon and displays that user interface on the display 294.
In this embodiment, the display screen 294 may be one integrated flexible display screen, or a spliced display composed of two rigid screens and one flexible screen located between them.
After the processor 210 runs the card text recognition method provided by the embodiments of the present application, the terminal device may establish connections with other terminal devices through the antenna 1, the antenna 2, or the USB interface, and control the display screen 294 to display a corresponding graphical user interface according to the card text recognition method provided by the embodiments of the present application.
The camera 293 (a front camera, a rear camera, or a camera that can serve as both) is used to capture static images or videos. Generally, the camera 293 may include a photosensitive element such as a lens group and an image sensor, where the lens group includes a plurality of lenses (convex or concave) for collecting the light signals reflected by the object to be photographed and passing them to the image sensor, which generates the original image of the object from the light signals.
The internal memory 221 may be used to store computer-executable program code, which includes instructions. By running the instructions stored in the internal memory 221, the processor 210 executes the various functional applications and data processing of the mobile phone 200. The internal memory 221 may include a program storage area and a data storage area, where the program storage area may store the operating system and the code of applications (such as a camera application or a WeChat application), and the data storage area may store data created during use of the mobile phone 200 (such as images and videos captured by the camera application).
The internal memory 221 may also store one or more computer programs 1310 corresponding to the card text recognition method provided by the embodiments of the present application. The one or more computer programs 1310 are stored in the memory 221 and configured to be executed by the one or more processors 210. The one or more computer programs 1310 include instructions that may be used to execute the steps of the corresponding embodiments in Figures 2-5, 8-9, and 11-14, and may include the acquisition module 1501, the first detection module 1502, the correction module 1503, the second detection module 1504, and the recognition module 1505 described above with reference to Figure 15. When the code of the card text recognition method stored in the internal memory 221 is run by the processor 210, the processor 210 may control the display screen to show the recognition result.
In addition, the internal memory 221 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
Of course, the code of the card text recognition method provided by the embodiments of the present application may also be stored in an external memory. In this case, the processor 210 may run the code of the card text recognition method stored in the external memory through the external memory interface 220.
The functions of the sensor module 280 are described below.
The gyroscope sensor 280A may be used to determine the motion posture of the mobile phone 200. In some embodiments, the angular velocities of the mobile phone 200 around three axes (i.e., the x, y, and z axes) may be determined through the gyroscope sensor 280A; that is, the gyroscope sensor 280A may be used to detect the current motion state of the mobile phone 200, such as shaking or stationary.
When the display screen in this embodiment is a foldable screen, the gyroscope sensor 280A may be used to detect a folding or unfolding operation acting on the display screen 294. The gyroscope sensor 280A may report the detected folding or unfolding operation to the processor 210 as an event to determine the folded or unfolded state of the display screen 294.
The acceleration sensor 280B can detect the magnitude of the acceleration of the mobile phone 200 in various directions (generally three axes) and may likewise be used to detect the current motion state of the mobile phone 200, such as shaking or stationary. When the display screen in this embodiment is a foldable screen, the acceleration sensor 280B may be used to detect a folding or unfolding operation acting on the display screen 294 and report it to the processor 210 as an event to determine the folded or unfolded state of the display screen 294.
The proximity light sensor 280G may include, for example, a light-emitting diode (LED) and a light detector, such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. The mobile phone emits infrared light outward through the light-emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the mobile phone; when insufficient reflected light is detected, the mobile phone can determine that there is no object nearby. When the display screen in this embodiment is a foldable screen, the proximity light sensor 280G may be arranged on the first screen of the foldable display screen 294 and may detect the folding or unfolding angle between the first screen and the second screen from the optical path difference of the infrared signal.
The gyroscope sensor 280A (or the acceleration sensor 280B) may send the detected motion state information (such as angular velocity) to the processor 210. The processor 210 determines, based on the motion state information, whether the phone is currently handheld or on a tripod (for example, when the angular velocity is not 0, the mobile phone 200 is in a handheld state).
The fingerprint sensor 280H is used to collect fingerprints. The mobile phone 200 can use the collected fingerprint characteristics to implement fingerprint unlocking, application-lock access, fingerprint photographing, fingerprint call answering, and the like.
The touch sensor 280K is also called a "touch panel". The touch sensor 280K may be arranged on the display screen 294, and the touch sensor 280K and the display screen 294 form a touchscreen, also called a "touch screen". The touch sensor 280K is used to detect touch operations acting on or near it and may pass the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation may be provided through the display screen 294. In other embodiments, the touch sensor 280K may also be arranged on the surface of the mobile phone 200 at a position different from that of the display screen 294.
For example, the display screen 294 of the mobile phone 200 displays a home screen that includes icons of multiple applications (such as a camera application and a WeChat application). The user taps the icon of the camera application on the home screen via the touch sensor 280K, triggering the processor 210 to start the camera application and turn on the camera 293. The display screen 294 then displays the interface of the camera application, such as a viewfinder interface.
The wireless communication function of the mobile phone 200 may be implemented through the antenna 1, the antenna 2, the mobile communication module 251, the wireless communication module 252, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the mobile phone 200 may be used to cover a single communication band or multiple communication bands, and different antennas may also be multiplexed to improve antenna utilization. For example, the antenna 1 may be multiplexed as a diversity antenna for a wireless local area network. In other embodiments, an antenna may be used in combination with a tuning switch.
The mobile communication module 251 can provide wireless communication solutions applied to the mobile phone 200, including 2G/3G/4G/5G. The mobile communication module 251 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like. The mobile communication module 251 may receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and pass them to the modem processor for demodulation. The mobile communication module 251 may also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 251 may be arranged in the processor 210, or in the same device as at least some modules of the processor 210. In this embodiment, the mobile communication module 251 may also be used to exchange information with other terminal devices.
The modem processor may include a modulator and a demodulator. The modulator is used to modulate the low-frequency baseband signal to be sent into a medium-high-frequency signal; the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal and then pass the demodulated low-frequency baseband signal to the baseband processor. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor, which outputs a sound signal through an audio device (not limited to the speaker 270A or the receiver 270B) or displays images or videos through the display screen 294. In some embodiments, the modem processor may be an independent device; in other embodiments, it may be independent of the processor 210 and arranged in the same device as the mobile communication module 251 or other functional modules.
The wireless communication module 252 can provide wireless communication solutions applied to the mobile phone 200, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), the global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR). The wireless communication module 252 may be one or more devices integrating at least one communication processing module. It receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 210; it may also receive signals to be sent from the processor 210, frequency-modulate and amplify them, and convert them into electromagnetic waves for radiation through the antenna 2. In this embodiment, the wireless communication module 252 is used to transmit data with other terminal devices under the control of the processor 210.
In addition, the mobile phone 200 may implement audio functions, such as music playback and recording, through the audio module 270, the speaker 270A, the receiver 270B, the microphone 270C, the headset jack 270D, and the application processor. The mobile phone 200 may receive input from the buttons 290 and generate key signal input related to user settings and function control of the mobile phone 200. The mobile phone 200 may use the motor 291 to generate vibration prompts (such as an incoming-call vibration prompt). The indicator 292 in the mobile phone 200 may be an indicator light, which may be used to indicate the charging state and battery level changes, and may also be used to indicate messages, missed calls, notifications, and the like. The SIM card interface 295 in the mobile phone 200 is used to connect a SIM card. The SIM card can be inserted into or removed from the SIM card interface 295 to make contact with or separate from the mobile phone 200.
It should be understood that in practical applications the mobile phone 200 may include more or fewer components than shown in Figure 16; this embodiment places no limitation on this. The illustrated mobile phone 200 is only one example; it may have more or fewer components than shown in the figure, may combine two or more components, or may have a different component configuration. The various components shown in the figure may be implemented in hardware, software, or a combination of hardware and software including one or more signal-processing and/or application-specific integrated circuits.
The software system of the terminal device may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. This embodiment takes an Android system with a layered architecture as an example to describe the software structure of the terminal device.
Figure 17 is a block diagram of the software structure of the terminal device according to an embodiment of the present application.
The layered architecture divides the software into several layers, each with a clear role and division of labor; the layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
The application layer may include a series of application packages.
As shown in Figure 17, the application packages may include applications such as Phone, Camera, Gallery, Calendar, Calls, Maps, Navigation, WLAN, Bluetooth, Music, Video, and Messages.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer, and includes some predefined functions.
As shown in Figure 17, the application framework layer may include a window manager, content providers, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager is used to manage window programs. It can obtain the display screen size, determine whether there is a status bar, lock the screen, capture the screen, and so on.
Content providers are used to store and retrieve data and make it accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and so on.
The view system includes visual controls, such as controls for displaying text and controls for displaying pictures, and can be used to build applications. A display interface may be composed of one or more views; for example, a display interface including a short-message notification icon may include a view for displaying text and a view for displaying pictures.
The telephony manager is used to provide the communication functions of the terminal device, for example, management of call states (including connected, hung up, and so on).
The resource manager provides various resources for applications, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables applications to display notification information in the status bar; it can be used to convey notification-type messages that disappear automatically after a short stay without user interaction. For example, the notification manager is used to announce download completion, message reminders, and so on. The notification manager may also present notifications that appear in the status bar at the top of the system in the form of charts or scrolling-bar text, such as notifications from applications running in the background, or notifications that appear on the screen in the form of a dialog window. Examples include text prompts in the status bar, prompt sounds, vibration of the terminal device, and flashing of the indicator light.
The Android Runtime includes core libraries and a virtual machine and is responsible for the scheduling and management of the Android system.
The core libraries contain two parts: one is the functional functions that the Java language needs to call, and the other is the core libraries of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files, and performs functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system libraries may include multiple functional modules, for example: a surface manager, media libraries, a 3D graphics processing library (e.g., OpenGL ES), and a 2D graphics engine (e.g., SGL).
The surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
The media libraries support the playback and recording of a variety of common audio and video formats, as well as static image files. They can support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The 3D graphics processing library is used to implement 3D graphics drawing, image rendering, composition, layer processing, and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is the layer between hardware and software. It contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
An embodiment of the present application provides a card text recognition apparatus, including a processor and a memory for storing instructions executable by the processor, where the processor is configured to implement the above method when executing the instructions.
An embodiment of the present application provides a non-volatile computer-readable storage medium on which computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the above method.
An embodiment of the present application provides a computer program product, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code, where, when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction-execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random-access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or an in-groove raised structure on which instructions are stored, and any suitable combination of the foregoing.
The computer-readable program instructions or code described here can be downloaded from the computer-readable storage medium to each computing/processing device, or downloaded to an external computer or external storage device through a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in each computing/processing device.
The computer program instructions used to perform the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by utilizing the state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions, thereby implementing various aspects of the present application.
Various aspects of the present application are described here with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present application. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data-processing apparatus to produce a machine, such that these instructions, when executed by the processor of the computer or other programmable data-processing apparatus, produce an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium, and these instructions cause a computer, a programmable data-processing apparatus, and/or other devices to work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data-processing apparatus, or another device, so that a series of operational steps are executed on the computer, the other programmable data-processing apparatus, or the other device to produce a computer-implemented process, so that the instructions executed on the computer, the other programmable data-processing apparatus, or the other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings show the possible architectures, functions, and operations of apparatuses, systems, methods, and computer program products according to multiple embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a part of instructions that contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions marked in the blocks may also occur in an order different from that marked in the drawings; for example, two consecutive blocks may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with hardware (such as a circuit or an ASIC (application-specific integrated circuit)) that performs the corresponding function or action, or with a combination of hardware and software, such as firmware.
Although the present invention has been described here in conjunction with various embodiments, those skilled in the art can, in the course of practicing the claimed invention, understand and implement other variations of the disclosed embodiments by studying the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude the plural. A single processor or other unit may fulfill several functions recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
The embodiments of the present application have been described above. The above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The choice of terms used herein is intended to best explain the principles of the embodiments, their practical applications, or improvements over technologies in the market, or to enable other persons of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

  1. A card text recognition method, wherein the method is used in a terminal device, the method comprising:
    acquiring a first image to be recognized of a card;
    detecting the first image to be recognized to obtain at least one first text area, the first text area representing the area where the text in the first image to be recognized is located;
    performing rotation correction on the first image to be recognized according to the first text area to obtain a second image to be recognized;
    detecting the second image to be recognized to obtain at least one second text area, the second text area representing the area where the text in the second image to be recognized is located;
    recognizing the image in the second text area to obtain a first target text corresponding to the second text area.
  2. The card text recognition method according to claim 1, wherein performing rotation correction on the first image to be recognized according to the first text area to obtain a second image to be recognized comprises:
    performing rotation correction on the first image to be recognized according to the average tilt angle of the at least one longest text area among the first text areas to obtain the second image to be recognized.
  3. The card text recognition method according to claim 1 or 2, wherein detecting the second image to be recognized to obtain at least one second text area comprises:
    determining the horizontal slope of the second text area;
    correcting the left edge and the right edge of the second text area according to the horizontal slope of the second text area, wherein, after correction, the left edge and the right edge of the second text area are respectively perpendicular to the upper edge and/or the lower edge of the second text area.
  4. The card text recognition method according to any one of claims 1 to 3, wherein detecting the second image to be recognized to obtain at least one second text area comprises:
    determining the horizontal slope and the height of the second text area;
    extending the upper edge and the lower edge of the second text area to both sides according to the horizontal slope of the second text area, the extension distance being determined according to the height.
  5. The card text recognition method according to any one of claims 1 to 4, wherein recognizing the image in the second text area to obtain the first target text corresponding to the second text area comprises:
    recognizing the image in the second text area to obtain a second target text corresponding to the second text area;
    determining an attribute of the second target text;
    filtering the connectionist temporal classification (CTC) sequence corresponding to the second text area according to the attribute of the second target text to obtain a filtered CTC sequence;
    obtaining the first target text according to the classes in the filtered CTC sequence and their corresponding confidences.
  6. The card text recognition method according to any one of claims 1 to 5, wherein the method further comprises:
    training a detection model and a recognition model according to training samples to obtain a trained detection model and a trained recognition model;
    wherein the training samples comprise positive samples and negative samples in one-to-one correspondence, the positive samples comprise card image samples containing text areas, and the negative samples comprise card image samples obtained by covering the text areas,
    wherein the trained detection model is used to detect the first text area and the second text area, and the trained recognition model is used to recognize the first target text and the second target text.
  7. A card text recognition apparatus, wherein the apparatus is used in a terminal device, the apparatus comprising:
    an acquisition module, configured to acquire a first image to be recognized of a card;
    a first detection module, configured to detect the first image to be recognized to obtain at least one first text area, the first text area representing the area where the text in the first image to be recognized is located;
    a correction module, configured to perform rotation correction on the first image to be recognized according to the first text area to obtain a second image to be recognized;
    a second detection module, configured to detect the second image to be recognized to obtain at least one second text area, the second text area representing the area where the text in the second image to be recognized is located;
    a recognition module, configured to recognize the image in the second text area to obtain a first target text corresponding to the second text area.
  8. A card text recognition apparatus, comprising:
    a processor;
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to implement the method according to any one of claims 1 to 6 when executing the instructions.
  9. A non-volatile computer-readable storage medium on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 6.
  10. A computer program product, comprising computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code, wherein, when the computer-readable code runs in an electronic device, a processor in the electronic device executes the method according to any one of claims 1 to 6.
PCT/CN2022/077038 2021-02-25 2022-02-21 Card text recognition method, device and storage medium WO2022179471A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110213987.5A Card text recognition method, device and storage medium CN115050037A (zh) 2021-02-25 2021-02-25
CN202110213987.5 2021-02-25

Publications (1)

Publication Number Publication Date
WO2022179471A1 true WO2022179471A1 (zh) 2022-09-01

Family

ID=83048674

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/077038 WO2022179471A1 (zh) 2021-02-25 2022-02-21 卡证文本识别方法、装置和存储介质

Country Status (2)

Country Link
CN (1) CN115050037A (zh)
WO (1) WO2022179471A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117975466A (zh) * 2024-04-01 2024-05-03 山东浪潮科学研究院有限公司 A general-scene card recognition system based on layout analysis

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120224765A1 (en) * 2011-03-04 2012-09-06 Qualcomm Incorporated Text region detection system and method
CN108694393A (zh) * 2018-05-30 2018-10-23 深圳市思迪信息技术股份有限公司 A certificate-image text-area extraction method based on deep convolution
CN110136069A (zh) * 2019-05-07 2019-08-16 语联网(武汉)信息技术有限公司 Text image correction method, apparatus and electronic device
CN110647882A (zh) * 2019-09-20 2020-01-03 上海眼控科技股份有限公司 Image correction method, apparatus, device and storage medium
CN111444908A (zh) * 2020-03-25 2020-07-24 腾讯科技(深圳)有限公司 Image recognition method, apparatus, terminal and storage medium


Also Published As

Publication number Publication date
CN115050037A (zh) 2022-09-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 22758835
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 22758835
    Country of ref document: EP
    Kind code of ref document: A1