WO2021078133A1 - Systems and methods for image processing - Google Patents

Systems and methods for image processing

Info

Publication number
WO2021078133A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature information
processing
recognition model
key points
Prior art date
Application number
PCT/CN2020/122362
Other languages
French (fr)
Inventor
Hao Wang
Tianming Zhang
Original Assignee
Beijing Didi Infinity Technology And Development Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201911016421.2A external-priority patent/CN111860527A/en
Priority claimed from CN201911252892.3A external-priority patent/CN111860489A/en
Application filed by Beijing Didi Infinity Technology And Development Co., Ltd. filed Critical Beijing Didi Infinity Technology And Development Co., Ltd.
Publication of WO2021078133A1 publication Critical patent/WO2021078133A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/247 Aligning, centring, orientation detection or correction of the image by affine transforms, e.g. correction due to perspective effects; Quadrilaterals, e.g. trapezoids

Definitions

  • the present disclosure generally relates to image processing technology, and in particular, to systems and methods for processing an image based on a trained feature recognition model.
  • For some O2O services (e.g., online-to-offline transportation services), a user needs to upload an image of a certificate (e.g., an identity (ID) card, a vehicle certificate, a driving license, a bank card) to the O2O service platform.
  • the O2O service platform may recognize and extract information (e.g., characters) in the image and determine whether to allow the user to register based on the extracted information.
  • An aspect of the present disclosure relates to a system for image processing.
  • the system may include at least one storage device including a set of instructions and at least one processor configured to communicate with the at least one storage device.
  • the at least one processor is configured to direct the system to perform operations.
  • the operations may include obtaining an image of an object; determining feature information of the object in the image based on the image by using a trained feature recognition model; and processing the image based on the feature information.
  • the feature information may include positions of at least three key points of the object in the image.
  • the processing the image based on the feature information may include obtaining reference positions of the at least three key points of the object and processing the image based on the positions of the at least three key points of the object and the reference positions of the at least three key points of the object.
  • the at least three key points may include corner points of the object.
  • the processing the image based on the positions of the at least three key points of the object and the reference positions of the at least three key points of the object may include determining a transformation matrix based on the positions of at least three key points of the object in the image and the reference positions of the at least three key points of the object, and processing the image to obtain a processed image by transforming the image using the transformation matrix.
  • the feature information may include direction information of the object.
  • the direction information of the object may include a deflection direction of the object relative to a reference line associated with the image.
  • the processing the image based on the feature information may include processing the image based on the direction information of the object.
  • the feature information may include a confidence level.
  • the confidence level may indicate accuracy of the feature information of the object determined from the image.
  • the processing the image based on the feature information may include determining whether the confidence level of the feature information satisfies a condition and, in response to determining that the confidence level of the feature information satisfies the condition, processing the image based on the feature information.
  • the processing the image may include determining a target region of the object from the image by using an object recognition model and processing the target region of the object in the image.
  • the determining the target region of the object from the image by using the object recognition model may include inputting the feature information of the object into the object recognition model and determining the target region of the object from the image based on an output of the object recognition model.
  • the trained feature recognition model may include a plurality of sub-models. Each of the plurality of sub-models may correspond to a reference object type.
  • the determining the feature information of the object in the image based on the image by using the trained feature recognition model may include determining an object type of the object; obtaining a sub-model from the plurality of sub-models based on the object type of the object and the reference object type corresponding to the sub-model; and determining the feature information of the object in the image by using the sub-model.
  • the trained feature recognition model may be obtained by a training process.
  • the training process may include obtaining a plurality of training samples each of which includes an image obtained by performing an angle transformation on a sample image and obtaining the trained feature recognition model by training a preliminary machine learning model based on the plurality of training samples.
  • a further aspect of the present disclosure relates to a method for image processing.
  • the method may be implemented on a computing device including at least one processor, at least one storage medium, and a communication platform connected to a network.
  • the method may include obtaining an image of an object; determining feature information of the object in the image based on the image by using a trained feature recognition model; and processing the image based on the feature information.
  • a still further aspect of the present disclosure relates to a non-transitory computer readable medium including executable instructions.
  • When the executable instructions are executed by at least one processor, they may direct the at least one processor to perform a method.
  • the method may include obtaining an image of an object; determining feature information of the object in the image based on the image by using a trained feature recognition model; and processing the image based on the feature information.
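  • As a hedged, high-level sketch of this flow in Python (the function names and the callables standing in for the trained feature recognition model and the correction step are illustrative assumptions, not part of the disclosure):

```python
from typing import Any, Callable, Dict

def process_image(image: Any,
                  recognize_features: Callable[[Any], Dict],
                  correct: Callable[[Any, Dict], Any]) -> Any:
    # Determine feature information (e.g., key points, direction, confidence level)
    # of the object in the image by using a trained feature recognition model.
    feature_information = recognize_features(image)
    # Process (e.g., correct) the image based on the feature information.
    return correct(image, feature_information)
```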
  • FIG. 1 is a schematic diagram illustrating an exemplary image processing system according to some embodiments of the present disclosure
  • FIG. 2 is a flowchart illustrating an exemplary process for processing an image according to some embodiments of the present disclosure
  • FIG. 3 is a schematic diagram illustrating an exemplary image of an object and exemplary processed images of the object according to some embodiments of the present disclosure
  • FIG. 4 is a flowchart illustrating an exemplary process for obtaining a trained feature recognition model according to some embodiment of the present disclosure
  • FIG. 5 is a flowchart illustrating an exemplary process for correcting an image according to some embodiments of the present disclosure
  • FIG. 6 is a schematic diagram illustrating exemplary corner points according to some embodiments of the present disclosure.
  • FIG. 7 is a flowchart illustrating an exemplary process for correcting an image according to some embodiments of the present disclosure
  • FIG. 8 is a flowchart illustrating an exemplary process for correcting an image according to some embodiments of the present disclosure
  • FIG. 9 is a flowchart illustrating an exemplary process for correcting an image according to some embodiments of the present disclosure.
  • FIG. 10 is a flowchart illustrating an exemplary process for correcting an image according to some embodiments of the present disclosure
  • FIG. 11 is a block diagram illustrating an exemplary image correction device according to some embodiments of the present disclosure.
  • FIG. 12 is a flowchart illustrating an exemplary process for correcting an image according to some embodiments of the present disclosure
  • FIG. 13 is a flowchart illustrating an exemplary process for recognizing information in a processed image according to some embodiments of the present disclosure
  • FIG. 14 is a block diagram illustrating an exemplary image correction device according to some embodiments of the present disclosure.
  • FIG. 15 is a block diagram illustrating an exemplary image correction device according to some embodiments of the present disclosure.
  • The terms “system,” “engine,” “unit,” “module,” and/or “block” used herein are one method to distinguish different components, elements, parts, sections, or assemblies of different levels in ascending order. However, the terms may be displaced by other expressions if they may achieve the same purpose.
  • The terms “module,” “unit,” or “block” used herein refer to logic embodied in hardware or firmware, or to a collection of software instructions.
  • a module, a unit, or a block described herein may be implemented as software and/or hardware and may be stored in any type of non-transitory computer-readable medium or other storage device.
  • a software module/unit/block may be compiled and linked into an executable program. It will be appreciated that software modules can be callable from other modules/units/blocks or from themselves, and/or may be invoked in response to detected events or interrupts.
  • Software modules/units/blocks configured for execution on computing devices may be provided on a computer-readable medium, such as a compact disc, a digital video disc, a flash drive, a magnetic disc, or any other tangible medium, or as a digital download (and can be originally stored in a compressed or installable format that needs installation, decompression, or decryption prior to execution) .
  • Such software code may be stored, partially or fully, on a storage device of the executing computing device, for execution by the computing device.
  • Software instructions may be embedded in firmware, such as an EPROM.
  • hardware modules (or units or blocks) may be included in connected logic components, such as gates and flip-flops, and/or can be included in programmable units, such as programmable gate arrays or processors.
  • The modules (or units or blocks) or computing device functionality described herein may be implemented as software modules (or units or blocks) , but may be represented in hardware or firmware.
  • the modules (or units or blocks) described herein refer to logical modules (or units or blocks) that may be combined with other modules (or units or blocks) or divided into sub-modules (or sub-units or sub-blocks) despite their physical organization or storage.
  • the flowcharts used in the present disclosure illustrate operations that systems may implement according to some embodiments of the present disclosure. It is to be expressly understood that the operations of the flowcharts need not be implemented in order. Conversely, the operations may be implemented in inverted order, or simultaneously. Moreover, one or more other operations may be added to the flowcharts. One or more operations may be removed from the flowcharts.
  • An aspect of the present disclosure relates to systems and methods for processing an image.
  • the systems and methods may obtain an image of an object (e.g., a certificate) .
  • the systems and methods may determine feature information (e.g., coordinates of four corner points of the object) of the object in the image by using a trained feature recognition model.
  • the systems and methods may process the image based on the feature information.
  • the systems and methods may determine a transformation matrix based on at least a portion of the feature information.
  • the systems and methods may perform a transformation on the image of the object based on the transformation matrix.
  • the feature information may be identified and extracted from the image accurately and effectively, and the image of the object may be processed (e.g., corrected) based on the feature information, which may improve the efficiency and accuracy of image processing (e.g., correction) , thereby improving the accuracy of effective information recognition based on the image.
  • FIG. 1 is a schematic diagram illustrating an exemplary image processing system according to some embodiments of the present disclosure.
  • the image processing system 100 may include a server 110, a user device 130, and a third party 140.
  • the server 110 may be a single server or a server group.
  • the server group may be centralized or distributed (e.g., server 110 may be a distributed system) .
  • the server 110 may be local or remote.
  • the server 110 may access information and/or data stored in the user device 130 and/or the third party 140.
  • the server 110 may be directly connected to the user device 130 and/or the third party 140 to access stored information and/or data.
  • the server 110 may be implemented on a cloud platform.
  • the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.
  • the server 110 may include a processing device 112.
  • the processing device 112 may process information and/or data relating to image processing to perform one or more functions described in the present disclosure. For example, the processing device 112 may obtain an image 120 of an object. According to the image 120, the processing device 112 may determine feature information of the object in the image 120 by using a trained feature recognition model. Further, the processing device 112 may process the image 120 based on the feature information.
  • the processing device 112 may include one or more processing engines (e.g., single-core processing engine (s) or multi-core processor (s) ) .
  • the processing device 112 may include a central processing unit (CPU) , an application-specific integrated circuit (ASIC) , an application-specific instruction-set processor (ASIP) , a graphics processing unit (GPU) , a physics processing unit (PPU) , a digital signal processor (DSP) , a field-programmable gate array (FPGA) , a programmable logic device (PLD) , a controller, a microcontroller unit, a reduced instruction set computer (RISC) , a microprocessor, or the like, or any combination thereof.
  • the server 110 may be unnecessary and all or part of the functions of the server 110 may be implemented by other components (e.g., the third party 140, the user device 130) of the image processing system 100.
  • the processing device 112 may be integrated into the third party 140 or the user device 130 and the functions (e.g., processing image) of the processing device 112 may be implemented by the third party 140 or the user device 130.
  • the user device 130 may be configured to transmit information and/or data to the server 110 and/or the third party 140 or receive information and/or data from the server 110 and/or the third party 140.
  • the user device 130 may transmit the image 120 to the server 110 for processing.
  • the user device 130 may receive a processed image 150 from the server 110.
  • the user device 130 may process information and/or data received from the server 110 and/or the third party 140.
  • the user device 130 may provide a user interface via which a user may view information and/or input data and/or instructions to the image processing system 100.
  • the user may view the processed image 150 via the user interface.
  • the user device 130 may include a mobile device 130-1, a tablet computer 130-2, a laptop computer 130-3, a built-in device 130-4 in a vehicle, or the like, or any combination thereof.
  • the mobile device 130-1 may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof.
  • the smart home device may include a smart lighting device, a control device of an intelligent electrical apparatus, a smart monitoring device, a smart television, a smart video camera, an interphone, or the like, or any combination thereof.
  • the wearable device may include a smart bracelet, a smart footgear, a smart glass, a smart helmet, a smart watch, a smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof.
  • the smart mobile device may include a smartphone, a personal digital assistance (PDA) , a gaming device, a navigation device, a point of sale (POS) device, or the like, or any combination thereof.
  • the virtual reality device and/or the augmented reality device may include a virtual reality helmet, a virtual reality glass, a virtual reality patch, an augmented reality helmet, an augmented reality glass, an augmented reality patch, or the like, or any combination thereof.
  • the virtual reality device and/or the augmented reality device may include a Google Glass™, an Oculus Rift™, a Hololens™, a Gear VR™, etc.
  • a built-in device 130-4 in the vehicle may include an onboard computer, an onboard television, etc.
  • the third party 140 may be configured to transmit information and/or data to the server 110 and/or the user device 130 or receive information and/or data from the server 110 and/or the user device 130.
  • the third party 140 may obtain the image 120 from the user device 130 after receiving an authorization of the user and transmit the image 120 to the server 110 for processing.
  • the third party 140 may receive the processed image 150 from the server 110.
  • the third party 140 may include a search engine 140-1, a social media 140-2, a news media 140-3, a map website 140-4, etc.
  • the search engine 140-1 may include Google, Yahoo, Baidu, Microsoft, NHN, or the like, or any combination thereof.
  • the social media 140-2 may include Facebook, Youtube, WhatsApp, LinkedIn, Twitter, Weibo, WeChat, QQ, or the like, or any combination thereof.
  • the news media 140-3 may include Phoenix, Tencent News, Netease News, Sohu News, Associated Press, Cable News Network, or the like, or any combination thereof.
  • the map website 140-4 may include Baidu map, Google map, Gaode map, Sogou map, or the like, or any combination thereof.
  • the image processing system 100 may further include a network (not shown) .
  • the network may facilitate exchange of information and/or data.
  • one or more components (e.g., the server 110, the user device 130, the third party 140) of the image processing system 100 may transmit information and/or data to other component (s) of the image processing system 100 via the network.
  • the server 110 may obtain the image 120 from the user device 130 or the third party 140 via the network.
  • the user device 130 may receive the processed image 150 from the server 110 via the network.
  • the server 110 may communicate information and/or data with one or more external resources such as an external database, etc. via the network.
  • the server 110 may obtain a trained feature recognition model from a database of a vendor or manufacture that provides and/or updates the trained feature recognition model.
  • the network may be any type of wired or wireless network, or any combination thereof.
  • the network may include a cable network, a wireline network, an optical fiber network, a telecommunications network, an intranet, an Internet, a local area network (LAN) , a wide area network (WAN) , a wireless local area network (WLAN) , a metropolitan area network (MAN) , a public telephone switched network (PSTN) , a Bluetooth network, a ZigBee network, a near field communication (NFC) network, or the like, or any combination thereof.
  • the network may include one or more network access points.
  • the network may include wired or wireless network access points such as base stations and/or internet exchange points, through which one or more components of the image processing system 100 may be connected to the network to exchange data and/or information.
  • the image processing system 100 may further include a storage device (not shown) .
  • the storage device may store data and/or instructions relating to the image processing.
  • the storage device may store data (e.g., the image 120) obtained from the user device 130 or the third party 140.
  • the storage device may store data and/or instructions that the server 110 may execute or use to perform exemplary methods described in the present disclosure.
  • the storage device may include a mass storage, a removable storage, a volatile read-and-write memory, a read-only memory (ROM) , or the like, or any combination thereof.
  • Exemplary mass storage may include a magnetic disk, an optical disk, a solid-state drive, etc.
  • Exemplary removable storage may include a flash drive, a floppy disk, an optical disk, a memory card, a zip disk, a magnetic tape, etc.
  • Exemplary volatile read-and-write memory may include a random access memory (RAM) .
  • Exemplary RAM may include a dynamic RAM (DRAM) , a double data rate synchronous dynamic RAM (DDR SDRAM) , a static RAM (SRAM) , a thyristor RAM (T-RAM) , a zero-capacitor RAM (Z-RAM) , etc.
  • Exemplary ROM may include a mask ROM (MROM) , a programmable ROM (PROM) , an erasable programmable ROM (EPROM) , an electrically erasable programmable ROM (EEPROM) , a compact disk ROM (CD-ROM) , and a digital versatile disk ROM, etc.
  • the storage device may be implemented on a cloud platform.
  • the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.
  • the storage device may be connected to the network to communicate with one or more components (e.g., the server 110, the user device 130, the third party 140) of the image processing system 100.
  • One or more components of the image processing system 100 may access the data and/or instructions stored in the storage device via the network.
  • the storage device may be directly connected to or communicate with one or more components (e.g., the server 110, the user device 130, the third party 140) of the image processing system 100.
  • the storage device may be part of the server 110.
  • the image processing system 100 may be or be integrated into a system for online to offline services.
  • the image processing system 100 may be integrated into an online transportation service platform for transportation services such as taxi hailing, chauffeur services, delivery vehicles, express car, carpool, bus service, driver hiring, shuttle services, etc.
  • the online transportation service platform may process images received from a user of the online transportation service platform.
  • the image processing system 100 may be integrated into a face scan payment service platform for payment services such as face scan payment in a supermarket, a hospital, a bank, etc.
  • the face scan payment service platform may process images received from a user of the face scan payment service platform.
  • the image processing system 100 may include an obtaining module, a determination module, a processing module, and a training module.
  • the obtaining module may be configured to obtain an image (e.g., the image 120) of an object. More descriptions regarding the obtaining of the image of the object may be found elsewhere in the present disclosure (e.g., operation 210 in FIG. 2 and the description thereof) .
  • the determination module may be configured to determine feature information of the object in the image based on the image by using a trained feature recognition model. More descriptions regarding the determining of the feature information of the object in the image may be found elsewhere in the present disclosure (e.g., operation 220 in FIG. 2 and the description thereof) .
  • the processing module may be configured to process the image based on the feature information. More descriptions regarding the processing of the image may be found elsewhere in the present disclosure (e.g., operation 230 in FIG. 2 and the description thereof) .
  • the training module may be configured to obtain a trained feature recognition model.
  • the training module may obtain a plurality of training samples. Further, the training module may obtain the trained feature recognition model by training a preliminary machine learning model based on the plurality of training samples. More descriptions regarding the obtaining of the trained feature recognition model may be found elsewhere in the present disclosure (e.g., FIG. 4 and the description thereof) .
  • the training module may be implemented on a separate device (e.g., a processing device independent from the processing device 112) .
  • the training module may be unnecessary and the trained feature recognition model may be obtained from a storage device (e.g., the storage device described above, an external database) disclosed elsewhere in the present disclosure.
  • FIG. 2 is a flowchart illustrating an exemplary process for processing an image according to some embodiments of the present disclosure.
  • process 200 may be executed by the image processing system 100.
  • the process 200 may be implemented as a set of instructions (e.g., an application) stored in a storage device disclosed elsewhere in the present disclosure.
  • In some embodiments, the process 200 may be executed by the processing device 112 and/or an image correction device (e.g., an image correction device 1100 illustrated in FIG. 11, an image correction device 1400 illustrated in FIG. 14) .
  • the operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 200 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 200 illustrated in FIG. 2 and described below is not intended to be limiting.
  • the processing device 112 may obtain an image (e.g., the image 120) (also referred to as an image to be processed) of an object.
  • the object may include a biological object and/or a non-biological object.
  • the biological object may include people, an animal, a plant, or the like, or any combination thereof.
  • the non-biological object may include a natural product (e.g., a stone, a building) , an artifact, or the like, or any combination thereof.
  • the artifact may include a certificate, a photo, a drawing, or a document with text or patterns, or the like, or any combination thereof.
  • the certificate may include an identity (ID) card, a vehicle certificate, a driving license, a bank card (e.g., a credit card, a debit card) , an employee’s card, etc.
  • the user device 130 may acquire (or capture) the image of the object and send the image of the object to the processing device 112.
  • the third party 140 may send a request for acquiring the image to the user device 130.
  • the user device 130 may send the image to the third party 140 in response to receiving an authorization or confirmation from a user of the user device 130.
  • the third party 140 may send the image to the processing device 112.
  • the image may be a frame in a video acquired by the user device 130 or from the third party 140.
  • the processing device 112 may obtain and/or determine the image (i.e., the frame) from the video. For example, the processing device 112 may perform a framing operation on the video to obtain a plurality of frames in the video. The processing device 112 may designate one of the plurality of frames as the image.
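  • A minimal sketch of such a framing operation in Python with OpenCV (the video path and the sampling step are illustrative assumptions):

```python
import cv2

def extract_frames(video_path, step=30):
    # Perform a framing operation on the video and keep every `step`-th frame;
    # one of the returned frames may then be designated as the image to be processed.
    capture = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    capture.release()
    return frames
```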
  • the image of the object may be acquired by the user device 130 or from the third party 140 and stored in a storage device disclosed elsewhere in the present disclosure.
  • the processing device 112 may obtain the image of the object from the storage device via a network disclosed elsewhere in the present disclosure.
  • the processing device 112 may determine feature information of the object in the image based on the image by using a trained feature recognition model (also referred to as a model or a preset correction model) .
  • the feature information of the object may include positions of at least three key points of the object in the image, direction information of the object, a confidence level, serial numbers of the at least three key points of the object, or the like, or any combination thereof.
  • a key point may refer to a point representing a portion of the object and may be used to determine or define a position of the object in the image.
  • the at least three key points may include corner points of the object (also referred to as global corner points) , points inside a region in the image representing at least a portion of the object, points at an edge of the object represented in the image, or the like, or any combination thereof.
  • a corner point of the object may refer to an intersection of at least two edge lines of the object in the image.
  • the points inside the region representing at least a portion of the object may include a center point (e.g., a geometric center point) in the region and points at some specific positions.
  • the center points may include a point in the region representing a geometric center of the object (also referred to as a global center point) , a point in the region representing a geometric center of a pattern in the object (also referred to as a local center point) , etc.
  • the points at some specific positions inside the region may include corner points (also referred to as local corner points) (e.g., an upper left corner, an upper right corner, a lower left corner, a lower right corner) of a pattern or a character in the object.
  • the at least three key points may not be on a straight line.
  • For example, for a rectangular ID card, the at least three key points may be any three of the four vertices of the rectangular ID card.
  • As another example, the at least three key points may be a center point and any two vertices of the rectangular ID card that are not on a diagonal line of the rectangular ID card.
  • the positions of the at least three key points of the object may include coordinates of the at least three key points in the image of the object.
  • the direction information of the object may include a deflection direction of the object.
  • the deflection direction may be denoted by a deflection angle of the object.
  • the deflection angle may refer to an angle between a central axis of the object and a reference line associated with the image.
  • the reference line may include a boundary of the image, a center line of the image, an axis of an image coordinate system of the image, etc.
  • the deflection angle may be an angle between a central axis of the object in a vertical direction and a vertical axis of an image coordinate system of the image of the object.
  • the deflection angle may be an angle between a central axis of the object in a horizontal direction and a horizontal axis of the image coordinate system of the image of the object.
  • the deflection direction of the object may include a clockwise direction and a counterclockwise direction.
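  • A sketch of how such a deflection angle and deflection direction might be computed from two points on the object's vertical central axis (the helper name, its inputs, and the sign-to-direction convention are assumptions for illustration):

```python
import math

def deflection(axis_top, axis_bottom):
    # axis_top and axis_bottom are (x, y) image coordinates of two points on the
    # object's vertical central axis (image y-axis pointing down).
    dx = axis_bottom[0] - axis_top[0]
    dy = axis_bottom[1] - axis_top[1]
    angle = math.degrees(math.atan2(dx, dy))  # 0 when the axis is exactly vertical
    # Mapping the sign of the angle to a rotation direction is a convention assumed here.
    direction = "clockwise" if angle < 0 else "counterclockwise"
    return abs(angle), direction
```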
  • the confidence level may indicate the accuracy of the feature information of the object.
  • a confidence level of a corner point may indicate the accuracy of a coordinate of the corner point.
  • a value of the confidence level may range from 0 to 1. The closer the confidence level of a corner point is to 1, the more accurate the coordinates of the corner point determined from the image may be. The closer the confidence level of a corner point is to 0, the less accurate the coordinates of the corner point determined from the image may be.
  • a serial number of a key point may refer to a number used to distinguish the key point from other key points.
  • all key points of the object may be numbered in a certain order. For example, when the object is face up (e.g., texts in the object are vertically upward) , all key points of the object may be numbered clockwise from an upper left corner of the object. As another example, when the object is upside down, all key points of the object may be numbered clockwise from a lower left corner of the object. More descriptions regarding the feature information of the object may be found elsewhere in the present disclosure (e.g., FIG. 3 and the description thereof) .
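  • A minimal sketch of numbering four corner points clockwise from the upper left corner (serial numbers 1-4), assuming the object is roughly face up and image coordinates with the y-axis pointing down:

```python
import numpy as np

def number_corner_points(points):
    # points: four (x, y) corner coordinates in any order.
    pts = np.asarray(points, dtype=np.float32)
    s = pts.sum(axis=1)               # x + y: smallest at upper left, largest at lower right
    d = np.diff(pts, axis=1).ravel()  # y - x: smallest at upper right, largest at lower left
    ordered = [pts[np.argmin(s)],     # 1: upper left
               pts[np.argmin(d)],     # 2: upper right
               pts[np.argmax(s)],     # 3: lower right
               pts[np.argmax(d)]]     # 4: lower left
    return {serial: tuple(point) for serial, point in enumerate(ordered, start=1)}
```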
  • the trained feature recognition model may be trained by the processing device 112 or a processing device different from the processing device 112 and stored in a storage device disclosed elsewhere in the present disclosure.
  • the processing device 112 may retrieve the trained feature recognition model from the storage device.
  • the trained feature recognition model may include a machine learning model, for example, a deep learning model.
  • the deep learning model may include a convolutional neural network (CNN) model, a recurrent neural network (RNN) model, a deep belief network (DBN) model, a stacked auto-encoder network model, or the like, or any combination thereof.
  • the trained feature recognition model may be trained based on a plurality of training samples. More descriptions regarding the trained feature recognition model may be found elsewhere in the present disclosure (e.g., FIG. 4 and the description thereof) .
  • the processing device 112 may input the image of the object into the trained feature recognition model and determine the feature information of the object in the image based on an output of the trained feature recognition model.
  • the trained feature recognition model may include a plurality of sub-models. Each of the plurality of sub-models may correspond to a reference object type.
  • the reference object type may be default settings (e.g., a card, a document, etc. ) of the image processing system 100 or may be adjustable under different situations.
  • the processing device 112 may determine an object type of the object. Merely by way of example, the processing device 112 may determine the object type of the object based on a user input. As another example, the processing device 112 may determine the object type of the object by using a trained type classification model. The processing device 112 may obtain a sub-model from the plurality of sub-models based on the object type of the object and the reference object type corresponding to the sub-model.
  • For example, if the object type of the object is a card, the processing device 112 may obtain a sub-model corresponding to the card from the plurality of sub-models. Further, the processing device 112 may determine the feature information of the object in the image by using the sub-model. For example, the processing device 112 may input the image of the object into the sub-model and determine the feature information of the object in the image based on an output of the sub-model. In this embodiment, feature information of objects with different object types may be determined based on different sub-models, which may improve the efficiency and accuracy of the image processing.
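  • A sketch of the sub-model lookup described above (the registry, the placeholder lambdas, and the object type keys are assumptions so the example stays self-contained):

```python
from typing import Any, Callable, Dict

# Hypothetical registry of trained sub-models keyed by reference object type.
# Each value would be a trained feature recognition sub-model in a real system.
SUB_MODELS: Dict[str, Callable[[Any], dict]] = {
    "card": lambda image: {"key_points": [], "direction": None, "confidence": 0.0},
    "document": lambda image: {"key_points": [], "direction": None, "confidence": 0.0},
}

def determine_feature_information(image: Any, object_type: str) -> dict:
    # Select the sub-model whose reference object type matches the object type,
    # then determine the feature information of the object in the image with it.
    sub_model = SUB_MODELS[object_type]
    return sub_model(image)
```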
  • the processing device 112 may process the image based on the feature information.
  • the processing device 112 may obtain reference positions of the at least three key points of the object.
  • Reference positions of key points (e.g., corner points) of the object may refer to positions such that, when the at least three key points of the object in the image are at the reference positions, information in the image is easy to recognize.
  • For example, when information in an image of a rectangular ID card is easy to recognize, the positions of the corner points (e.g., the four vertices) of the rectangular ID card may be regarded as the reference positions of the corner points (e.g., the four vertices) of the rectangular ID card.
  • Different object types (e.g., a card, a document) of objects may have different reference positions.
  • the reference positions of key points of various objects may be set by the image processing system 100 and stored in a storage device disclosed elsewhere in the present disclosure.
  • the image processing system 100 may set the reference positions according to the object types of various objects in an image coordinate system defined by the system 100. Different object types of objects may correspond to different sizes and/or shapes.
  • the image processing system 100 may set the reference positions of key points (e.g., corner points) of an object with a type corresponding to the sizes and/or shapes in the image coordinate system.
  • the processing device 112 may determine the reference positions of the at least three key points from a reference position database in the storage device.
  • the processing device 112 may determine an object type of the object. According to the object type of the object, the processing device 112 may retrieve reference positions of key points of the object from the reference position database. Further, the processing device 112 may select at least three key points from the key points of the object and obtain the reference positions corresponding to the at least three key points.
  • the feature information of the object may include the positions of at least three key points of the object.
  • the processing device 112 may process the image based on the position of at least one of the at least three key points of the object and the preset reference position of the at least one of the at least three key points of the object. Specifically, the processing device 112 may determine whether the position of the at least one of the at least three key points and the reference position of the at least one of the at least three key points coincide. If the position of the at least one of at least three key points and the reference position of the at least one of the at least three key points do not coincide, the processing device 112 may process the image.
  • the processing device 112 may determine a transformation matrix (also referred to as a perspective correction matrix or a perspective transformation matrix) based on the positions of the at least three key points and the preset reference positions of the at least three key points.
  • the processing device 112 may process the image to obtain a processed image by transforming the image using the transformation matrix. More descriptions regarding the processing of the image using the transformation matrix may be found elsewhere in the present disclosure (e.g., FIG. 5, FIG. 12, and the description thereof) .
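  • A minimal sketch of this step in Python with OpenCV (the use of exactly four key points, the function names, and the output size derived from the reference positions are assumptions for illustration; with only three non-collinear key points, an affine transform such as cv2.getAffineTransform could be used instead):

```python
import cv2
import numpy as np

def correct_image(image, key_point_positions, reference_positions):
    # key_point_positions: four (x, y) positions detected in the image.
    # reference_positions: the corresponding four reference (target) positions.
    src = np.asarray(key_point_positions, dtype=np.float32)
    dst = np.asarray(reference_positions, dtype=np.float32)
    if np.allclose(src, dst):
        return image  # the positions already coincide with the reference positions
    matrix = cv2.getPerspectiveTransform(src, dst)  # 3x3 transformation matrix
    width = int(dst[:, 0].max())
    height = int(dst[:, 1].max())
    # Transform the image using the transformation matrix to obtain the processed image.
    return cv2.warpPerspective(image, matrix, (width, height))
```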
  • the feature information of the object may include the serial numbers of the at least three key points of the object.
  • the processing device 112 may process the image based on the serial numbers of the at least three key points of the object. More descriptions regarding the processing of the image based on the serial numbers of the at least three key points of the object may be found elsewhere in the present disclosure (e.g., FIG. 3 and the description thereof) .
  • the feature information of the object may include the direction information of the object.
  • In some cases, the image of the object may be in a wrong direction.
  • For example, a central axis of the object in the image may be not horizontal or vertical.
  • As another example, the object in the image may be upside down.
  • the processing device 112 may process the image based on the direction information of the object.
  • In some embodiments, a plurality of transformation matrices corresponding to different direction information (e.g., different deflection angles and deflection directions) may be previously determined and stored in a storage device disclosed elsewhere in the present disclosure.
  • the processing device 112 may retrieve a transformation matrix corresponding to the direction information of the object from the plurality of transformation matrices in the storage device. Further, the processing device 112 may process the image based on the transformation matrix corresponding to the direction information of the object.
  • the feature information of the object may include a confidence level.
  • the processing device 112 may process the image based on the confidence level of the feature information. Specifically, the processing device 112 may determine whether the confidence level of the feature information satisfies a condition. For example, the processing device 112 may determine whether the confidence level of the feature information is larger than a confidence level threshold.
  • the confidence level threshold may be a default setting (e.g., 0.5) of the image processing system 100 or may be adjustable under different situations. When the confidence level of the feature information is larger than the confidence level threshold, the processing device 112 may determine that the confidence level of the feature information satisfies the condition.
  • the processing device 112 may process the image based on the feature information.
  • the processing device 112 may determine that the confidence level of the feature information does not satisfy the condition.
  • the processing device 112 may return an instruction indicating that the image cannot be processed, or roughly process the image. For example, the processing device 112 may move or rotate the image by an arbitrary angle and output the roughly processed image. Further, the roughly processed image may be further processed by the image processing system 100 according to operations 210-230.
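  • A sketch of this confidence check, assuming the feature information carries a "confidence" entry and reusing a correction routine such as the sketch above; the 0.5 threshold mirrors the default mentioned in the text, and the 90-degree rotation stands in for the rough processing step:

```python
import cv2

def process_with_confidence_check(image, feature_information, correct, confidence_threshold=0.5):
    # Process the image based on the feature information only when the confidence
    # level satisfies the condition (here: larger than the threshold).
    if feature_information.get("confidence", 0.0) > confidence_threshold:
        return correct(image, feature_information)
    # Otherwise roughly process the image (an illustrative rotation) so the pipeline
    # can run again on the roughly processed image.
    return cv2.rotate(image, cv2.ROTATE_90_CLOCKWISE)
```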
  • the processing device 112 may determine a target region (also referred to as a text region or a field region) of the object in the processed image.
  • the target region of the object may refer to a region where a picture and/or at least one character (e.g., texts, symbols) in the object is located. Further, the processing device 112 may determine the target region of the object in the processed image. More descriptions regarding the determination of the target region in the processed image may be found elsewhere in the present disclosure (e.g., FIG. 7, FIG. 13, and the description thereof) .
  • the processing device 112 may determine a target region of the object in the image and process the target region by performing a similar manner as the processing of the image as described above.
  • the processing device 112 may determine the target region of the object in the image by using an object recognition model (also referred to as a region segmentation model) .
  • the processing device 112 may input the feature information of the object into the region segmentation model and determine the target region of the object in the image based on an output of the object recognition model. Further, the processing device 112 may recognize effective information from the processed target region of the object in the image, which may avoid processing the entire image, thereby improving the efficiency and accuracy of the image processing.
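  • A minimal sketch of extracting such a target region once the object recognition model has returned it (the image is assumed to be a NumPy array, and the (x, y, width, height) box format is an assumption):

```python
def crop_target_region(image, region_box):
    # region_box: (x, y, width, height) of the target (e.g., text or field) region
    # output by the object recognition model, in pixel coordinates.
    x, y, w, h = region_box
    return image[y:y + h, x:x + w]
```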
  • one or more other optional operations may be added elsewhere in the process 200.
  • the processing device 112 may store information and/or data (e.g., the image of the object, the feature information, the trained feature recognition model, the processed image of the object) associated with the image processing in a storage device disclosed elsewhere in the present disclosure.
  • the processing device 112 may transmit the processed image of the object to the user device 130.
  • the processing device 112 may further perform an operation including obtaining the trained feature recognition model.
  • FIG. 3 is a schematic diagram illustrating an exemplary image of an object and exemplary processed images of the object according to some embodiments of the present disclosure.
  • the feature information of the object may include positions of at least three key points of the object and the serial numbers of the at least three key points of the object.
  • the image 310 may be an image of a vehicle certificate.
  • the key points of the object in the image 310 may be four vertices 311, 312, 313, and 314 of the vehicle certificate.
  • the positions of the key points may be coordinates of the four vertices 311, 312, 313, and 314 in the image 310.
  • the serial numbers of the key points 311, 312, 313, and 314 may be 1, 2, 3, and 4.
  • reference positions of all key points of the object may be numbered.
  • a manner of numbering the reference positions may be in a same manner as a manner of numbering key points as described in connection with FIG. 2.
  • the reference serial numbers of the reference positions may be previously determined and stored in a storage device disclosed elsewhere in the present disclosure or an external database.
  • the processing device 112 may retrieve the reference serial numbers of the reference positions from the storage device based on the serial numbers of the key points. Further, according to the reference serial numbers of the reference positions, the processing device 112 may obtain the reference positions. For example, according to the serial numbers 1, 2, 3, and 4 of the key points 311, 312, 313, and 314, the processing device 112 may obtain reference serial numbers 1’, 2’, 3’, and 4’.
  • the processing device 112 may obtain reference positions A, B, C, and D of the key points 311, 312, 313, and 314 based on the reference serial numbers 1’, 2’, 3’, and 4’.
  • the processing device 112 may determine a transformation matrix based on the coordinates of the key points 311, 312, 313, and 314 and coordinates of the reference positions A, B, C, and D. According to the transformation matrix, the processing device 112 may obtain a processed image 320.
  • Effective information (e.g., the license plate number in the vehicle certificate) may then be recognized from the processed image 320.
  • FIG. 4 is a flowchart illustrating an exemplary process for obtaining a trained feature recognition model according to some embodiment of the present disclosure.
  • process 400 may be executed by the image processing system 100.
  • the process 400 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device disclosed elsewhere in the present disclosure) .
  • In some embodiments, the process 400 may be executed by the processing device 112 and/or an image correction device (e.g., the image correction device 1100 illustrated in FIG. 11, the image correction device 1400 illustrated in FIG. 14) .
  • the process 400 may be performed by a computing device of a system of a vendor that provides and/or maintains such a trained feature recognition model.
  • the operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 400 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 400 illustrated in FIG. 4 and described below is not intended to be limiting.
  • the processing device 112 may obtain a plurality of training samples (also referred to as sample images) .
  • Each of the plurality of training samples may be an image of an object, for example, an identity (ID) card, a vehicle certificate, a driving license, a bank card (e.g., a credit card, a debit card) , an employee’s card, etc.
  • each of the plurality of training samples may include at least one whole object.
  • a training sample may be an image of two whole objects.
  • At least one of the plurality of training samples may be previously generated and stored in a storage device disclosed elsewhere in the present disclosure or an external database.
  • the processing device 112 may retrieve the training samples directly from the storage device. Additionally or alternatively, at least one of the plurality of training samples may be acquired by the user device 130 or from the third party 140.
  • each of the plurality of training samples may include an image obtained by performing an angle transformation on a sample image (also referred to as a training image) .
  • the angle transformation may be performed by changing the direction of the object in the sample image.
  • a training sample may be obtained by rotating the object in the sample image by an arbitrary angle.
  • each of the plurality of training samples may include an image obtained by performing a translation operation on a sample image.
  • a training sample may be obtained by translating the object in the sample image by any distance along one or more directions.
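  • A minimal sketch, in Python with OpenCV, of generating such a training sample by rotating a sample image and its marked key points by an arbitrary angle (the function and argument names are illustrative assumptions):

```python
import cv2
import numpy as np

def rotate_sample(sample_image, key_points, angle_degrees):
    # Rotate the sample image about its center by an arbitrary angle and apply the
    # same rotation to the marked key-point coordinates so the annotations stay valid.
    height, width = sample_image.shape[:2]
    center = (width / 2.0, height / 2.0)
    rotation = cv2.getRotationMatrix2D(center, angle_degrees, 1.0)  # 2x3 affine matrix
    rotated_image = cv2.warpAffine(sample_image, rotation, (width, height))
    points = np.hstack([np.asarray(key_points, dtype=np.float32),
                        np.ones((len(key_points), 1), dtype=np.float32)])  # homogeneous coords
    rotated_points = points @ rotation.T
    return rotated_image, rotated_points
```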
  • At least one of the plurality of training samples may be obtained by photographing the object with different placement manners or from different directions, which avoids the image distortion caused by the angle transformation or the translation operation, thereby improving the accuracy of the training samples.
  • feature information of the object in each training sample may be marked or annotated.
  • the feature information of the object may include positions of at least three key points of the object, direction information of the object, a confidence level, serial numbers of the at least three key points of the object, or the like, or any combination thereof.
  • the feature information of the object in each training sample may be marked manually. It is understood that this embodiment described herein is not intended to be limiting. In the present disclosure, the feature information of the object in each training sample may be marked by other improved manners, for example, using a marking model.
  • the processing device 112 may obtain the trained feature recognition model by training a preliminary machine learning model based on the plurality of training samples.
  • the preliminary machine learning model may be a deep learning model.
  • the deep learning model may include a CNN model, an RNN model, a DBN model, a stacked auto-encoder network model, or the like, or any combination thereof.
  • the preliminary machine learning model may include at least one model parameter with a preliminary value. The preliminary value of the at least one model parameter may be a default setting of the image processing system 100 or may be adjustable under different situations.
  • the processing device 112 may train the preliminary machine learning model based on one or more gradient descent algorithms.
  • Exemplary gradient descent algorithms may include an Adam optimization algorithm, a stochastic gradient descent (SGD) + Momentum optimization algorithm, a Nesterov accelerated gradient (NAG) algorithm, an adaptive gradient (Adagrad) algorithm, an adaptive delta (Adadelta) algorithm, a root mean square propagation (RMSprop) algorithm, an AdaMax algorithm, a Nadam (Nesterov-accelerated adaptive moment estimation) algorithm, an AMSGrad (Adam+SGD) algorithm, or the like, or any combination thereof.
  • the processing device 112 may train the preliminary machine learning model iteratively until a termination condition is satisfied. In response to that the termination condition is satisfied, the trained feature recognition model may be obtained.
  • the termination condition may relate to a value of a loss function. For example, the termination condition may be satisfied if the value of the loss function is minimal or smaller than a threshold. As another example, the termination condition may be satisfied if the value of the loss function converges. In some embodiments, “convergence” may refer to that the variation of the values of the loss function in two or more consecutive iterations is equal to or smaller than a vibration threshold.
  • “convergence” may refer to that a difference between the value of the loss function and a target value is equal to or smaller than a difference threshold.
  • the termination condition may be satisfied when a specified count of iterations have been performed in the training process. In some embodiments, when feature information of the object output by the preliminary machine learning model is the same as the marked feature information in each of the plurality of training samples, the termination condition may be satisfied.
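  • A hedged sketch of such an iterative training process in Python with PyTorch (the MSE loss, the Adam optimizer, and the termination thresholds are illustrative choices, not values from the disclosure):

```python
import torch
from torch import nn

def train_feature_recognition_model(model, data_loader, num_epochs=50, loss_threshold=1e-4):
    # Train a preliminary deep learning model to regress the marked key-point
    # coordinates of each training sample.
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(num_epochs):
        epoch_loss = 0.0
        for images, marked_key_points in data_loader:  # samples with annotated feature information
            predictions = model(images)                # predicted key-point coordinates
            loss = criterion(predictions, marked_key_points)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        # Termination condition: stop when the average loss is small enough.
        if epoch_loss / max(len(data_loader), 1) < loss_threshold:
            break
    return model
```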
  • one or more operations may be added or omitted.
  • the processing device 112 may update the trained feature recognition model periodically or irregularly based on one or more newly-generated training samples (e.g., new sample images) .
  • the processing device 112 may divide the plurality of training samples into a training set and a test set. The training set may be used to train the model and the test set may be used to determine whether the training process has been completed.
  • FIG. 5 is a flowchart illustrating an exemplary process for correcting (or processing) an image according to some embodiments of the present disclosure.
  • process 500 may be executed by the image processing system 100.
  • the process 500 may be implemented as a set of instructions (e.g., an application) stored in a storage device disclosed elsewhere in the present disclosure.
  • In some embodiments, the process 500 may be executed by the processing device 112 and/or an image correction device (e.g., the image correction device 1100 illustrated in FIG. 11, the image correction device 1400 illustrated in FIG. 14) .
  • the operations of the illustrated process presented below are intended to be illustrative.
  • the process 500 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 500 illustrated in FIG. 5 and described below is not intended to be limiting.
  • an image to be processed and coordinates of corner points in the image to be processed may be obtained. Operation 510 may be performed by the obtaining module and the determination module described in FIG. 1, an information obtaining unit 1110 illustrated in FIG. 11, and an obtaining unit 1410 illustrated in FIG. 14.
  • a transformation matrix may be calculated based on the coordinates of the corner points in the image and coordinates of target corner points (also referred to as reference positions of the corner points) .
  • Operation 520 may be performed by the processing module described in FIG. 1, a calculation unit 1120 illustrated in FIG. 11, and a calculation unit 1420 illustrated in FIG. 14. Points at the reference positions may be referred to as target corner points. More descriptions regarding the reference positions may be found elsewhere in the present disclosure (e.g., FIG. 2 and the description thereof) .
  • a processed image may be obtained by performing a perspective transformation on the image to be processed using the transformation matrix. Operation 530 may be performed by the processing module described in FIG. 1, a correction unit 1130 illustrated in FIG. 11, and a perspective transformation unit 1430 illustrated in FIG. 14.
  • the image to be processed may be obtained based on the methods described in the present disclosure (e.g., FIG. 1 and the descriptions thereof) .
• The image to be processed may be a non-horizontal, tilted image (e.g., the image 310 illustrated in FIG. 3) .
  • Coordinates of four corner points in the image to be processed may be obtained by using a trained feature recognition model (e.g., a deep learning model) .
  • the transformation matrix may be obtained according to the coordinates of four corner points and the coordinates of the target corner points.
• The coordinates of the target corner points may be coordinates of the corner points in an expected horizontal image.
• For example, as shown in FIG. 6, the four corner points in the image to be processed may be K, L, M, and N in an image 610 to be processed, and the target corner points may be K’, L’, M’, and N’ in an expected horizontal image 620.
  • Coordinates of the corner point A may be denoted as [u, v, w] and coordinates of the target corner point A' may be denoted as [x′, y′, w′] .
• A relationship between the coordinates of the corner point A and the coordinates of the target corner point A' may be denoted as formula (1) below:
  [x′, y′, w′] = [u, v, w] · T,  (1)
  where T refers to the transformation matrix.
  • the transformation matrix may be determined according to the coordinates of four corner points and the coordinates of the target corner points. Further, the processed image in a horizontal state may be obtained by mapping a value of each pixel in the image to be processed to a value of a pixel corresponding to the pixel in the processed image in the horizontal state using the transformation matrix.
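• As a concrete illustration of this mapping, the sketch below (assuming OpenCV and NumPy; the target image size is an illustrative assumption) computes the transformation matrix from the four corner points and the target corner points and then warps the image to the horizontal state.
```python
# Minimal sketch (assumes OpenCV/NumPy): compute the transformation matrix from the
# four detected corner points and the target corner points, then warp the image.
import cv2
import numpy as np

def correct_image(image, corners):
    """corners: four (x, y) points ordered upper-left, upper-right, lower-right, lower-left."""
    h, w = 540, 856                                    # illustrative target size of the certificate
    src = np.asarray(corners, dtype=np.float32)
    dst = np.asarray([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], dtype=np.float32)
    matrix = cv2.getPerspectiveTransform(src, dst)     # the 3x3 transformation matrix
    return cv2.warpPerspective(image, matrix, (w, h))  # processed image in the horizontal state
```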
  • the embodiments of the present disclosure may solve the problems of the inaccuracy in calculating a rotation angle of a certificate according to a tilt direction of a text in the certificate and the inability to correct a perspective image, thereby improving the accuracy of the image correction.
  • the image to be processed may include an image of a certificate or a business card that needs to be processed, an image taken directly by a user, an image extracted from images taken by the user, etc.
• For example, when performing an identity verification, a user needs to upload an image of his/her certificate.
  • the user may point a camera directly at the certificate to take an image.
  • the obtained image may only contain the certificate and no other objects. In such cases, the obtained image may be the image to be processed.
  • the user may not point the camera directly at the certificate to take an image.
• The obtained image may contain other objects (e.g., a desktop where the certificate is placed) in addition to the certificate. In such cases, an image region only including the certificate may be extracted from the obtained image as the image to be processed.
  • the target corner points of the processed image may be defined as an upper left corner point, an upper right corner point, a lower right corner point, and a lower left corner point of the processed image.
  • coordinates of four corner points in the image to be processed and serial numbers of the four corner points may be obtained by using the trained feature recognition model.
  • the serial numbers of the four corner points in the image to be processed may be 1, 2, 3, and 4, respectively.
  • the corner point 1 of the image to be processed may correspond to the upper left corner point of the processed image
  • the corner point 2 may correspond to the upper right corner point of the processed image
  • the corner point 3 may correspond to the lower right corner point of the processed image
  • the corner point 4 may correspond to the lower left corner point of the processed image, which may ensure that the processed image is horizontal and proper.
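• This correspondence can be realized, for example, by sorting the detected corner points by their serial numbers before the transformation matrix is computed; the sketch below assumes the model output is available as (serial number, x, y) triples, which is an illustrative format rather than the interface defined in this disclosure.
```python
def order_corners_by_serial(corners_with_serials):
    """Sort detected corners so index 0..3 = upper-left, upper-right, lower-right, lower-left.

    corners_with_serials: iterable of (serial_number, x, y) with serial numbers 1-4
    as output by the trained feature recognition model (illustrative format).
    """
    ordered = sorted(corners_with_serials, key=lambda c: c[0])
    return [(x, y) for _, x, y in ordered]
```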
  • FIG. 7 is a flowchart illustrating an exemplary process for correcting (or processing) an image according to some embodiments of the present disclosure.
  • process 700 may be executed by the image processing system 100.
  • the process 700 may be implemented as a set of instructions (e.g., an application) stored in a storage device disclosed elsewhere in the present disclosure.
• The processing device 112 and/or an image correction device (e.g., the image correction device 1100 illustrated in FIG. 11, the image correction device 1400 illustrated in FIG. 14) may execute the set of instructions and may accordingly be directed to perform the process 700.
  • the operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 700 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 700 illustrated in FIG. 7 and described below is not intended to be limiting.
• An image to be processed and coordinates of corner points in the image to be processed may be obtained. Operation 710 may be performed by the obtaining module and the determination module described in FIG. 1, the information obtaining unit 1110 illustrated in FIG. 11, and the obtaining unit 1410 illustrated in FIG. 14.
  • a transformation matrix may be calculated based on the coordinates of the corner points and coordinates of target corner points. Operation 720 may be performed by the processing module described in FIG. 1, the calculation unit 1120 illustrated in FIG. 11, and the calculation unit 1420 illustrated in FIG. 14.
  • a processed image may be obtained by performing a perspective transformation on the image to be processed using the transformation matrix. Operation 730 may be performed by the processing module described in FIG. 1, the correction unit 1130 illustrated in FIG. 11, and the perspective transformation unit 1430 illustrated in FIG. 14.
  • a text region may be obtained by detecting the processed image. Operation 740 may be performed by the processing module described in FIG. 1, a detection unit 1140 illustrated in FIG. 11, and a detection unit 1460 illustrated in FIG. 14.
  • text information may be obtained by performing an optical character recognition (OCR) on the text region.
  • Operation 750 may be performed by the processing module described in FIG. 1, an information recognition unit 1150 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14.
  • the text region may be obtained by detecting the processed image.
  • the text information may be obtained by performing the optical character recognition on the text region, which may realize the rapid and accurate extraction of the text information in the processed image and avoid the problems of low efficiency and high error rate caused by manual input.
  • the process described in FIG. 7 may be used to identify a certificate or a business card.
  • the certificate may include an identity (ID) card, a vehicle certificate, a driving license, a bank card, etc.
  • a user may take and upload an image of a certificate.
• The image may be a tilted, non-horizontal image, which means that no particular photography skill is required of the user.
  • Coordinates of four corner points of the certificate in the image and a confidence level of each of the coordinates of the four corner points may be obtained.
  • the transformation matrix may be calculated based on the coordinates of the corner points and the coordinates of the target corner points.
  • a horizontal certificate image (i.e., the processed image) may be obtained by transforming each pixel of the image of the certificate by using the transformation matrix. Further, the text in the horizontal certificate image may be quickly and accurately extracted by the OCR, which may avoid the problems of low efficiency and high error rate caused by manual input.
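• A minimal OCR sketch is shown below; it assumes the pytesseract and OpenCV packages and a text-region box already produced by a text detector, all of which are illustrative assumptions rather than components of this disclosure.
```python
# Minimal sketch (assumes pytesseract + OpenCV): crop a detected text region from the
# corrected certificate image and recognize its characters with OCR.
import cv2
import pytesseract

def read_text_region(corrected_image, box):
    """box: (x, y, width, height) of a detected text region (illustrative format)."""
    x, y, w, h = box
    region = corrected_image[y:y + h, x:x + w]
    gray = cv2.cvtColor(region, cv2.COLOR_BGR2GRAY)   # simple preprocessing
    return pytesseract.image_to_string(gray)          # recognized text information
```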
  • FIG. 8 is a flowchart illustrating an exemplary process for correcting (or processing) an image according to some embodiments of the present disclosure.
  • process 800 may be executed by the image processing system 100.
  • the process 800 may be implemented as a set of instructions (e.g., an application) stored in a storage device disclosed elsewhere in the present disclosure.
• The processing device 112 and/or an image correction device (e.g., the image correction device 1100 illustrated in FIG. 11, the image correction device 1400 illustrated in FIG. 14) may execute the set of instructions and may accordingly be directed to perform the process 800.
  • the operations of the illustrated process presented below are intended to be illustrative.
  • the process 800 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 800 illustrated in FIG. 8 and described below is not intended to be limiting.
• An image to be processed, coordinates of corner points in the image to be processed, and confidence levels of the coordinates of the corner points may be obtained. Operation 810 may be performed by the obtaining module and the determination module described in FIG. 1, the information obtaining unit 1110 illustrated in FIG. 11, and the obtaining unit 1410 illustrated in FIG. 14.
  • a confidence level of a coordinate of a corner point may indicate the accuracy of the coordinate of the corner point determined from the image to be processed.
  • a transformation matrix may be determined based on the coordinates of the corner points and coordinates of target corner points. Operation 820 may be performed by the processing module described in FIG. 1, the calculation unit 1120 illustrated in FIG. 11, and the calculation unit 1420 illustrated in FIG. 14.
  • a processed image may be obtained by performing a perspective transformation on the image to be processed using the transformation matrix. Operation 830 may be performed by the processing module described in FIG. 1, the correction unit 1130 illustrated in FIG. 11, and the perspective transformation unit 1430 illustrated in FIG. 14.
  • a text region may be obtained by detecting the processed image. Operation 840 may be performed by the processing module described in FIG. 1, the detection unit 1140 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14.
  • text information may be obtained by performing an optical character recognition on the text region. Operation 850 may be performed by the processing module described in FIG. 1, the information recognition unit 1150 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14.
• According to these embodiments, the confidence levels of the coordinates of the corner points may be obtained, and the transformation matrix may be calculated based on coordinates whose confidence levels satisfy a condition, which may avoid inaccurate image correction caused by inaccurate coordinates of the corner points, thereby improving the accuracy of the image correction.
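• For illustration, the sketch below gates the correction on the corner confidence levels; the threshold value and the fallback behavior are assumptions, and correct_image refers to the perspective-correction sketch given earlier.
```python
def correct_if_confident(image, corners, confidences, threshold=0.8):
    """Apply the perspective correction only when every corner is detected reliably."""
    if min(confidences) < threshold:        # illustrative condition on the confidence levels
        return None                         # e.g., ask the user to re-upload the image
    return correct_image(image, corners)    # reuse the earlier perspective-correction sketch
```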
  • FIG. 9 is a flowchart illustrating an exemplary process for correcting (or processing) an image according to some embodiments of the present disclosure.
  • process 900 may be executed by the image processing system 100.
  • the process 900 may be implemented as a set of instructions (e.g., an application) stored in a storage device disclosed elsewhere in the present disclosure.
• The processing device 112 and/or an image correction device (e.g., the image correction device 1100 illustrated in FIG. 11, the image correction device 1400 illustrated in FIG. 14) may execute the set of instructions and may accordingly be directed to perform the process 900.
  • the operations of the illustrated process presented below are intended to be illustrative.
  • the process 900 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 900 illustrated in FIG. 9 and described below is not intended to be limiting.
• One or more training images may be obtained. Operation 901 may be performed by the training module described in FIG. 1, a model obtaining unit 1160 illustrated in FIG. 11, and a training unit 1440 illustrated in FIG. 14.
• Coordinates of corner points of each of the one or more training images may be labeled. Operation 902 may be performed by the training module described in FIG. 1, the model obtaining unit 1160 illustrated in FIG. 11, and the training unit 1440 illustrated in FIG. 14.
• A plurality of sample images may be obtained by rotating the training images at any angle. Operation 903 may be performed by the training module described in FIG. 1, the model obtaining unit 1160 illustrated in FIG. 11, and the training unit 1440 illustrated in FIG. 14.
• A trained feature recognition model may be generated based on the plurality of sample images and coordinates of corner points in each of the plurality of sample images. Operation 904 may be performed by the training module described in FIG. 1, the model obtaining unit 1160 illustrated in FIG. 11, and the training unit 1440 illustrated in FIG. 14.
• An image to be processed may be obtained. Operation 905 may be performed by the obtaining module described in FIG. 1, the information obtaining unit 1110 illustrated in FIG. 11, and the obtaining unit 1410 illustrated in FIG. 14.
  • coordinates of corner points in the image to be processed and confidence levels of the coordinates of the corner points may be obtained by using the trained feature recognition model. Operation 906 may be performed by the determination module described in FIG. 1, the information obtaining unit 1110 illustrated in FIG. 11, and the obtaining unit 1410 illustrated in FIG. 14.
  • a transformation matrix may be determined based on the coordinates of the corner points and coordinates of target corner points. Operation 907 may be performed by the processing module described in FIG. 1, the calculation unit 1120 illustrated in FIG. 11, and the calculation unit 1420 illustrated in FIG. 14.
  • a processed image may be obtained by performing a perspective transformation on the image to be processed using the transformation matrix. Operation 908 may be performed by the processing module described in FIG. 1, the correction unit 1130 illustrated in FIG. 11, and the perspective transformation unit 1430 illustrated in FIG. 14.
  • a text region may be obtained by detecting the processed image. Operation 909 may be performed by the processing module described in FIG. 1, the detection unit 1140 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14.
  • text information may be obtained by performing an optical character recognition on the text region. Operation 910 may be performed by the processing module described in FIG. 1, the information recognition unit 1150 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14.
  • the trained feature recognition model used to obtain the coordinates of the corner points in the image to be processed and confidence levels of the coordinates of the corner points may be established.
  • the coordinates of four corner points of the training image may be manually labeled.
  • a regression training may be performed on a deep learning model using the coordinates of four corner points of the training image to obtain the trained feature recognition model.
  • the training images may be randomly rotated at any angle during the training process.
  • the coordinates of the corner points in the image to be processed and confidence levels of the coordinates of the corner points may be obtained by using the trained feature recognition model directly, which may improve the efficiency of the image correction.
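• The random-rotation augmentation can be sketched as follows (assuming OpenCV and NumPy); rotating the labeled corner coordinates with the same matrix keeps the labels consistent with the rotated sample image. Handling corners that fall outside the rotated frame is omitted here.
```python
# Minimal sketch (assumes OpenCV/NumPy): augment a labeled training image by a random
# rotation and rotate its labeled corner coordinates with the same matrix.
import cv2
import numpy as np

def rotate_sample(image, corners, rng=None):
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    angle = rng.uniform(0.0, 360.0)                               # rotate at any angle
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)  # 2x3 affine rotation matrix
    rotated = cv2.warpAffine(image, matrix, (w, h))
    pts = np.hstack([np.asarray(corners, dtype=np.float32),
                     np.ones((len(corners), 1), dtype=np.float32)])
    rotated_corners = pts @ matrix.T                              # same rotation applied to labels
    return rotated, rotated_corners
```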
  • FIG. 10 is a flowchart illustrating an exemplary process for correcting (or processing) an image according to some embodiments of the present disclosure.
  • process 1000 may be executed by the image processing system 100.
  • the process 1000 may be implemented as a set of instructions (e.g., an application) stored in a storage device disclosed elsewhere in the present disclosure.
• The processing device 112 and/or an image correction device (e.g., the image correction device 1100 illustrated in FIG. 11, the image correction device 1400 illustrated in FIG. 14) may execute the set of instructions and may accordingly be directed to perform the process 1000.
  • the operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 1000 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 1000 illustrated in FIG. 10 and described below is not intended to be limiting.
  • a trained feature recognition model may be obtained.
• One or more training images may be obtained. Operation 1010 may be performed by the training module described in FIG. 1, the model obtaining unit 1160 illustrated in FIG. 11, and the training unit 1440 illustrated in FIG. 14. Coordinates of corner points of each of the one or more training images may be labeled.
  • a plurality of sample images may be obtained by rotating the training images at any angle.
  • a trained feature recognition model may be generated based on the plurality of sample images and coordinates of corner points in each of the plurality of sample images.
• An image to be processed may be captured. Operation 1020 may be performed by the obtaining module described in FIG. 1, the information obtaining unit 1110 illustrated in FIG. 11, and the obtaining unit 1410 illustrated in FIG. 14.
  • the image to be processed may be obtained by photographing a certificate.
  • corner points may be detected from the image. Operation 1030 may be performed by the determination module described in FIG. 1, the information obtaining unit 1110 illustrated in FIG. 11, and the obtaining unit 1410 illustrated in FIG. 14.
  • the image to be processed may be input into the trained feature recognition model and the output of the trained feature recognition model may be confidence levels of corner points in the image to be processed and coordinates of the corner points.
  • a perspective transformation may be performed on the image to obtain a horizontal image.
  • Operation 1040 may be performed by the processing module described in FIG. 1, the correction unit 1130 illustrated in FIG. 11, and the perspective transformation unit 1430 illustrated in FIG. 14.
  • the perspective transformation may be performed on the image using a transformation matrix.
  • the transformation matrix may be calculated based on the coordinates of the corner points.
  • the transformation matrix may be used to perform a perspective transformation on an original non-horizontal image (i.e., the image to be processed) to obtain a horizontal image (i.e., the processed image) .
  • a text region may be detected from the horizontal image. Operation 1050 may be performed by the processing module described in FIG. 1, the detection unit 1140 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14.
  • an optical character recognition may be performed on the text region. Operation 1060 may be performed by the processing module described in FIG. 1, the information recognition unit 1150 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14. The text region may be recognized by the optical character recognition.
  • a rotation angle of the image to be processed is not calculated.
  • the four corner points in the image to be processed may be located by using the deep learning model. According to coordinates of the four corner points, a transformation matrix may be calculated. The transformation matrix may be used to perform the perspective transformation on the original tilt image (i.e., the image to be processed) . After the perspective transformation is performed, a horizontal image (i.e., the processed image) may be obtained.
  • FIG. 11 is a block diagram illustrating an exemplary image correction device according to some embodiments of the present disclosure.
  • the image correction device 1100 may include an information obtaining unit 1110, a calculation unit 1120, a correction unit 1130, a detection unit 1140, an information recognition unit 1150, and a model obtaining unit 1160.
  • the information obtaining unit 1110 may be configured to obtain an image to be processed and coordinates of corner points in the image to be processed. More descriptions regarding the obtaining of the image to be processed and the coordinates of the corner points in the image to be processed may be found elsewhere in the present disclosure (e.g., FIG. 5 and the description thereof) .
  • the calculation unit 1120 may be configured to calculate a transformation matrix based on the coordinates of the corner points in the image and coordinates of target corner points. More descriptions regarding the calculation of the transformation matrix may be found elsewhere in the present disclosure (e.g., FIG. 5 and the description thereof) .
  • the correction unit 1130 may be configured to obtain a processed image by performing a perspective transformation on the image to be processed using the transformation matrix. More descriptions regarding the obtaining of the processed image may be found elsewhere in the present disclosure (e.g., FIG. 5 and the description thereof) .
  • the detection unit 1140 may be configured to obtain a text region by detecting the processed image. More descriptions regarding the obtaining of the text region may be found elsewhere in the present disclosure (e.g., FIG. 7 and the description thereof) .
  • the information recognition unit 1150 may be configured to obtain text information by performing an optical character recognition (OCR) on the text region. More descriptions regarding the obtaining of the text information may be found elsewhere in the present disclosure (e.g., FIG. 7 and the description thereof) .
  • the model obtaining unit 1160 may be configured to obtain a trained feature recognition model.
  • the model obtaining unit 1160 may be configured to obtain one or more training images and label coordinates of corner points of each of the one or more training images.
  • the model obtaining unit 1160 may be configured to obtain a plurality of sample images by rotating the training images at any angle. Further, the model obtaining unit 1160 may be configured to generate the trained feature recognition model based on the plurality of sample images and coordinates of corner points in each of the plurality of sample images. More descriptions regarding the obtaining of the trained feature recognition model may be found elsewhere in the present disclosure (e.g., FIG. 9 and the description thereof) .
  • the units in the image correction device 1100 may be connected to or communicate with each other via a wired connection or a wireless connection.
  • the wired connection may include a metal cable, an optical cable, a hybrid cable, or the like, or any combination thereof.
  • the wireless connection may include a Local Area Network (LAN) , a Wide Area Network (WAN) , a Bluetooth, a ZigBee, a Near Field Communication (NFC) , or the like, or any combination thereof.
  • two or more of the units may be combined as a single unit, and any one of the units may be divided into two or more units.
  • one or more of the units may be omitted.
  • the model obtaining unit 1160 may be omitted.
  • the detection unit 1140, the information recognition unit 1150, and the model obtaining unit 1160 may be omitted.
  • FIG. 12 is a flowchart illustrating an exemplary process for correcting (or processing) an image according to some embodiments of the present disclosure.
  • process 1200 may be executed by the image processing system 100.
  • the process 1200 may be implemented as a set of instructions (e.g., an application) stored in a storage device disclosed elsewhere in the present disclosure.
• The processing device 112 and/or an image correction device (e.g., the image correction device 1100 illustrated in FIG. 11, the image correction device 1400 illustrated in FIG. 14) may execute the set of instructions and may accordingly be directed to perform the process 1200.
  • the operations of the illustrated process presented below are intended to be illustrative.
  • the process 1200 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 1200 illustrated in FIG. 12 and described below is not intended to be limiting.
  • coordinates and serial numbers of corner points of a certificate in an image may be obtained by using a trained feature recognition model. Operation 1210 may be performed by the determination module described in FIG. 1, the information obtaining unit 1110 illustrated in FIG. 11, and the obtaining unit 1410 illustrated in FIG. 14.
  • information obtained by using the trained feature recognition model from the image may also include at least one of confidence levels of the corner points, a frame of the certificate, and a rotation direction of the certificate, etc.
  • the trained feature recognition model may be obtained based on a plurality of training samples by a training process.
  • the plurality of training samples may include a plurality of sample images. For each of the plurality of sample images, coordinates and serial numbers of corner points of a certificate in the sample image may be marked.
  • the training process of the trained feature recognition model may be performed in a similar or same manner as process 400 as described in connection with FIG. 4, and the descriptions thereof are not repeated here.
• A shape of the certificate in the image may be a polygon, for example, a quadrilateral (e.g., a rectangle) , a pentagon, a hexagon, etc.
• The corner points of the certificate may be vertices of the polygon, for example, four vertices of the quadrilateral, five vertices of the pentagon, or six vertices of the hexagon.
  • coordinates of the corner point of the certificate may be coordinates of a vertex of the polygon in the image.
  • the image may be taken, using an imaging device (e.g., a camera in the user device 130) , by a user in real-time.
  • the image may be uploaded by the user from an album in the user device 130. It is understood that these embodiments described herein are not intended to be limiting. In the present disclosure, a manner of obtaining the image may be determined according to user needs.
  • the image may be an image that includes at least one complete certificate in the image. If the image does not include a complete certificate, an instruction indicating a correction failure will be returned to the image processing system 100.
  • a transformation matrix may be determined based on the coordinates of the corner points of the certificate in the image. Operation 1220 may be performed by the processing module described in FIG. 1, the calculation unit 1120 illustrated in FIG. 11, and the calculation unit 1420 illustrated in FIG. 14.
  • the transformation matrix may refer to a matrix used to perform a plurality of transformation operations on an image.
  • the transformation operations may include a linear transformation, a translation, a perspective transformation, or the like, or any combination thereof.
• The transformation matrix may be determined as a 3×3 matrix
  [[a11, a12, a13],
   [a21, a22, a23],
   [a31, a32, a33]],
  where the 2×2 block [[a11, a12], [a21, a22]] indicates the linear transformation, [a31, a32] indicates the translation, [a13, a23]^T indicates the perspective transformation, and a33 may be a constant.
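• The block structure above can be read off a concrete matrix as in the sketch below (assuming NumPy; the numeric values are arbitrary examples).
```python
import numpy as np

# An arbitrary example of a 3x3 transformation matrix (values are illustrative).
A = np.array([[1.02, 0.10, 1e-4],
              [0.05, 0.98, 2e-4],
              [30.0, 20.0, 1.0]])

linear      = A[:2, :2]   # [[a11, a12], [a21, a22]] -- linear transformation
translation = A[2, :2]    # [a31, a32]               -- translation
perspective = A[:2, 2]    # [a13, a23]^T             -- perspective component
constant    = A[2, 2]     # a33                      -- constant
```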
  • a processed image may be determined by processing the image based on the transformation matrix. Operation 1230 may be performed by the processing module described in FIG. 1, a correction unit 1130 illustrated in FIG. 11, and a perspective transformation unit 1430 illustrated in FIG. 14.
• The processed image may be determined by mapping the coordinates of each pixel in the image, using the transformation matrix, to the coordinates of a corresponding pixel in the processed image according to formula (2) below:
  [x′, y′, w′] = [u, v, w] · T,  (2)
• where [x′, y′, w′] refers to the coordinates of a pixel in the processed image, [u, v, w] refers to the coordinates of the corresponding pixel in the image to be processed, and T refers to the transformation matrix.
• The pixel in the processed image whose coordinates are mapped, using the transformation matrix, from the coordinates of a pixel in the image may be designated with the pixel value of that pixel in the image.
• That is, the processed image may be obtained by designating the pixel value of each pixel in the image as the pixel value of the pixel in the processed image that is mapped from the each pixel in the image.
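• Applied to a single pixel, formula (2) can be sketched as follows (assuming NumPy); dividing by w′ converts the homogeneous result back to ordinary pixel coordinates, with w taken as 1 for an input pixel.
```python
import numpy as np

def map_pixel(u, v, T):
    """Map pixel (u, v) of the image to be processed into the processed image
    using the 3x3 transformation matrix T, per formula (2); w is taken as 1."""
    x_p, y_p, w_p = np.array([u, v, 1.0]) @ T
    return x_p / w_p, y_p / w_p   # ordinary coordinates of the mapped pixel
```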
  • the processed image may be adjusted based on the serial numbers of the corner points of the certificate in the image.
  • Each of the serial numbers of the corner points of the certificate in the image may correspond to a standard serial number of each corner point.
  • a standard serial number of each corner point in the specific certificate may be a default setting of the image processing system 100.
• For example, as illustrated in FIG. 3, standard serial numbers of the four vertices 313, 314, 311, and 312 of the rectangle may be 1’, 2’, 3’, and 4’, respectively.
• The obtained serial numbers of the four vertices 313, 314, 311, and 312 in the image may be 3, 4, 1, and 2, respectively.
  • the processed image may be adjusted by transforming the processed image so that the serial numbers of the corner points in the image correspond to standard serial numbers.
• For example, as illustrated in FIG. 3, the obtained serial numbers of the corner points 314, 313, 312, and 311 in the image may be 4, 3, 2, and 1, respectively.
• The processed image may be transformed so that the corner point 311 with the serial number 1 coincides with the corner point with the standard serial number 1, the corner point with the serial number 2 coincides with the corner point with the standard serial number 2, the corner point with the serial number 3 coincides with the corner point with the standard serial number 3, and the corner point with the serial number 4 coincides with the corner point with the standard serial number 4. More descriptions regarding the adjustment of the processed image based on the serial numbers of the corner points may be found elsewhere in the present disclosure (e.g., FIG. 3 and the description thereof) .
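• One simple way to realize this adjustment is to rotate the corrected image by multiples of 90° until the detected serial numbers line up with the standard serial numbers, as in the sketch below (assuming NumPy; the input format of the serial numbers is an illustrative assumption).
```python
import numpy as np

def adjust_orientation(processed_image, detected_serials, standard_serials=(1, 2, 3, 4)):
    """Rotate the corrected image by multiples of 90 degrees until the corner serial
    numbers match the standard serial numbers.

    detected_serials: serial numbers of the corners currently at the upper-left,
    upper-right, lower-right, and lower-left positions (illustrative convention).
    """
    serials = list(detected_serials)
    for k in range(4):
        if tuple(serials) == tuple(standard_serials):
            return np.rot90(processed_image, k=k)   # k counter-clockwise quarter turns
        # After one counter-clockwise quarter turn, the corner at the upper-right
        # position moves to the upper-left position, so the serial list shifts left.
        serials = serials[1:] + serials[:1]
    raise ValueError("serial numbers do not match any 90-degree rotation")
```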
  • the processed image may be determined by using a traditional correction algorithm.
• If a traditional correction algorithm detects that a state of the certificate in the image is horizontal and vertical, a direction of the certificate in the image may not be corrected.
• However, the direction of the certificate may still be wrong.
• For example, the certificate in the image may be upside down, that is, the certificate is rotated 180° from its right position.
• As another example, the certificate image may be rotated 90° from its right position, that is, two of the four sides of the certificate are vertical and the other two sides are horizontal.
  • the traditional algorithm may only guarantee that the certificate in the processed image is horizontal and vertical, but it cannot guarantee that the direction of the certificate in the processed image or the image is correct.
  • the coordinates and serial numbers of corner points of the certificate in the image may be obtained by using the trained feature recognition model.
  • the transformation matrix may be determined based on the coordinates of the corner points of the certificate in the image.
• The processed image may be determined by processing the image based on the transformation matrix. Compared with a processed image determined based on a traditional correction algorithm, the influence of the image background on the processed image may be reduced or eliminated, the wrong direction of the certificate in the image may be corrected, and the perspective problem of the image may be solved, thereby improving the image quality of the processed image.
  • FIG. 13 is a flowchart illustrating an exemplary process for recognizing information in a corrected image according to some embodiments of the present disclosure.
  • process 1300 may be executed by the image processing system 100.
  • the process 1300 may be implemented as a set of instructions (e.g., an application) stored in a storage device disclosed elsewhere in the present disclosure.
• The processing device 112 and/or an image correction device (e.g., the image correction device 1100 illustrated in FIG. 11, the image correction device 1400 illustrated in FIG. 14) may execute the set of instructions and may accordingly be directed to perform the process 1300.
  • the operations of the illustrated process presented below are intended to be illustrative.
  • the process 1300 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 1300 illustrated in FIG. 13 and described below is not intended to be limiting.
  • fields may be detected in a processed image and a field region representing each field may be extracted from the processed image. Operation 1310 may be performed by the processing module described in FIG. 1, the detection unit 1140 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14.
  • Each of the fields may consist of one or more characters (e.g., words, texts, numbers, symbols) .
  • the field region representing a field may be extracted from the processed image based on an extraction function.
  • the extraction function may be a default setting of the image processing system 100 or may be adjustable under different situations.
  • a target field may be determined according to the user needs.
  • a field region representing the target field may be extracted from the processed image. It is understood that these embodiments described herein are not intended to be limiting. In the present disclosure, the field region may be extracted according to other manners.
  • the one or more characters in each field may be recognized from the field region based on a preset algorithm. Operation 1320 may be performed by the processing module described in FIG. 1, the information recognition unit 1150 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14.
  • the one or more characters may be recognized from the field region by using an optical character recognition (OCR) algorithm.
  • the OCR algorithm may be used to automatically recognize characters (e.g., words, texts, numbers, symbols) in a certificate in an image (e.g., the processed image) .
  • the certificate may include an identity (ID) card, a vehicle certificate, a driving license, a bank card, etc.
  • the OCR algorithm may have a wide range of uses in different fields, for example, the verification of identity information of bank customers and registration information of online car-hailing drivers, etc.
  • the OCR algorithm may quickly and accurately recognize and extract the characters in the certificate in the image, which may solve the problems of low efficiency and high error rate of manual input.
  • the one or more characters may be output to a user device (e.g., the user device 130) or an external device.
  • the processed image may be determined.
• The one or more characters (i.e., effective information) in the certificate may be recognized by using the OCR algorithm, which not only solves the problem that the traditional correction algorithm is greatly affected by the image background, but also solves the problems of inaccurate determination of the direction of the certificate in the image and the inability to correct a perspective image when the traditional correction algorithm is used, thereby improving the image quality of the processed image and the accuracy of character recognition in the processed image.
  • FIG. 14 is a block diagram illustrating an exemplary image correction device according to some embodiments of the present disclosure.
  • the image correction device 1400 may include an obtaining unit 1410, a calculation unit 1420, a perspective transformation unit 1430, a training unit 1440, an output unit 1450, and a detection unit 1460.
  • the obtaining unit 1410 may be configured to obtain coordinates and serial numbers of corner points of a certificate in an image by using a trained feature recognition model. More descriptions regarding the obtaining of the coordinates and the serial numbers of the corner points may be found elsewhere in the present disclosure (e.g., operation 1210 in FIG. 12 and the description thereof) .
  • the calculation unit 1420 may be configured to determine a transformation matrix based on the coordinates of the corner points of the certificate in the image. More descriptions regarding the determination of the transformation matrix may be found elsewhere in the present disclosure (e.g., operation 1220 in FIG. 12 and the description thereof) .
  • the perspective transformation unit 1430 may be configured to determine a processed image by processing the image based on the transformation matrix. More descriptions regarding the determination of the processed image may be found elsewhere in the present disclosure (e.g., operation 1230 in FIG. 12 and the description thereof) .
  • the training unit 1440 may be configured to obtain the trained feature recognition model. More descriptions regarding the obtaining of the trained feature recognition model may be found elsewhere in the present disclosure (e.g., FIG. 4 and the description thereof) .
  • the detection unit 1460 may be configured to detect fields in the processed image and extract a field region representing each field from the processed image. More descriptions regarding the detection of the fields in the processed image and the extraction of the field region representing each field may be found elsewhere in the present disclosure (e.g., operation 1310 in FIG. 13 and the description thereof) .
  • the detection unit 1460 may be also configured to recognize the one or more characters in each field from the field region based on a preset algorithm. More descriptions regarding the recognition of the one or more characters in each field may be found elsewhere in the present disclosure (e.g., operation 1320 in FIG. 13 and the description thereof) .
  • the output unit 1450 may be configured to output the coordinates and the serial numbers of the corner points of the certificate in the image.
  • the output unit 1450 may be configured to output the processed image.
  • the output unit 1450 may be also configured to output the one or more characters in each field recognized from the field region.
  • the units in the image correction device 1400 may be connected to or communicate with each other via a wired connection or a wireless connection.
  • the wired connection may include a metal cable, an optical cable, a hybrid cable, or the like, or any combination thereof.
  • the wireless connection may include a Local Area Network (LAN) , a Wide Area Network (WAN) , a Bluetooth, a ZigBee, a Near Field Communication (NFC) , or the like, or any combination thereof.
  • two or more of the units may be combined as a single unit, and any one of the units may be divided into two or more units.
  • one or more of the units may be omitted.
  • the detection unit 1460 may be omitted.
  • the training unit 1440, the output unit 1450, and the detection unit 1460 may be omitted.
  • FIG. 15 is a block diagram illustrating an exemplary image correction device according to some embodiments of the present disclosure.
  • the image correction device 1500 may include a processor 1510, a memory 1520, and a bus 1530.
  • the memory 1520 may store machine-readable instructions executable by the processor 1510.
• The processor 1510 may communicate with the memory 1520 through the bus 1530, and the processor 1510 may execute the machine-readable instructions to implement a process (e.g., process 200, process 400, process 500, process 700, process 800, process 900, process 1000, process 1200, process 1300) described elsewhere in the present disclosure.
• The present disclosure may also provide a storage medium storing a computer program.
  • the computer program may be executed to implement a process (e.g., process 200, process 400, process 500, process 700, process 800, process 900, process 1000, process 1200, process 1300) described elsewhere in the present disclosure.
  • the storage medium may be a general storage medium, such as a removable disk, a hard disk, etc.
  • aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or context including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely hardware, entirely software (including firmware, resident software, micro-code, etc. ) or combining software and hardware implementation that may all generally be referred to herein as a “unit, ” “module, ” or “system. ” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.
  • a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electro-magnetic, optical, or the like, or any suitable combination thereof.
  • a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including wireless, wireline, optical fiber cable, RF, or the like, or any suitable combination of the foregoing.
  • Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB. NET, Python or the like, conventional procedural programming languages, such as the "C" programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN) , or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a Software as a Service (SaaS) .

Abstract

The present disclosure relates to systems and methods for image processing. The system may obtain an image of an object. The system may determine feature information of the object in the image based on the image by using a trained feature recognition model. The system may process the image based on the feature information.

Description

SYSTEMS AND METHODS FOR IMAGE PROCESSING
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Patent Application No. 201911016421.2 filed on October 24, 2019, Chinese Patent Application No. 201911252892.3 filed on December 9, 2019, the contents of each of which are incorporated herein by reference in their entirety.
TECHNICAL FIELD
The present disclosure generally relates to image processing technology, and in particular, to systems and methods for processing an image based on a trained feature recognition model.
BACKGROUND
Online to offline (O2O) services (e.g. online to offline transportation services) utilizing Internet technology have become increasingly popular. Commonly, when registering an O2O service platform, a user needs to upload an image of a certificate (e.g., an identity (ID) card, a vehicle certificate, a driving license, a bank card) to the O2O service platform. The O2O service platform may recognize and extract information (e.g., characters) in the image and determine whether to allow the user to register based on the extracted information. However, in some situations, a size and a shape of the image uploaded by the user, an angle at which the image was taken, etc. do not satisfy a criterion, which may cause the O2O service platform to be unable to recognize or clearly recognize the certificate from the image. Therefore, it is desirable to provide systems and methods for processing or correcting the image, thereby improving the accuracy of recognizing information from the image.
SUMMARY
An aspect of the present disclosure relates to a system for image processing. The system may include at least one storage device including a set of instructions and at least one processor configured to communicate with the at least one storage device. When executing the set of instructions, the at least one processor is configured to direct the system to perform operations. The operations  may include obtaining an image of an object; determining feature information of the object in the image based on the image by using a trained feature recognition model; and processing the image based on the feature information.
In some embodiments, the feature information may include positions of at least three key points of the object in the image. The processing the image based on the feature information may include obtaining reference positions of the at least three key points of the object and processing the image based on the positions of the at least three key points of the object and the reference positions of the at least three key points of the object.
In some embodiments, the at least three key points may include corner points of the object.
In some embodiments, the processing the image based on the positions of the at least three key points of the object and the reference positions of the at least three key points of the object may include determining a transformation matrix based on the positions of at least three key points of the object in the image and the reference positions of the at least three key points of the object, and processing the image to obtain a processed image by transforming the image using the transformation matrix.
In some embodiments, the feature information may include direction information of the object. The direction information of the object may include a deflection direction of the object relative to a reference line associated with the image. The processing the image based on the feature information may include processing the image based on the direction information of the object.
In some embodiments, the feature information may include a confidence level. The confidence level may indicate accuracy of the feature information of the object determined from the image. The processing the image based on the feature information may include determining whether the confidence level of the feature information satisfies a condition, and in response to determining that the confidence level of the feature information satisfies the condition, processing the image based on the feature information.
In some embodiments, the processing the image may include determining a target region of the object from the image by using an object recognition model and processing the target region of the object in the image.
In some embodiments, the determining the target region of the object from the image by using the object recognition model may include inputting the feature information of the object into the object recognition model and determining the target region of the object from the image based on an output of the object recognition model.
In some embodiments, the trained feature recognition model may include a plurality of sub-models. Each of the plurality of sub-models may correspond to a reference object type. The determining the feature information of the object in the image based on the image by using the trained feature recognition model may include determining an object type of the object; obtaining a sub-model from the plurality of sub-models based on the object type of the object and the reference object type corresponding to the sub-model; and determining the feature information of the object in the image by using the sub-model.
In some embodiments, the trained feature recognition model may be obtained by a training process. The training process may include obtaining a plurality of training samples each of which includes an image obtained by performing an angle transformation on a sample image and obtaining the trained feature recognition model by training a preliminary machine learning model based on the plurality of training samples.
A further aspect of the present disclosure relates to a method for image processing. The method may be implemented on a computing device including at least one processor, at least one storage medium, and a communication platform connected to a network. The method may include obtaining an image of an object; determining feature information of the object in the image based on the image by  using a trained feature recognition model; and processing the image based on the feature information.
A still further aspect of the present disclosure relates to a non-transitory computer readable medium including executable instructions. When the executable instructions are executed by at least one processor, the executable instructions may direct the at least one processor to perform a method. The method may include obtaining an image of an object; determining feature information of the object in the image based on the image by using a trained feature recognition model; and processing the image based on the feature information.
Additional features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings or may be learned by production or operation of the examples. The features of the present disclosure may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities, and combinations set forth in the detailed examples discussed below.
BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure is further described in terms of exemplary embodiments. These exemplary embodiments are described in detail with reference to the drawings. These embodiments are non-limiting exemplary embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:
FIG. 1 is a schematic diagram illustrating an exemplary image processing system according to some embodiments of the present disclosure;
FIG. 2 is a flowchart illustrating an exemplary process for processing an image according to some embodiments of the present disclosure;
FIG. 3 is a schematic diagram illustrating an exemplary image of an object and exemplary processed images of the object according to some embodiments of the present disclosure;
FIG. 4 is a flowchart illustrating an exemplary process for obtaining a trained feature recognition model according to some embodiment of the present disclosure;
FIG. 5 is a flowchart illustrating an exemplary process for correcting an image according to some embodiments of the present disclosure;
FIG. 6 is a schematic diagram illustrating exemplary corner points according to some embodiments of the present disclosure;
FIG. 7 is a flowchart illustrating an exemplary process for correcting an image according to some embodiments of the present disclosure;
FIG. 8 is a flowchart illustrating an exemplary process for correcting an image according to some embodiments of the present disclosure;
FIG. 9 is a flowchart illustrating an exemplary process for correcting an image according to some embodiments of the present disclosure;
FIG. 10 is a flowchart illustrating an exemplary process for correcting an image according to some embodiments of the present disclosure;
FIG. 11 is a block diagram illustrating an exemplary image correction device according to some embodiments of the present disclosure;
FIG. 12 is a flowchart illustrating an exemplary process for correcting an image according to some embodiments of the present disclosure;
FIG. 13 is a flowchart illustrating an exemplary process for recognizing information in a processed image according to some embodiments of the present disclosure;
FIG. 14 is a block diagram illustrating an exemplary image correction device according to some embodiments of the present disclosure; and
FIG. 15 is a block diagram illustrating an exemplary image correction device according to some embodiments of the present disclosure.
DETAILED DESCRIPTION
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant disclosure. However, it should be apparent to those skilled in the art that the  present disclosure may be practiced without such details. In other instances, well-known methods, procedures, systems, components, and/or circuitry have been described at a relatively high-level, without detail, in order to avoid unnecessarily obscuring aspects of the present disclosure. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments shown, but to be accorded the widest scope consistent with the claims.
It will be understood that the terms “system, ” “engine, ” “unit, ” “module, ” and/or “block” used herein are one method to distinguish different components, elements, parts, sections, or assemblies of different levels in ascending order. However, the terms may be displaced by other expressions if they may achieve the same purpose.
Generally, the words “module, ” “unit, ” or “block” used herein, refer to logic embodied in hardware or firmware, or to a collection of software instructions. A module, a unit, or a block described herein may be implemented as software and/or hardware and may be stored in any type of non-transitory computer-readable medium or other storage device. In some embodiments, a software module/unit/block may be compiled and linked into an executable program. It will be appreciated that software modules can be callable from other modules/units/blocks or from themselves, and/or may be invoked in response to detected events or interrupts. Software modules/units/blocks configured for execution on computing devices may be provided on a computer-readable medium, such as a compact disc, a digital video disc, a flash drive, a magnetic disc, or any other tangible medium, or as a digital download (and can be originally stored in a compressed or installable format that needs installation, decompression, or decryption prior to execution) . Such software code may be stored, partially or fully, on a storage device of the executing computing device, for execution by the  computing device. Software instructions may be embedded in firmware, such as an EPROM. It will be further appreciated that hardware modules (or units or blocks) may be included in connected logic components, such as gates and flip-flops, and/or can be included in programmable units, such as programmable gate arrays or processors. The modules (or units or blocks) or computing device functionality described herein may be implemented as software modules (or units or blocks) , but may be represented in hardware or firmware. In general, the modules (or units or blocks) described herein refer to logical modules (or units or blocks) that may be combined with other modules (or units or blocks) or divided into sub-modules (or sub-units or sub-blocks) despite their physical organization or storage.
It will be understood that when a unit, an engine, a module, or a block is referred to as being “on, ” “connected to, ” or “coupled to” another unit, engine, module, or block, it may be directly on, connected or coupled to, or communicate with the other unit, engine, module, or block, or an intervening unit, engine, module, or block may be present, unless the context clearly indicates otherwise. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms “a, ” “an, ” and “the” may be intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprise, ” “comprises, ” and/or “comprising, ” “include, ” “includes, ” and/or “including, ” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In addition, it should be understood that in the description of the present disclosure, the terms “first” , “second” , or the like, are only used for the purpose of  differentiation, and cannot be interpreted as indicating or implying relative importance, nor can be understood as indicating or implying the order.
These and other features, and characteristics of the present disclosure, as well as the methods of operation, various components of the stated system, functions of the related elements of structure, and the combination of parts and economies of manufacture, may become more apparent upon consideration of the following description with reference to the accompanying drawings, all of which form part of this disclosure. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended to limit the scope of the present disclosure. It is understood that the drawings are not to scale.
The flowcharts used in the present disclosure illustrate operations that systems implemented according to some embodiments of the present disclosure may perform. It is to be expressly understood that the operations of the flowcharts need not be implemented in the order shown. Conversely, the operations may be implemented in inverted order, or simultaneously. Moreover, one or more other operations may be added to the flowcharts. One or more operations may be removed from the flowcharts.
An aspect of the present disclosure relates to systems and methods for processing an image. The systems and methods may obtain an image of an object (e.g., a certificate) . According to the image of the object, the systems and methods may determine feature information (e.g., coordinates of four corner points of the object) of the object in the image by using a trained feature recognition model. Further, the systems and methods may process the image based on the feature information. For example, the systems and methods may determine a transformation matrix based on at least a portion of the feature information. The systems and methods may perform a transformation on the image of the object based on the transformation matrix.
According to the systems and methods of the present disclosure, using the trained feature recognition model, the feature information may be identified and extracted from the image accurately and effectively, and the image of the object may be processed (e.g., corrected) based on the feature information, which may improve the efficiency and accuracy of image processing (e.g., correction) , thereby improving the accuracy of effective information recognition based on the image.
FIG. 1 is a schematic diagram illustrating an exemplary image processing system according to some embodiments of the present disclosure. As shown in FIG. 1, the image processing system 100 may include a server 110, a user device 130, and a third party 140.
The server 110 may be a single server or a server group. The server group may be centralized or distributed (e.g., the server 110 may be a distributed system) . In some embodiments, the server 110 may be local or remote. For example, the server 110 may access information and/or data stored in the user device 130 and/or the third party 140. As another example, the server 110 may be directly connected to the user device 130 and/or the third party 140 to access stored information and/or data. In some embodiments, the server 110 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.
In some embodiments, the server 110 may include a processing device 112. The processing device 112 may process information and/or data relating to image processing to perform one or more functions described in the present disclosure. For example, the processing device 112 may obtain an image 120 of an object. According to the image 120, the processing device 112 may determine feature information of the object in the image 120 by using a trained feature recognition model. Further, the processing device 112 may process the image 120 based on the feature information. In some embodiments, the processing device 112 may include one or more processing engines (e.g., single-core processing engine (s) or  multi-core processor (s) ) . The processing device 112 may include a central processing unit (CPU) , an application-specific integrated circuit (ASIC) , an application-specific instruction-set processor (ASIP) , a graphics processing unit (GPU) , a physics processing unit (PPU) , a digital signal processor (DSP) , a field-programmable gate array (FPGA) , a programmable logic device (PLD) , a controller, a microcontroller unit, a reduced instruction set computer (RISC) , a microprocessor, or the like, or any combination thereof.
In some embodiments, the server 110 may be unnecessary and all or part of the functions of the server 110 may be implemented by other components (e.g., the third party 140, the user device 130) of the image processing system 100. For example, the processing device 112 may be integrated into the third party 140 or the user device 130 and the functions (e.g., processing image) of the processing device 112 may be implemented by the third party 140 or the user device 130.
The user device 130 may be configured to transmit information and/or data to the server 110 and/or the third party 140 or receive information and/or data from the server 110 and/or the third party 140. For example, the user device 130 may transmit the image 120 to the server 110 for processing. As another example, the user device 130 may receive a processed image 150 from the server 110. In some embodiments, the user device 130 may process information and/or data received from the server 110 and/or the third party 140. In some embodiments, the user device 130 may provide a user interface via which a user may view information and/or input data and/or instructions to the image processing system 100. For example, the user may view the processed image 150 via the user interface.
In some embodiments, the user device 130 may include a mobile device 130-1, a tablet computer 130-2, a laptop computer 130-3, a built-in device 130-4 in a vehicle, or the like, or any combination thereof. In some embodiments, the mobile device 130-1 may include a smart home device, a wearable device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any  combination thereof. In some embodiments, the smart home device may include a smart lighting device, a control device of an intelligent electrical apparatus, a smart monitoring device, a smart television, a smart video camera, an interphone, or the like, or any combination thereof. In some embodiments, the wearable device may include a smart bracelet, a smart footgear, a smart glass, a smart helmet, a smart watch, a smart clothing, a smart backpack, a smart accessory, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, a personal digital assistance (PDA) , a gaming device, a navigation device, a point of sale (POS) device, or the like, or any combination thereof. In some embodiments, the virtual reality device and/or the augmented reality device may include a virtual reality helmet, a virtual reality glass, a virtual reality patch, an augmented reality helmet, an augmented reality glass, an augmented reality patch, or the like, or any combination thereof. For example, the virtual reality device and/or the augmented reality device may include a Google Glass TM, an Oculus Rift TM, a Hololens TM, a Gear VR TM, etc. In some embodiments, a built-in device 130-4 in the vehicle may include an onboard computer, an onboard television, etc.
The third party 140 may be configured to transmit information and/or data to the server 110 and/or the user device 130 or receive information and/or data from the server 110 and/or the user device 130. For example, the third party 140 may obtain the image 120 from the user device 130 after receiving an authorization of the user and transmit the image 120 to the server 110 for processing. As another example, the third party 140 may receive the processed image 150 from the server 110. In some embodiments, the third party 140 may include a search engine 140-1, a social media 140-2, a news media 140-3, a map website 140-4, etc. The search engine 140-1 may include Google, Yahoo, Baidu, Microsoft, NHN, or the like, or any combination thereof. The social media 140-2 may include Facebook, Youtube, WhatsApp, LinkedIn, Twitter, Weibo, WeChat, QQ, or the like, or any combination thereof. The news media 140-3 may include Phoenix, Tencent News, Netease News, Sohu News, Associated Press, Cable News Network, or the like, or any  combination thereof. The map website 140-4 may include Baidu map, Google map, Gaode map, Sogou map, or the like, or any combination thereof.
In some embodiments, the image processing system 100 may further include a network (not shown) . The network may facilitate exchange of information and/or data. In some embodiments, one or more components (e.g., the server 110, the user device 130, the third party 140) of the image processing system 100 may transmit information and/or data to other component (s) of the image processing system 100 via the network. For example, the server 110 may obtain the image 120 from the user device 130 or the third party 140 via the network. As another example, the user device 130 may receive the processed image 150 from the server 110 via the network. In some embodiments, one or more components (e.g., the server 110, the user device 130, the third party 140) of the image processing system 100 may communicate information and/or data with one or more external resources such as an external database, etc. via the network. For example, the server 110 may obtain a trained feature recognition model from a database of a vendor or manufacturer that provides and/or updates the trained feature recognition model.
In some embodiments, the network may be any type of wired or wireless network, or any combination thereof. Merely by way of example, the network may include a cable network, a wireline network, an optical fiber network, a telecommunications network, an intranet, an Internet, a local area network (LAN) , a wide area network (WAN) , a wireless local area network (WLAN) , a metropolitan area network (MAN) , a public telephone switched network (PSTN) , a Bluetooth network, a ZigBee network, a near field communication (NFC) network, or the like, or any combination thereof. In some embodiments, the network may include one or more network access points. For example, the network may include wired or wireless network access points such as base stations and/or internet exchange points, through which one or more components of the image processing system 100 may be connected to the network to exchange data and/or information.
In some embodiments, the image processing system 100 may further include a storage device (not shown) . The storage device may store data and/or instructions relating to the image processing. In some embodiments, the storage device may store data (e.g., the image 120) obtained from the user device 130 or the third party 140. In some embodiments, the storage device may store data and/or instructions that the server 110 may execute or use to perform exemplary methods described in the present disclosure. In some embodiments, the storage device may include a mass storage, a removable storage, a volatile read-and-write memory, a read-only memory (ROM) , or the like, or any combination thereof. Exemplary mass storage may include a magnetic disk, an optical disk, a solid-state drive, etc. Exemplary removable storage may include a flash drive, a floppy disk, an optical disk, a memory card, a zip disk, a magnetic tape, etc. Exemplary volatile read-and-write memory may include a random access memory (RAM) . Exemplary RAM may include a dynamic RAM (DRAM) , a double date rate synchronous dynamic RAM (DDR SDRAM) , a static RAM (SRAM) , a thyristor RAM (T-RAM) , and a zero-capacitor RAM (Z-RAM) , etc. Exemplary ROM may include a mask ROM (MROM) , a programmable ROM (PROM) , an erasable programmable ROM (EPROM) , an electrically erasable programmable ROM (EEPROM) , a compact disk ROM (CD-ROM) , and a digital versatile disk ROM, etc. In some embodiments, the storage device may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.
In some embodiments, the storage device may be connected to the network to communicate with one or more components (e.g., the server 110, the user device 130, the third party 140) of the image processing system 100. One or more components of the image processing system 100 may access the data and/or instructions stored in the storage device via the network. In some embodiments, the storage device may be directly connected to or communicate with one or more  components (e.g., the server 110, the user device 130, the third party 140) of the image processing system 100. In some embodiments, the storage device may be part of the server 110.
In some embodiments, the image processing system 100 may be or be integrated into a system for online to offline services. For example, the image processing system 100 may be integrated into an online transportation service platform for transportation services such as taxi hailing, chauffeur services, delivery vehicles, express car, carpool, bus service, driver hiring, shuttle services, etc. The online transportation service platform may process images received from a user of the online transportation service platform. As another example, the image processing system 100 may be integrated into a face scan payment service platform for payment services such as face scan payment in a supermarket, a hospital, a bank, etc. The face scan payment service platform may process images received from a user of the face scan payment service platform.
In some embodiments, the image processing system 100 (e.g., the processing device 112) may include an obtaining module, a determination module, a processing module, and a training module.
The obtaining module may be configured to obtain an image (e.g., the image 120) of an object. More descriptions regarding the obtaining of the image of the object may be found elsewhere in the present disclosure (e.g., operation 210 in FIG. 2 and the description thereof) .
The determination module may be configured to determine feature information of the object in the image based on the image by using a trained feature recognition model. More descriptions regarding the determining of the feature information of the object in the image may be found elsewhere in the present disclosure (e.g., operation 220 in FIG. 2 and the description thereof) .
The processing module may be configured to process the image based on the feature information. More descriptions regarding the processing of the image may be found elsewhere in the present disclosure (e.g., operation 230 in FIG. 2 and the description thereof) .
The training module may be configured to obtain a trained feature recognition model. In some embodiments, the training module may obtain a plurality of training samples. Further, the training module may obtain the trained feature recognition model by training a preliminary machine learning model based on the plurality of training samples. More descriptions regarding the obtaining of the trained feature recognition model may be found elsewhere in the present disclosure (e.g., FIG. 4 and the description thereof) .
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. For example, the training module may be implemented on a separate device (e.g., a processing device independent from the processing device 112) . As another example, the training module may be unnecessary and the trained feature recognition model may be obtained from a storage device (e.g., the storage device described above, an external database) disclosed elsewhere in the present disclosure. In addition, it should be understood that in the description of the present disclosure, the terms “process” and “correct” are used interchangeably.
FIG. 2 is a flowchart illustrating an exemplary process for processing an image according to some embodiments of the present disclosure. In some embodiments, process 200 may be executed by the image processing system 100. For example, the process 200 may be implemented as a set of instructions (e.g., an application) stored in a storage device disclosed elsewhere in the present disclosure. In some embodiments, the processing device 112 and/or an image correction device (e.g., an image correction device 1100 illustrated in FIG. 11, an image correction device 1400 illustrated in FIG. 14) may execute the set of instructions and may  accordingly be directed to perform the process 200. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 200 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 200 illustrated in FIG. 2 and described below is not intended to be limiting.
In 210, the processing device 112 (e.g., the obtaining module) may obtain an image (e.g., the image 120) (also referred to as an image to be processed) of an object.
The object may include a biological object and/or a non-biological object. The biological object may include people, an animal, a plant, or the like, or any combination thereof. The non-biological object may include a natural product (e.g., a stone, a building) , an artifact, or the like, or any combination thereof. For example, the artifact may include a certificate, a photo, a drawing, or a document with text or patterns, or the like, or any combination thereof. The certificate may include an identity (ID) card, a vehicle certificate, a driving license, a bank card (e.g., a credit card, a debit card) , an employee’s card, etc.
In some embodiments, the user device 130 (e.g., the mobile device 130-1, the tablet computer 130-2, the laptop computer 130-3, the built-in device 130-4 in a vehicle) may acquire (or capture) the image of the object and send the image of the object to the processing device 112. In some embodiments, the third party 140 may send a request for acquiring the image to the user device 130. The user device 130 may send the image to the third party 140 in response to receiving an authorization or confirmation from a user of the user device 130. Additionally or alternatively, the third party 140 may send the image to the processing device 112. Merely by way of example, the image may be a frame in a video acquired by the user device 130 or from the third party 140. The processing device 112 may obtain and/or determine the image (i.e., the frame) from the video. For example, the processing device 112 may perform a framing operation on the video to obtain a plurality of frames in the video. The processing device 112 may designate one of the plurality of frames as the image.
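Merely as an illustration of the framing operation, the following is a minimal Python sketch that extracts one frame from a video; the file path, frame index, and the use of OpenCV are illustrative assumptions rather than requirements of the present disclosure.

```python
import cv2  # OpenCV, used here only as an example video/image library

def extract_frame(video_path: str, frame_index: int = 0):
    """Return one frame of a video as the image to be processed."""
    capture = cv2.VideoCapture(video_path)
    capture.set(cv2.CAP_PROP_POS_FRAMES, frame_index)   # jump to the requested frame
    success, frame = capture.read()
    capture.release()
    if not success:
        raise ValueError(f"could not read frame {frame_index} from {video_path}")
    return frame                                        # a NumPy array in BGR order

# Hypothetical usage: designate the tenth frame of an uploaded video as the image.
# image = extract_frame("uploaded_video.mp4", frame_index=9)
```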
In some embodiments, the image of the object may be acquired by the user device 130 or from the third party 140 and stored in a storage device disclosed elsewhere in the present disclosure. The processing device 112 may obtain the image of the object from the storage device via a network disclosed elsewhere in the present disclosure.
In 220, the processing device 112 (e.g., the determination module) may determine feature information of the object in the image based on the image by using a trained feature recognition model (also referred to as a model or a preset correction model) .
The feature information of the object may include positions of at least three key points of the object in the image, direction information of the object, a confidence level, serial numbers of the at least three key points of the object, or the like, or any combination thereof.
A key point may refer to a point representing a portion of the object and may be used to determine or define a position of the object in the image. The at least three key points may include corner points of the object (also referred to as global corner points) , points inside a region in the image representing at least a portion of the object, points at an edge of the object represented in the image, or the like, or any combination thereof. As used herein, a corner point of the object may refer to an intersection of at least two edge lines of the object in the image. The points inside the region representing at least a portion of the object may include a center point (e.g., a geometric center point) in the region and points at some specific positions. For example, the center points may include a point in the region representing a geometric center of the object (also referred to as a global center point) , a point in the region representing a geometric center of a pattern in the object (also referred to as a local center point) , etc. As another example, the points at some specific positions inside the region may include corner points (also referred to  as local corner points) (e.g., an upper left corner, an upper right corner, a lower left corner, a lower right corner) of a pattern or a character in the object. The at least three key points may not be on a straight line. For example, when the object is a rectangular ID card, the at least three key points may be any three of four vertices of the rectangular ID card. As another example, the at least three key points may be a center point and any two vertices of the rectangular ID card that are not on a diagonal line of the rectangular ID card. The positions of the at least three key points of the object may include coordinates of the at least three key points in the image of the object.
The direction information of the object may include a deflection direction of the object. The deflection direction may be denoted by a deflection angle of the object. As used herein, the deflection angle may refer to an angle between a central axis of the object and a reference line associated with the image. The reference line may include a boundary of the image, a center line of the image, an axis of an image coordinate system of the image, etc. For example, the deflection angle may be an angle between a central axis of the object in a vertical direction and a vertical axis of an image coordinate system of the image of the object. As another example, the deflection angle may be an angle between a central axis of the object in a horizontal direction and a horizontal axis of the image coordinate system of the image of the object. The deflection direction of the object may be clockwise or counterclockwise.
The confidence level may indicate the accuracy of the feature information of the object. For example, a confidence level of a corner point may indicate the accuracy of a coordinate of the corner point. Merely by way of example, the confidence level may range from 0 to 1. The closer the confidence level of a corner point is to 1, the more accurate the coordinates of the corner point determined from the image may be. The closer the confidence level of a corner point is to 0, the less accurate the coordinates of the corner point determined from the image may be.
A serial number of a key point may refer to a number used to distinguish the key point from other key points. All key points of the object may be numbered in a certain order. For example, when the object is face up (e.g., texts in the object are vertically upward) , all key points of the object may be numbered clockwise from an upper left corner of the object. As another example, when the object is upside down, all key points of the object may be numbered clockwise from a lower left corner of the object. More descriptions regarding the feature information of the object may be found elsewhere in the present disclosure (e.g., FIG. 3 and the description thereof) .
In some embodiments, the trained feature recognition model may be trained by the processing device 112 or a processing device different from the processing device 112 and stored in a storage device disclosed elsewhere in the present disclosure. The processing device 112 may retrieve the trained feature recognition model from the storage device. In some embodiments, the trained feature recognition model may include a machine learning model, for example, a deep learning model. The deep learning model may include a convolutional neural network (CNN) model, a recurrent neural network (RNN) model, a deep belief network (DBN) model, a stacked auto-encoder network model, or the like, or any combination thereof. In some embodiments, the trained feature recognition model may be trained based on a plurality of training samples. More descriptions regarding the trained feature recognition model may be found elsewhere in the present disclosure (e.g., FIG. 4 and the description thereof) .
In some embodiments, the processing device 112 may input the image of the object into the trained feature recognition model and determine the feature information of the object in the image based on an output of the trained feature recognition model.
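As a minimal sketch of this inference step, the following Python code assumes, for illustration only, a PyTorch model whose output is a (1, 4, 3) tensor of normalized (x, y, confidence) values for four key points and a 224 × 224 input size; neither the framework nor the output layout is mandated by the present disclosure.

```python
import cv2
import numpy as np
import torch

def recognize_key_points(model: torch.nn.Module, image_bgr: np.ndarray) -> np.ndarray:
    """Run the trained feature recognition model on one image.

    Assumed output layout: a (1, 4, 3) tensor holding normalized
    (x, y, confidence) values for four key points, ordered by serial number.
    """
    h, w = image_bgr.shape[:2]
    resized = cv2.resize(image_bgr, (224, 224))                       # assumed input size
    tensor = torch.from_numpy(resized).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        output = model(tensor.unsqueeze(0))                           # shape (1, 4, 3)
    key_points = output[0].cpu().numpy().copy()
    key_points[:, 0] *= w                                             # back to pixel coordinates
    key_points[:, 1] *= h
    return key_points                                                 # rows of (x, y, confidence)
```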
In some embodiments, the trained feature recognition model may include a plurality of sub-models. Each of the plurality of sub-models may correspond to a reference object type. The reference object type may be default settings (e.g., a  card, a document, etc. ) of the image processing system 100 or may be adjustable under different situations. The processing device 112 may determine an object type of the object. Merely by way of example, the processing device 112 may determine the object type of the object based on a user input. As another example, the processing device 112 may determine the object type of the object by using a trained type classification model. The processing device 112 may obtain a sub-model from the plurality of sub-models based on the object type of the object and the reference object type corresponding to the sub-model. For example, when the object type of the object is a bank card, the processing device 112 may obtain a sub-model corresponding to the card from the plurality of sub-models. Further, the processing device 112 may determine the feature information of the object in the image by using the sub-model. For example, the processing device 112 may input the image of the object into the sub-model and determine the feature information of the object in the image based on an output of the sub-model. In this embodiment, feature information of different objects with different object types may be determined based on different sub-models, which may improve the efficiency and accuracy of the image processing.
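One possible way to organize such sub-models is a simple registry keyed by reference object type, as sketched below; the type names, the selection rule, and the `recognize_key_points` helper from the previous sketch are illustrative assumptions.

```python
def recognize_by_type(image_bgr, object_type: str, sub_models: dict):
    """Select the sub-model matching the object's reference type and run it.

    `sub_models` is a hypothetical registry, e.g. {"card": ..., "document": ...},
    keyed by the reference object types mentioned above.
    """
    # Hypothetical rule mapping a concrete object type onto a reference object type.
    if "card" in object_type or "license" in object_type:
        reference_type = "card"
    else:
        reference_type = "document"
    model = sub_models[reference_type]
    return recognize_key_points(model, image_bgr)   # helper from the previous sketch
```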
In 230, the processing device 112 (e.g., the processing module) may process the image based on the feature information.
In some embodiments, the processing device 112 may obtain reference positions of the at least three key points of the object. Reference positions of key points (e.g., corner points) of the object may refer to positions at which information in the image can be easily recognized when the at least three key points of the object in the image are at the reference positions. For example, assuming that the object is a rectangular ID card, when two sides of the rectangular ID card in an image of the rectangular ID card are horizontal, the other two sides of the rectangular ID card in the image are vertical, the rectangular ID card represented in the image is undistorted, and a direction of texts in the image of the rectangular ID card is upward, the positions of the corner points (e.g., four vertices) of the rectangular ID card may be at the reference positions of the corner points (e.g., four vertices) of the rectangular ID card. Different object types (e.g., a card, a document) of objects may have different reference positions. The reference positions of key points of various objects may be set by the image processing system 100 and stored in a storage device disclosed elsewhere in the present disclosure. For example, the image processing system 100 may set the reference positions according to the object types of various objects in an image coordinate system defined by the system 100. Different object types of objects may correspond to different sizes and/or shapes. The image processing system 100 may set the reference positions of key points (e.g., corner points) of an object with a type corresponding to the sizes and/or shapes in the image coordinate system.
The processing device 112 may determine the reference positions of the at least three key points from a reference position database in the storage device. Merely by way of example, the processing device 112 may determine an object type of the object. According to the object type of the object, the processing device 112 may retrieve reference positions of key points of the object from the reference position database. Further, the processing device 112 may select at least three key points from the key points of the object and obtain the reference positions corresponding to the at least three key points.
As described in connection with operation 220, the feature information of the object may include the positions of at least three key points of the object. The processing device 112 may process the image based on the position of at least one of the at least three key points of the object and the preset reference position of the at least one of the at least three key points of the object. Specifically, the processing device 112 may determine whether the position of the at least one of the at least three key points and the reference position of the at least one of the at least three key points coincide. If the position of the at least one of at least three key points and the reference position of the at least one of the at least three key points do not coincide, the processing device 112 may process the image. For example,  the processing device 112 may determine a transformation matrix (also referred to as a perspective correction matrix or a perspective transformation matrix) based on the positions of the at least three key points and the preset reference positions of the at least three key points. The processing device 112 may process the image to obtain a processed image by transforming the image using the transformation matrix. More descriptions regarding the processing of the image using the transformation matrix may be found elsewhere in the present disclosure (e.g., FIG. 5, FIG. 12, and the description thereof) .
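A minimal sketch of this correction step using OpenCV is shown below; the reference positions correspond to a hypothetical 856 × 540 output image, and `cv2.getPerspectiveTransform` merely stands in for whichever solver derives the transformation matrix from the four point pairs.

```python
import cv2
import numpy as np

def correct_image(image_bgr: np.ndarray, key_points: np.ndarray) -> np.ndarray:
    """Warp the image so the detected corner points land on their reference positions."""
    # Detected corner points ordered by serial number: upper left, upper right,
    # lower right, lower left (the first two columns are x and y).
    src = key_points[:, :2].astype(np.float32)

    # Hypothetical reference positions for an 856 x 540 card-shaped output image.
    dst = np.array([[0, 0], [856, 0], [856, 540], [0, 540]], dtype=np.float32)

    matrix = cv2.getPerspectiveTransform(src, dst)        # 3 x 3 transformation matrix
    return cv2.warpPerspective(image_bgr, matrix, (856, 540))
```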
As described in connection with operation 220, the feature information of the object may include the serial numbers of the at least three key points of the object. In some embodiments, the processing device 112 may process the image based on the serial numbers of the at least three key points of the object. More descriptions regarding the processing of the image based on the serial numbers of the at least three key points of the object may be found elsewhere in the present disclosure (e.g., FIG. 3 and the description thereof) .
As described in connection with operation 220, the feature information of the object may include the direction information of the object. In some embodiments, the image of the object is merely in a wrong direction. For example, a central axis of the object in the image is not horizontal or vertical. As another example, the object in the image is upside down. In such cases, the processing device 112 may process the image based on the direction information of the object. For example, a plurality of transformation matrices corresponding to different direction information (e.g., the deflection angle and deflection direction) of the object may be previously generated and stored in a storage device disclosed elsewhere in the present disclosure or an external database. The processing device 112 may retrieve a transformation matrix corresponding to the direction information of the object from the plurality of transformation matrices in the storage device. Further, the processing device 112 may process the image based on the transformation matrix corresponding to the direction information of the object.
As described in connection with operation 220, the feature information of the object may include a confidence level. In some embodiments, the processing device 112 may process the image based on the confidence level of the feature information. Specifically, the processing device 112 may determine whether the confidence level of the feature information satisfies a condition. For example, the processing device 112 may determine whether the confidence level of the feature information is larger than a confidence level threshold. The confidence level threshold may be a default setting (e.g., 0.5) of the image processing system 100 or may be adjustable under different situations. When the confidence level of the feature information is larger than the confidence level threshold, the processing device 112 may determine that the confidence level of the feature information satisfies the condition. In response to determining that the confidence level of the feature information satisfies the condition, the processing device 112 may process the image based on the feature information. When the confidence level of the feature information is less than the confidence level threshold, the processing device 112 may determine that the confidence level of the feature information does not satisfy the condition. In response to determining that the confidence level of the feature information does not satisfy the condition, the processing device 112 may return an instruction indicating that the image cannot be processed, or the processing device 112 may roughly process the image. For example, the processing device 112 may translate the image or rotate the image by an arbitrary angle and output the roughly processed image. Further, the roughly processed image may be further processed by the image processing system 100 according to operations 210-230.
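The confidence gate described above might be sketched as follows; the 0.5 threshold mirrors the default mentioned above, the 90-degree fallback rotation is an arbitrary illustrative choice, and `correct_image` refers to the earlier sketch.

```python
import cv2

CONFIDENCE_THRESHOLD = 0.5  # default mentioned above; adjustable in practice

def process_with_confidence(image_bgr, key_points):
    """Correct the image only when every key point is reliable enough."""
    if (key_points[:, 2] > CONFIDENCE_THRESHOLD).all():
        return correct_image(image_bgr, key_points)        # precise correction (earlier sketch)
    # Otherwise, roughly process the image (here: an arbitrary 90-degree rotation)
    # so that the result can be fed back into operations 210-230.
    h, w = image_bgr.shape[:2]
    rotation = cv2.getRotationMatrix2D((w / 2, h / 2), 90, 1.0)
    return cv2.warpAffine(image_bgr, rotation, (w, h))
```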
In some embodiments, the processing device 112 may determine a target region (also referred to as a text region or a field region) of the object in the processed image. The target region of the object may refer to a region where a picture and/or at least one character (e.g., texts, symbols) in the object is located. More descriptions regarding the determination of the target region in the processed image may be found elsewhere in the present disclosure (e.g., FIG. 7, FIG. 13, and the description thereof) .
In some embodiments, the processing device 112 may determine a target region of the object in the image and process the target region in a manner similar to the processing of the image described above. Merely by way of example, the processing device 112 may determine the target region of the object in the image by using an object recognition model (also referred to as a region segmentation model) . For example, the processing device 112 may input the feature information of the object into the region segmentation model and determine the target region of the object in the image based on an output of the object recognition model. Further, the processing device 112 may recognize effective information from the processed target region of the object in the image, which may avoid processing the entire image, thereby improving the efficiency and accuracy of the image processing.
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. For example, one or more other optional operations (e.g., a storing operation, a transmitting operation) may be added elsewhere in the process 200. In the storing operation, the processing device 112 may store information and/or data (e.g., the image of the object, the feature information, the trained feature recognition model, the processed image of the object) associated with the image processing in a storage device disclosed elsewhere in the present disclosure. In the transmitting operation, the processing device 112 may transmit the processed image of the object to the user device 130. As another example, the processing device 112 may further perform an operation including obtaining the trained feature recognition model.
FIG. 3 is a schematic diagram illustrating an exemplary image of an object and exemplary processed images of the object according to some embodiments of the present disclosure. As described in connection with operation 220, the feature information of the object may include positions of at least three key points of the object and the serial numbers of the at least three key points of the object.
As illustrated in FIG. 3, the image 310 may be an image of a vehicle certificate. The key points of the object in the image 310 may be four vertices 311, 312, 313, and 314 of the vehicle certificate. The positions of the key points may be coordinates of the four vertices 311, 312, 313, and 314 in the image 310. The serial numbers of the key points 311, 312, 313, and 314 may be 1, 2, 3, and 4. In some embodiments, reference positions of all key points of the object may be numbered. The reference positions may be numbered in the same manner as the key points, as described in connection with FIG. 2. According to the manner of numbering the reference positions, the reference serial numbers of the reference positions may be previously determined and stored in a storage device disclosed elsewhere in the present disclosure or an external database. The processing device 112 may retrieve the reference serial numbers of the reference positions from the storage device based on the serial numbers of the key points. Further, according to the reference serial numbers of the reference positions, the processing device 112 may obtain the reference positions. For example, according to the serial numbers 1, 2, 3, and 4 of the key points 311, 312, 313, and 314, the processing device 112 may obtain reference serial numbers 1’, 2’, 3’, and 4’. Further, the processing device 112 may obtain reference positions A, B, C, and D of the key points 311, 312, 313, and 314 based on the reference serial numbers 1’, 2’, 3’, and 4’. The processing device 112 may determine a transformation matrix based on the coordinates of the key points 311, 312, 313, and 314 and coordinates of the reference positions A, B, C, and D. According to the transformation matrix, the processing device 112 may obtain a processed image 320. Compared to the image 310, effective information (e.g., the license plate number in the vehicle certificate) may be recognized from the processed image 320 more clearly and easily.
FIG. 4 is a flowchart illustrating an exemplary process for obtaining a trained feature recognition model according to some embodiment of the present disclosure. In some embodiments, process 400 may be executed by the image processing system 100. For example, the process 400 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device disclosed elsewhere in the present disclosure) . In some embodiments, the processing device 112 and/or an image correction device (e.g., the image correction device 1100 illustrated in FIG. 11, the image correction device 1400 illustrated in FIG. 14) may execute the set of instructions and may accordingly be directed to perform the process 400. Alternatively, the process 400 may be performed by a computing device of a system of a vendor that provides and/or maintains such a trained feature recognition model. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 400 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 400 illustrated in FIG. 4 and described below is not intended to be limiting.
In 410, the processing device 112 (e.g., the training module) may obtain a plurality of training samples (also referred to as sample images) .
Each of the plurality of training samples may be an image of an object, for example, an identity (ID) card, a vehicle certificate, a driving license, a bank card (e.g., a credit card, a debit card) , an employee’s card, etc. In some embodiments, each of the plurality of training samples may include at least one whole object. For example, a training sample may be an image of two whole objects.
In some embodiments, at least one of the plurality of training samples may be previously generated and stored in a storage device disclosed elsewhere in the present disclosure or an external database. The processing device 112 may  retrieve the training samples directly from the storage device. Additionally or alternatively, at least one of the plurality of training samples may be acquired by the user device 130 or from the third party 140.
In some embodiments, each of the plurality of training samples may include an image obtained by performing an angle transformation on a sample image (also referred to as a training image) . The angle transformation may be performed by changing the direction of the object in the sample image. For example, a training sample may be obtained by rotating the object in the sample image by an arbitrary angle. In some embodiments, each of the plurality of training samples may include an image obtained by performing a translation operation on a sample image. For example, a training sample may be obtained by translating the object in the sample image by any distance along one or more directions.
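The angle transformation and translation operation may be sketched with OpenCV as follows; the rotation and shift ranges are arbitrary illustrative values, and the marked key-point coordinates would need to be transformed with the same matrix.

```python
import random
import cv2
import numpy as np

def augment_sample(sample_bgr: np.ndarray) -> np.ndarray:
    """Create a training sample by rotating and translating a sample image."""
    h, w = sample_bgr.shape[:2]
    angle = random.uniform(-180.0, 180.0)                # arbitrary rotation angle
    dx = random.uniform(-0.1, 0.1) * w                   # small horizontal shift
    dy = random.uniform(-0.1, 0.1) * h                   # small vertical shift

    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    matrix[:, 2] += (dx, dy)            # fold the translation into the affine matrix
    # Note: the marked key-point coordinates must be transformed with the same matrix.
    return cv2.warpAffine(sample_bgr, matrix, (w, h), borderValue=(255, 255, 255))
```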
In some embodiments, at least one of the plurality of training samples may be obtained by photographing the object with different placement manners or from different directions, which avoids the image distortion caused by the angle transformation or the translation operation, thereby improving the accuracy of the training samples.
In some embodiments, feature information of the object in each training sample may be marked or annotated. As described in connection with operation 220, the feature information of the object may include positions of at least three key points of the object, direction information of the object, a confidence level, serial numbers of the at least three key points of the object, or the like, or any combination thereof. The feature information of the object in each training sample may be marked manually. It is understood that this embodiment described herein is not intended to be limiting. In the present disclosure, the feature information of the object in each training sample may be marked by other improved manners, for example, using a marking model.
In 420, the processing device 112 (e.g., the training module) may obtain the trained feature recognition model by training a preliminary machine learning model based on the plurality of training samples.
In some embodiments, the preliminary machine learning model may be a deep learning model. As described in connection with FIG. 2, the deep learning model may include a CNN model, an RNN model, a DBN model, a stacked auto-encoder network model, or the like, or any combination thereof. In some embodiments, the preliminary machine learning model may include at least one model parameter with a preliminary value. The preliminary value of the at least one model parameter may be a default setting of the image processing system 100 or may be adjustable under different situations.
In some embodiments, the processing device 112 may train the preliminary machine learning model based on one or more gradient descent algorithms. Exemplary gradient descent algorithms may include an Adam optimization algorithm, a stochastic gradient descent (SGD) + Momentum optimization algorithm, a Nesterov accelerated gradient (NAG) algorithm, an adaptive gradient (Adagrad) algorithm, an adaptive delta (Adadelta) algorithm, a root mean square propagation (RMSprop) algorithm, an AdaMax algorithm, a Nadam (Nesterov-accelerated adaptive moment estimation) algorithm, an AMSGrad (Adam+SGD) algorithm, or the like, or any combination thereof.
In some embodiments, the processing device 112 may train the preliminary machine learning model iteratively until a termination condition is satisfied. In response to that the termination condition is satisfied, the trained feature recognition model may be obtained. In some embodiments, the termination condition may relate to a value of a loss function. For example, the termination condition may be satisfied if the value of the loss function is minimal or smaller than a threshold. As another example, the termination condition may be satisfied if the value of the loss function converges. In some embodiments, “convergence” may refer to that the variation of the values of the loss function in two or more consecutive iterations is  equal to or smaller than a vibration threshold. In some embodiments, “convergence” may refer to that a difference between the value of the loss function and a target value is equal to or smaller than a difference threshold. In some embodiments, the termination condition may be satisfied when a specified count of iterations have been performed in the training process. In some embodiments, when feature information of the object output by the preliminary machine learning model is the same as the marked feature information in each of the plurality of training samples, the termination condition may be satisfied.
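A minimal sketch of such a training loop is given below; it assumes a PyTorch model that regresses key-point coordinates, uses a mean squared error loss and the Adam optimizer purely as examples, and encodes the termination conditions described above with illustrative thresholds.

```python
import torch

def train(model, data_loader, max_iterations=10000,
          loss_threshold=1e-4, vibration_threshold=1e-6):
    """Train the preliminary model until a termination condition is satisfied."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # one of the listed optimizers
    criterion = torch.nn.MSELoss()                              # illustrative loss on key points
    previous_loss = None

    for iteration, (images, marked_key_points) in enumerate(data_loader):
        optimizer.zero_grad()
        loss = criterion(model(images), marked_key_points)
        loss.backward()
        optimizer.step()

        # Termination conditions: small loss, convergence, or iteration budget reached.
        current = loss.item()
        if current < loss_threshold:
            break
        if previous_loss is not None and abs(previous_loss - current) <= vibration_threshold:
            break
        if iteration + 1 >= max_iterations:
            break
        previous_loss = current
    return model
```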
It should be noted that the above description regarding the process 400 is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, one or more operations may be added or omitted. For example, the processing device 112 may update the trained feature recognition model periodically or irregularly based on one or more newly-generated training samples (e.g., new sample images) . As another example, the processing device 112 may divide the plurality of training samples into a training set and a test set. The training set may be used to train the model and the test set may be used to determine whether the training process has been completed.
FIG. 5 is a flowchart illustrating an exemplary process for correcting (or processing) an image according to some embodiments of the present disclosure. In some embodiments, process 500 may be executed by the image processing system 100. For example, the process 500 may be implemented as a set of instructions (e.g., an application) stored in a storage device disclosed elsewhere in the present disclosure. In some embodiments, the processing device 112 and/or an image correction device (e.g., the image correction device 1100 illustrated in FIG. 11, the image correction device 1400 illustrated in FIG. 14) may execute the set of instructions and may accordingly be directed to perform the process 500. The  operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 500 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 500 illustrated in FIG. 5 and described below is not intended to be limiting.
In 510, an image to be processed and coordinates of corner points in the image to be processed may be obtained. Operation 510 may be performed by the obtaining module and the determination module described in FIG. 1, an information obtaining unit 1110 illustrated in FIG. 11, and an obtaining unit 1410 illustrated in FIG. 14.
In 520, a transformation matrix may be calculated based on the coordinates of the corner points in the image and coordinates of target corner points (also referred to as reference positions of the corner points) . Operation 520 may be performed by the processing module described in FIG. 1, a calculation unit 1120 illustrated in FIG. 11, and a calculation unit 1420 illustrated in FIG. 14. Points at the reference positions may be referred to as target corner points. More descriptions regarding the reference positions may be found elsewhere in the present disclosure (e.g., FIG. 2 and the description thereof) .
In 530, a processed image may be obtained by performing a perspective transformation on the image to be processed using the transformation matrix. Operation 530 may be performed by the processing module described in FIG. 1, a correction unit 1130 illustrated in FIG. 11, and a perspective transformation unit 1430 illustrated in FIG. 14.
The image to be processed may be obtained based on the methods described in the present disclosure (e.g., FIG. 1 and the descriptions thereof) . The image to be processed may be a non-horizontal, tilted image (e.g., the image 310 illustrated in FIG. 3) . Coordinates of four corner points in the image to be processed may be obtained by using a trained feature recognition model (e.g., a deep learning model) . The transformation matrix may be obtained according to the coordinates of the four corner points and the coordinates of the target corner points. The coordinates of the target corner points may be coordinates of the corner points in an expected horizontal image. For example, as shown in FIG. 6, the four corner points in the image 610 to be processed may be K, L, M, and N, and the target corner points may be K’, L’, M’, and N’ in an expected horizontal image 620. Coordinates of a corner point (e.g., the corner point K) may be denoted as [u, v, w] and coordinates of the corresponding target corner point (e.g., K’) may be denoted as [x′, y′, w′] . A relationship between the coordinates of the corner point and the coordinates of the target corner point may be denoted as formula (1) below:
$\begin{bmatrix} x' & y' & w' \end{bmatrix} = \begin{bmatrix} u & v & w \end{bmatrix} \cdot T \quad (1)$
where
$T = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$
refers to the transformation matrix.
Therefore, the transformation matrix may be determined according to the coordinates of four corner points and the coordinates of the target corner points. Further, the processed image in a horizontal state may be obtained by mapping a value of each pixel in the image to be processed to a value of a pixel corresponding to the pixel in the processed image in the horizontal state using the transformation matrix. The embodiments of the present disclosure may solve the problems of the inaccuracy in calculating a rotation angle of a certificate according to a tilt direction of a text in the certificate and the inability to correct a perspective image, thereby improving the accuracy of the image correction.
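The pixel mapping of formula (1) can be illustrated as follows; the corner coordinates are placeholders, and OpenCV is used only to solve for the matrix from the four point pairs K–N and K′–N′.

```python
import cv2
import numpy as np

# Hypothetical corner points K, L, M, N of the image to be processed and
# target corner points K', L', M', N' of the expected horizontal image.
src = np.array([[120, 80], [690, 140], [650, 500], [90, 430]], dtype=np.float32)
dst = np.array([[0, 0], [640, 0], [640, 400], [0, 400]], dtype=np.float32)

# OpenCV returns the matrix in the column-vector convention; its transpose plays the
# role of T in formula (1), which uses the row-vector form [x', y', w'] = [u, v, w] T.
M = cv2.getPerspectiveTransform(src, dst)

def map_pixel(u, v):
    """Map one pixel of the image to be processed onto the processed image."""
    x_p, y_p, w_p = np.array([u, v, 1.0]) @ M.T    # formula (1) in row-vector form
    return x_p / w_p, y_p / w_p                    # de-homogenize

print(map_pixel(120, 80))   # corner K maps (approximately) onto target corner K' at (0, 0)
```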
It should be noted that the image to be processed may include an image of a certificate or a business card that needs to be processed, an image taken directly by a user, an image extracted from images taken by the user, etc. In some embodiments, when performing an identity verification, a user needs to upload an image of his/her certificate. The user may point a camera directly at the certificate to take an image. The obtained image may only contain the certificate and no other objects. In such cases, the obtained image may be the image to be processed. In  addition, the user may not point the camera directly at the certificate to take an image. The obtained image may contain other objects (e.g., a desktop where the certificate is placed) except the certificate. In such cases, an image region only including the certificate may be extracted from the obtained image as the image to be processed.
In some embodiments, the target corner points of the processed image may be defined as an upper left corner point, an upper right corner point, a lower right corner point, and a lower left corner point of the processed image. Further, coordinates of four corner points in the image to be processed and serial numbers of the four corner points may be obtained by using the trained feature recognition model. The serial numbers of the four corner points in the image to be processed may be 1, 2, 3, and 4, respectively. No matter how the image to be processed rotates, the corner point 1 of the image to be processed may correspond to the upper left corner point of the processed image, the corner point 2 may correspond to the upper right corner point of the processed image, the corner point 3 may correspond to the lower right corner point of the processed image, and the corner point 4 may correspond to the lower left corner point of the processed image, which may ensure that the processed image is horizontal and proper.
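The fixed correspondence between serial numbers and target corner points can be expressed as a lookup, as sketched below; the 640 × 400 output size is a hypothetical value.

```python
import cv2
import numpy as np

# Fixed correspondence between serial numbers and the target corner points of a
# hypothetical 640 x 400 processed image, independent of how the input is rotated.
TARGET_CORNERS = {
    1: (0, 0),      # upper left
    2: (640, 0),    # upper right
    3: (640, 400),  # lower right
    4: (0, 400),    # lower left
}

def matrix_from_serial_numbers(corners_by_serial: dict) -> np.ndarray:
    """Build the transformation matrix from {serial number: (x, y)} corner pairs."""
    serials = sorted(corners_by_serial)                                  # 1, 2, 3, 4
    src = np.array([corners_by_serial[s] for s in serials], dtype=np.float32)
    dst = np.array([TARGET_CORNERS[s] for s in serials], dtype=np.float32)
    return cv2.getPerspectiveTransform(src, dst)
```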
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure.
FIG. 7 is a flowchart illustrating an exemplary process for correcting (or processing) an image according to some embodiments of the present disclosure. In some embodiments, process 700 may be executed by the image processing system 100. For example, the process 700 may be implemented as a set of instructions (e.g., an application) stored in a storage device disclosed elsewhere in the present disclosure. In some embodiments, the processing device 112 and/or an image  correction device (e.g., the image correction device 1100 illustrated in FIG. 11, the image correction device 1400 illustrated in FIG. 14) may execute the set of instructions and may accordingly be directed to perform the process 700. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 700 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 700 illustrated in FIG. 7 and described below is not intended to be limiting.
In 710, an image to be processed and coordinates of corner points in the image to be processed may be obtained. Operation 710 may be performed by the obtaining module and the determination module described in FIG. 1, the information obtaining unit 1110 illustrated in FIG. 11, and the obtaining unit 1410 illustrated in FIG. 14.
In 720, a transformation matrix may be calculated based on the coordinates of the corner points and coordinates of target corner points. Operation 720 may be performed by the processing module described in FIG. 1, the calculation unit 1120 illustrated in FIG. 11, and the calculation unit 1420 illustrated in FIG. 14.
In 730, a processed image may be obtained by performing a perspective transformation on the image to be processed using the transformation matrix. Operation 730 may be performed by the processing module described in FIG. 1, the correction unit 1130 illustrated in FIG. 11, and the perspective transformation unit 1430 illustrated in FIG. 14.
In 740, a text region may be obtained by detecting the processed image. Operation 740 may be performed by the processing module described in FIG. 1, a detection unit 1140 illustrated in FIG. 11, and a detection unit 1460 illustrated in FIG. 14.
In 750, text information may be obtained by performing an optical character recognition (OCR) on the text region. Operation 750 may be performed by the  processing module described in FIG. 1, an information recognition unit 1150 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14.
In some embodiments, after the processed image is obtained, the text region may be obtained by detecting the processed image. Further, the text information may be obtained by performing the optical character recognition on the text region, which may realize rapid and accurate extraction of the text information in the processed image and avoid the problems of low efficiency and high error rate caused by manual input.
The process described in FIG. 7 may be used to identify a certificate or a business card. The certificate may include an identity (ID) card, a vehicle certificate, a driving license, a bank card, etc. In scenarios such as the verification of identity information of bank customers and registration information of online car-hailing drivers, etc., a user may take and upload an image of a certificate. The image may be a tilted, non-horizontal image, which means that no particular photography skill is required of the user. Coordinates of four corner points of the certificate in the image and a confidence level of each of the coordinates of the four corner points may be obtained. When the confidence level is larger than a confidence level threshold, the transformation matrix may be calculated based on the coordinates of the corner points and the coordinates of the target corner points. A horizontal certificate image (i.e., the processed image) may be obtained by transforming each pixel of the image of the certificate by using the transformation matrix. Further, the text in the horizontal certificate image may be quickly and accurately extracted by the OCR, which may avoid the problems of low efficiency and high error rate caused by manual input.
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure.
FIG. 8 is a flowchart illustrating an exemplary process for correcting (or processing) an image according to some embodiments of the present disclosure. In some embodiments, process 800 may be executed by the image processing system 100. For example, the process 800 may be implemented as a set of instructions (e.g., an application) stored in a storage device disclosed elsewhere in the present disclosure. In some embodiments, the processing device 112 and/or an image correction device (e.g., the image correction device 1100 illustrated in FIG. 11, the image correction device 1400 illustrated in FIG. 14) may execute the set of instructions and may accordingly be directed to perform the process 800. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 800 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 800 illustrated in FIG. 8 and described below is not intended to be limiting.
In 810, an image to be processed, coordinates of corner points in the image to be processed, and confidence levels of the coordinates of the corner points may be obtained. Operation 810 may be performed by the obtaining module and the determination module described in FIG. 1, the information obtaining unit 1110 illustrated in FIG. 11, and the obtaining unit 1410 illustrated in FIG. 14. A confidence level of a coordinate of a corner point may indicate the accuracy of the coordinate of the corner point determined from the image to be processed.
In 820, when the confidence levels of the coordinates of the corner points are larger than a confidence level threshold, a transformation matrix may be determined based on the coordinates of the corner points and coordinates of target corner points. Operation 820 may be performed by the processing module described in FIG. 1, the calculation unit 1120 illustrated in FIG. 11, and the calculation unit 1420 illustrated in FIG. 14.
In 830, a processed image may be obtained by performing a perspective transformation on the image to be processed using the transformation matrix.  Operation 830 may be performed by the processing module described in FIG. 1, the correction unit 1130 illustrated in FIG. 11, and the perspective transformation unit 1430 illustrated in FIG. 14.
In 840, a text region may be obtained by detecting the processed image. Operation 840 may be performed by the processing module described in FIG. 1, the detection unit 1140 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14.
In 850, text information may be obtained by performing an optical character recognition on the text region. Operation 850 may be performed by the processing module described in FIG. 1, the information recognition unit 1150 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14.
Accordingly, the confidence levels of the coordinates of the corner points may be obtained. When the confidence levels of the coordinates of the corner points are larger than the confidence level threshold, the transformation matrix may be calculated, which may avoid inaccurate image correction caused by inaccurate coordinates of the corner points, thereby improving the accuracy of the image correction.
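A minimal sketch of this gating step is shown below; the threshold value of 0.8 and the shape of the model output are assumptions for illustration only.

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed value; the disclosure does not fix a number

def corners_are_reliable(corner_confidences, threshold=CONFIDENCE_THRESHOLD):
    """Proceed with the transformation only when every corner coordinate is trusted."""
    return all(conf > threshold for conf in corner_confidences)

# Usage (illustrative): compute the transformation matrix only for reliable detections,
# otherwise skip the correction or ask the user to re-take the photo.
# if corners_are_reliable(confidences):
#     matrix = cv2.getPerspectiveTransform(src, dst)
```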
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure.
FIG. 9 is a flowchart illustrating an exemplary process for correcting (or processing) an image according to some embodiments of the present disclosure. In some embodiments, process 900 may be executed by the image processing system 100. For example, the process 900 may be implemented as a set of instructions (e.g., an application) stored in a storage device disclosed elsewhere in the present disclosure. In some embodiments, the processing device 112 and/or an image correction device (e.g., the image correction device 1100 illustrated in FIG. 11, the  image correction device 1400 illustrated in FIG. 14) may execute the set of instructions and may accordingly be directed to perform the process 900. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 900 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 900 illustrated in FIG. 9 and described below is not intended to be limiting.
In 901, one or more training images may be obtained. Operation 901 may be performed by the training module described in FIG. 1, a model obtaining unit 1160 illustrated in FIG. 11, and a training unit 1440 illustrated in FIG. 14.
In 902, coordinates of corner points of each of the one or more training images may be labeled. Operation 902 may be performed by the training module described in FIG. 1, the model obtaining unit 1160 illustrated in FIG. 11, and the training unit 1440 illustrated in FIG. 14.
In 903, a plurality of sample images may be obtained by rotating the training images at any angle. Operation 903 may be performed by the training module described in FIG. 1, the model obtaining unit 1160 illustrated in FIG. 11, and the training unit 1440 illustrated in FIG. 14.
In 904, a trained feature recognition model may be generated based on the plurality of sample images and coordinates of corner points in each of the plurality of sample images. Operation 904 may be performed by the training module described in FIG. 1, the model obtaining unit 1160 illustrated in FIG. 11, and the training unit 1440 illustrated in FIG. 14.
In 905, an image to be processed may be obtained. Operation 905 may be performed by the obtaining module described in FIG. 1, the information obtaining unit 1110 illustrated in FIG. 11, and the obtaining unit 1410 illustrated in FIG. 14.
In 906, coordinates of corner points in the image to be processed and confidence levels of the coordinates of the corner points may be obtained by using the trained feature recognition model. Operation 906 may be performed by the  determination module described in FIG. 1, the information obtaining unit 1110 illustrated in FIG. 11, and the obtaining unit 1410 illustrated in FIG. 14.
In 907, if the confidence levels of the coordinates of the corner points are larger than a confidence level threshold, a transformation matrix may be determined based on the coordinates of the corner points and coordinates of target corner points. Operation 907 may be performed by the processing module described in FIG. 1, the calculation unit 1120 illustrated in FIG. 11, and the calculation unit 1420 illustrated in FIG. 14.
In 908, a processed image may be obtained by performing a perspective transformation on the image to be processed using the transformation matrix. Operation 908 may be performed by the processing module described in FIG. 1, the correction unit 1130 illustrated in FIG. 11, and the perspective transformation unit 1430 illustrated in FIG. 14.
In 909, a text region may be obtained by detecting the processed image. Operation 909 may be performed by the processing module described in FIG. 1, the detection unit 1140 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14.
In 910, text information may be obtained by performing an optical character recognition on the text region. Operation 910 may be performed by the processing module described in FIG. 1, the information recognition unit 1150 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14.
In this embodiment, the trained feature recognition model used to obtain the coordinates of the corner points in the image to be processed and confidence levels of the coordinates of the corner points may be established. Specifically, the coordinates of four corner points of the training image may be manually labeled. A regression training may be performed on a deep learning model using the coordinates of four corner points of the training image to obtain the trained feature recognition model. The training images may be randomly rotated at any angle during the training process. The coordinates of the corner points in the image to be  processed and confidence levels of the coordinates of the corner points may be obtained by using the trained feature recognition model directly, which may improve the efficiency of the image correction.
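The sketch below illustrates one way such rotation augmentation could be implemented, assuming OpenCV; the labeled corner coordinates are transformed together with the image so that the labels remain valid. The rotation range and the decision to keep the original canvas size are assumptions of this example, not requirements of the disclosure.

```python
import random
import cv2
import numpy as np

def rotate_image_and_corners(image, corners):
    """Rotate a labeled training image by a random angle and rotate its corner labels.

    `corners` is an (N, 2) array of labeled (x, y) corner coordinates.
    """
    h, w = image.shape[:2]
    angle = random.uniform(0.0, 360.0)                             # rotate at any angle
    rot = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)  # 2x3 affine matrix
    rotated = cv2.warpAffine(image, rot, (w, h))
    # Apply the same affine transform to the labeled corner coordinates.
    pts = np.hstack([np.asarray(corners, dtype=np.float64),
                     np.ones((len(corners), 1))])                  # (N, 3) homogeneous points
    new_corners = pts @ rot.T                                      # (N, 2) rotated labels
    return rotated, new_corners
```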
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure.
FIG. 10 is a flowchart illustrating an exemplary process for correcting (or processing) an image according to some embodiments of the present disclosure. In some embodiments, process 1000 may be executed by the image processing system 100. For example, the process 1000 may be implemented as a set of instructions (e.g., an application) stored in a storage device disclosed elsewhere in the present disclosure. In some embodiments, the processing device 112 and/or an image correction device (e.g., the image correction device 1100 illustrated in FIG. 11, the image correction device 1400 illustrated in FIG. 14) may execute the set of instructions and may accordingly be directed to perform the process 1000. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 1000 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 1000 illustrated in FIG. 10 and described below is not intended to be limiting.
In 1010, a trained feature recognition model may be obtained. Operation 1010 may be performed by the training module described in FIG. 1, the model obtaining unit 1160 illustrated in FIG. 11, and the training unit 1440 illustrated in FIG. 14. One or more training images may be obtained, and coordinates of corner points of each of the one or more training images may be labeled. A plurality of sample images may be obtained by rotating the training images at any angle. The trained feature recognition model may be generated based on the plurality of sample images and coordinates of corner points in each of the plurality of sample images.
In 1020, an image may be sampled. Operation 1020 may be performed by the obtaining module described in FIG. 1, the information obtaining unit 1110 illustrated in FIG. 11, and the obtaining unit 1410 illustrated in FIG. 14. The image to be processed may be obtained by photographing a certificate.
In 1030, corner points may be detected from the image. Operation 1030 may be performed by the determination module described in FIG. 1, the information obtaining unit 1110 illustrated in FIG. 11, and the obtaining unit 1410 illustrated in FIG. 14. The image to be processed may be input into the trained feature recognition model and the output of the trained feature recognition model may be confidence levels of corner points in the image to be processed and coordinates of the corner points.
In 1040, a perspective transformation may be performed on the image to obtain a horizontal image. Operation 1040 may be performed by the processing module described in FIG. 1, the correction unit 1130 illustrated in FIG. 11, and the perspective transformation unit 1430 illustrated in FIG. 14. The perspective transformation may be performed on the image using a transformation matrix. The transformation matrix may be calculated based on the coordinates of the corner points. The transformation matrix may be used to perform a perspective transformation on an original non-horizontal image (i.e., the image to be processed) to obtain a horizontal image (i.e., the processed image) .
In 1050, a text region may be detected from the horizontal image. Operation 1050 may be performed by the processing module described in FIG. 1, the detection unit 1140 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14.
In 1060, an optical character recognition may be performed on the text region. Operation 1060 may be performed by the processing module described in FIG. 1, the information recognition unit 1150 illustrated in FIG. 11, and the detection  unit 1460 illustrated in FIG. 14. The text region may be recognized by the optical character recognition.
In this embodiment, a rotation angle of the image to be processed is not calculated. The four corner points in the image to be processed may be located by using the deep learning model. According to the coordinates of the four corner points, a transformation matrix may be calculated. The transformation matrix may be used to perform the perspective transformation on the original tilted image (i.e., the image to be processed) . After the perspective transformation is performed, a horizontal image (i.e., the processed image) may be obtained.
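The flow summarized above can be sketched end to end as follows. Here `feature_model`, `detect_text_regions`, and `recognize_text` are hypothetical callables standing in for the trained feature recognition model, the text-region detector, and the OCR engine; only the OpenCV calls are concrete, and the output size and confidence threshold are illustrative assumptions.

```python
import cv2
import numpy as np

def process_certificate_image(image, feature_model, detect_text_regions,
                              recognize_text, out_size=(856, 540),
                              conf_threshold=0.8):
    """Corner detection -> perspective transformation -> text detection -> OCR."""
    corners, confidences = feature_model(image)   # four (x, y) points and their confidences
    if min(confidences) <= conf_threshold:
        return None                               # detection not reliable; skip the correction
    w, h = out_size
    targets = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    matrix = cv2.getPerspectiveTransform(np.float32(corners), targets)
    horizontal = cv2.warpPerspective(image, matrix, (w, h))   # the processed (horizontal) image
    texts = [recognize_text(horizontal[y:y + rh, x:x + rw])
             for (x, y, rw, rh) in detect_text_regions(horizontal)]
    return horizontal, texts
```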
FIG. 11 is a block diagram illustrating an exemplary image correction device according to some embodiments of the present disclosure. The image correction device 1100 may include an information obtaining unit 1110, a calculation unit 1120, a correction unit 1130, a detection unit 1140, an information recognition unit 1150, and a model obtaining unit 1160.
The information obtaining unit 1110 may be configured to obtain an image to be processed and coordinates of corner points in the image to be processed. More descriptions regarding the obtaining of the image to be processed and the coordinates of the corner points in the image to be processed may be found elsewhere in the present disclosure (e.g., FIG. 5 and the description thereof) .
The calculation unit 1120 may be configured to calculate a transformation matrix based on the coordinates of the corner points in the image and coordinates of target corner points. More descriptions regarding the calculation of the transformation matrix may be found elsewhere in the present disclosure (e.g., FIG. 5 and the description thereof) .
The correction unit 1130 may be configured to obtain a processed image by performing a perspective transformation on the image to be processed using the transformation matrix. More descriptions regarding the obtaining of the processed image may be found elsewhere in the present disclosure (e.g., FIG. 5 and the description thereof) .
The detection unit 1140 may be configured to obtain a text region by detecting the processed image. More descriptions regarding the obtaining of the text region may be found elsewhere in the present disclosure (e.g., FIG. 7 and the description thereof) .
The information recognition unit 1150 may be configured to obtain text information by performing an optical character recognition (OCR) on the text region. More descriptions regarding the obtaining of the text information may be found elsewhere in the present disclosure (e.g., FIG. 7 and the description thereof) .
The model obtaining unit 1160 may be configured to obtain a trained feature recognition model. In some embodiments, the model obtaining unit 1160 may be configured to obtain one or more training images and label coordinates of corner points of each of the one or more training images. The model obtaining unit 1160 may be configured to obtain a plurality of sample images by rotating the training images at any angle. Further, the model obtaining unit 1160 may be configured to generate the trained feature recognition model based on the plurality of sample images and coordinates of corner points in each of the plurality of sample images. More descriptions regarding the obtaining of the trained feature recognition model may be found elsewhere in the present disclosure (e.g., FIG. 9 and the description thereof) .
The units in the image correction device 1100 may be connected to or communicate with each other via a wired connection or a wireless connection. The wired connection may include a metal cable, an optical cable, a hybrid cable, or the like, or any combination thereof. The wireless connection may include a Local Area Network (LAN) , a Wide Area Network (WAN) , a Bluetooth, a ZigBee, a Near Field Communication (NFC) , or the like, or any combination thereof. It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not  depart from the scope of the present disclosure. In some embodiments, two or more of the units may be combined as a single unit, and any one of the units may be divided into two or more units. In some embodiments, one or more of the units may be omitted. For example, the model obtaining unit 1160 may be omitted. As another example, the detection unit 1140, the information recognition unit 1150, and the model obtaining unit 1160 may be omitted.
FIG. 12 is a flowchart illustrating an exemplary process for correcting (or processing) an image according to some embodiments of the present disclosure. In some embodiments, process 1200 may be executed by the image processing system 100. For example, the process 1200 may be implemented as a set of instructions (e.g., an application) stored in a storage device disclosed elsewhere in the present disclosure. In some embodiments, the processing device 112 and/or an image correction device (e.g., the image correction device 1100 illustrated in FIG. 11, the image correction device 1400 illustrated in FIG. 14) may execute the set of instructions and may accordingly be directed to perform the process 1200. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 1200 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 1200 illustrated in FIG. 12 and described below is not intended to be limiting.
In 1210, coordinates and serial numbers of corner points of a certificate in an image may be obtained by using a trained feature recognition model. Operation 1210 may be performed by the determination module described in FIG. 1, the information obtaining unit 1110 illustrated in FIG. 11, and the obtaining unit 1410 illustrated in FIG. 14.
In some embodiments, information obtained by using the trained feature recognition model from the image may also include at least one of confidence levels of the corner points, a frame of the certificate, and a rotation direction of the certificate, etc.
In some embodiments, the trained feature recognition model may be obtained based on a plurality of training samples by a training process. The plurality of training samples may include a plurality of sample images. For each of the plurality of sample images, coordinates and serial numbers of corner points of a certificate in the sample image may be marked. The training process of the trained feature recognition model may be performed in a similar or same manner as process 400 as described in connection with FIG. 4, and the descriptions thereof are not repeated here.
In some embodiments, when a shape of the certificate in the image is a polygon, for example, a quadrilateral (e.g., a rectangle) , a pentagon, or a hexagon, the corner points of the certificate may be vertices of the polygon, for example, the four vertices of the quadrilateral, the five vertices of the pentagon, or the six vertices of the hexagon. For each of the corner points, the coordinates of the corner point of the certificate may be the coordinates of a vertex of the polygon in the image.
In some embodiments, the image may be taken, using an imaging device (e.g., a camera in the user device 130) , by a user in real-time. In some alternative embodiments, the image may be uploaded by the user from an album in the user device 130. It is understood that these embodiments described herein are not intended to be limiting. In the present disclosure, a manner of obtaining the image may be determined according to user needs.
In some embodiments, the image may be an image that includes at least one complete certificate. If the image does not include a complete certificate, an instruction indicating a correction failure may be returned to the image processing system 100.
In 1220, a transformation matrix may be determined based on the coordinates of the corner points of the certificate in the image. Operation 1220 may be performed by the processing module described in FIG. 1, the calculation unit 1120 illustrated in FIG. 11, and the calculation unit 1420 illustrated in FIG. 14.
The transformation matrix may refer to a matrix used to perform a plurality of transformation operations on an image. The transformation operations may include a linear transformation, a translation, a perspective transformation, or the like, or any combination thereof.
In some embodiments, it is assumed that a shape of the certificate in the image is a rectangle and the corner points of the certificate are the four vertices of the rectangle, the transformation matrix may be determined as
$$A=\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \tag{1}$$
where $\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ indicates the linear transformation, $[a_{31}\ \ a_{32}]$ indicates the translation, $[a_{13}\ \ a_{23}]^{T}$ indicates the perspective transformation, and $a_{33}$ may be a constant.
In 1230, a processed image may be determined by processing the image based on the transformation matrix. Operation 1230 may be performed by the processing module described in FIG. 1, a correction unit 1130 illustrated in FIG. 11, and a perspective transformation unit 1430 illustrated in FIG. 14.
In some embodiments, the processed image may be determined by mapping the coordinates of each pixel in the image, using the transformation matrix, to the coordinates of a pixel in the processed image according to formula (2) below:
$$[x',\ y',\ w'] = [u,\ v,\ w]\cdot A, \tag{2}$$
where $[x',\ y',\ w']$ refers to the coordinates of a pixel in the processed image and $[u,\ v,\ w]$ refers to the coordinates of a pixel in the image to be processed. The pixel in the processed image whose coordinates are mapped from the coordinates of a pixel in the image using the transformation matrix may be assigned the pixel value of that pixel in the image. The processed image may be obtained by assigning the pixel value of each pixel in the image to the pixel in the processed image that is mapped from that pixel.
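For illustration, formula (2) can be applied to a single pixel as in the sketch below; the division by w′ is the usual homogeneous-coordinate normalization and is an assumption of this example rather than an explicit step of the disclosure.

```python
import numpy as np

def map_pixel(u, v, matrix):
    """Map a pixel (u, v) of the image to be processed into the processed image."""
    x_p, y_p, w_p = np.array([u, v, 1.0]) @ matrix  # [x', y', w'] = [u, v, w] . A
    return x_p / w_p, y_p / w_p                     # normalized 2-D pixel position
```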
In some embodiments, the processed image may be adjusted based on the serial numbers of the corner points of the certificate in the image. Each of the serial numbers of the corner points of the certificate in the image may correspond to a standard serial number of a corner point. For a specific certificate with a specific shape, the standard serial number of each corner point of the specific certificate may be a default setting of the image processing system 100. For example, for a certificate with a rectangular shape, standard serial numbers of the four vertices 313, 314, 311, and 312 of the rectangle may be 1’, 2’, 3’, and 4’, respectively, as illustrated in FIG. 3. The obtained serial numbers of the four vertices 313, 314, 311, and 312 in the image may be 3, 4, 1, and 2, respectively, as illustrated in FIG. 3. Merely by way of example, the processed image may be adjusted by transforming the processed image so that the serial numbers of the corner points in the image correspond to the standard serial numbers. For example, the obtained serial numbers of the corner points 314, 313, 312, and 311 in the image may be 4, 3, 2, and 1, respectively, as illustrated in FIG. 3. To adjust the processed image, the processed image may be transformed so that the corner point 311 with the serial number 1 coincides with the corner point with the standard serial number 1, the corner point with the serial number 2 coincides with the corner point with the standard serial number 2, the corner point with the serial number 3 coincides with the corner point with the standard serial number 3, and the corner point with the serial number 4 coincides with the corner point with the standard serial number 4. More descriptions regarding the adjustment of the processed image based on the serial numbers of the corner points may be found elsewhere in the present disclosure (e.g., FIG. 3 and the description thereof) .
Commonly, the processed image may be determined by using a traditional correction algorithm. When the traditional correction algorithm detects that a state of the certificate in the image is horizontal and vertical, a direction of the certificate in the image may not be corrected. For example, for a certificate with a rectangular shape, when the traditional correction algorithm detects that two of the four sides of the certificate are vertical and the other two are horizontal, the direction of the certificate may not be corrected. However, in this case, the direction of the certificate may be wrong. For example, the certificate in the image may be upside down, that is, the certificate is rotated 180° from a right position. As another example, the certificate in the image may be rotated 90° from a right position, in which case two of the four sides of the certificate are still vertical and the other two are still horizontal. The traditional algorithm may only guarantee that the certificate in the processed image is horizontal and vertical, but it cannot guarantee that the direction of the certificate in the processed image is correct.
In the present disclosure, the coordinates and serial numbers of the corner points of the certificate in the image may be obtained by using the trained feature recognition model. The transformation matrix may be determined based on the coordinates of the corner points of the certificate in the image. Further, the processed image may be determined by processing the image based on the transformation matrix. Compared with a processed image determined based on the traditional correction algorithm, the influence of the image background on the processed image may be reduced or eliminated, the wrong direction of the certificate in the image may be corrected, and the perspective problem of the image may be solved, thereby improving the image quality of the processed image.
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure.
FIG. 13 is a flowchart illustrating an exemplary process for recognizing information in a corrected image according to some embodiments of the present disclosure. In some embodiments, process 1300 may be executed by the image processing system 100. For example, the process 1300 may be implemented as a set of instructions (e.g., an application) stored in a storage device disclosed elsewhere in the present disclosure. In some embodiments, the processing device 112 and/or an image correction device (e.g., the image correction device 1100  illustrated in FIG. 11, the image correction device 1400 illustrated in FIG. 14) may execute the set of instructions and may accordingly be directed to perform the process 1300. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 1300 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed. Additionally, the order of the operations of process 1300 illustrated in FIG. 13 and described below is not intended to be limiting.
In 1310, fields may be detected in a processed image and a field region representing each field may be extracted from the processed image. Operation 1310 may be performed by the processing module described in FIG. 1, the detection unit 1140 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14.
Each of the fields may consist of one or more characters (e.g., words, texts, numbers, symbols) . In some embodiments, the field region representing a field may be extracted from the processed image based on an extraction function. The extraction function may be a default setting of the image processing system 100 or may be adjustable under different situations. In some embodiments, a target field may be determined according to the user needs. A field region representing the target field may be extracted from the processed image. It is understood that these embodiments described herein are not intended to be limiting. In the present disclosure, the field region may be extracted according to other manners.
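As a minimal sketch of this extraction step, assume a text detector that returns axis-aligned bounding boxes as (x, y, width, height) tuples; the detector itself and the box format are assumptions made for this example.

```python
def extract_field_regions(processed_image, field_boxes):
    """Crop one sub-image per detected field from the corrected (processed) image."""
    regions = []
    for (x, y, w, h) in field_boxes:
        regions.append(processed_image[y:y + h, x:x + w])  # NumPy-style crop
    return regions
```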
In 1320, the one or more characters in each field may be recognized from the field region based on a preset algorithm. Operation 1320 may be performed by the processing module described in FIG. 1, the information recognition unit 1150 illustrated in FIG. 11, and the detection unit 1460 illustrated in FIG. 14.
In some embodiments, the one or more characters may be recognized from the field region by using an optical character recognition (OCR) algorithm. The OCR algorithm may be used to automatically recognize characters (e.g., words, texts, numbers, symbols) in a certificate in an image (e.g., the processed image) .  The certificate may include an identity (ID) card, a vehicle certificate, a driving license, a bank card, etc. The OCR algorithm may have a wide range of uses in different fields, for example, the verification of identity information of bank customers and registration information of online car-hailing drivers, etc. The OCR algorithm may quickly and accurately recognize and extract the characters in the certificate in the image, which may solve the problems of low efficiency and high error rate of manual input. Further, the one or more characters may be output to a user device (e.g., the user device 130) or an external device.
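A hedged example of this recognition step is shown below, using the open-source pytesseract wrapper as a stand-in for whatever OCR engine the system actually employs; the language setting is illustrative.

```python
import pytesseract

def recognize_field(field_region, lang="eng"):
    """Return the characters recognized in a single field region."""
    return pytesseract.image_to_string(field_region, lang=lang).strip()
```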
As described in connection with FIG. 12, the processed image may be determined. According to the above embodiments, the one or more characters (i.e., effective information) in the certificate may be recognized by using the OCR algorithm, which not only solves the problem that the traditional correction algorithm is greatly affected by the image background, but also solves the problems of inaccurately determining the direction of the certificate in the image and being unable to correct a perspective image when the traditional correction algorithm is used, thereby improving the image quality of the processed image and the accuracy of character recognition in the processed image.
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure.
FIG. 14 is a block diagram illustrating an exemplary image correction device according to some embodiments of the present disclosure. The image correction device 1400 may include an obtaining unit 1410, a calculation unit 1420, a perspective transformation unit 1430, a training unit 1440, an output unit 1450, and a detection unit 1460.
The obtaining unit 1410 may be configured to obtain coordinates and serial numbers of corner points of a certificate in an image by using a trained feature  recognition model. More descriptions regarding the obtaining of the coordinates and the serial numbers of the corner points may be found elsewhere in the present disclosure (e.g., operation 1210 in FIG. 12 and the description thereof) .
The calculation unit 1420 may be configured to determine a transformation matrix based on the coordinates of the corner points of the certificate in the image. More descriptions regarding the determination of the transformation matrix may be found elsewhere in the present disclosure (e.g., operation 1220 in FIG. 12 and the description thereof) .
The perspective transformation unit 1430 may be configured to determine a processed image by processing the image based on the transformation matrix. More descriptions regarding the determination of the processed image may be found elsewhere in the present disclosure (e.g., operation 1230 in FIG. 12 and the description thereof) .
The training unit 1440 may be configured to obtain the trained feature recognition model. More descriptions regarding the obtaining of the trained feature recognition model may be found elsewhere in the present disclosure (e.g., FIG. 4 and the description thereof) .
The detection unit 1460 may be configured to detect fields in the processed image and extract a field region representing each field from the processed image. More descriptions regarding the detection of the fields in the processed image and the extraction of the field region representing each field may be found elsewhere in the present disclosure (e.g., operation 1310 in FIG. 13 and the description thereof) . The detection unit 1460 may be also configured to recognize the one or more characters in each field from the field region based on a preset algorithm. More descriptions regarding the recognition of the one or more characters in each field may be found elsewhere in the present disclosure (e.g., operation 1320 in FIG. 13 and the description thereof) .
The output unit 1450 may be configured to output the coordinates and the serial numbers of the corner points of the certificate in the image. The output unit  1450 may be configured to output the processed image. The output unit 1450 may be also configured to output the one or more characters in each field recognized from the field region.
The units in the image correction device 1400 may be connected to or communicate with each other via a wired connection or a wireless connection. The wired connection may include a metal cable, an optical cable, a hybrid cable, or the like, or any combination thereof. The wireless connection may include a Local Area Network (LAN) , a Wide Area Network (WAN) , a Bluetooth, a ZigBee, a Near Field Communication (NFC) , or the like, or any combination thereof. It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations or modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, two or more of the units may be combined as a single unit, and any one of the units may be divided into two or more units. In some embodiments, one or more of the units may be omitted. For example, the detection unit 1460 may be omitted. As another example, the training unit 1440, the output unit 1450, and the detection unit 1460 may be omitted.
FIG. 15 is a block diagram illustrating an exemplary image correction device according to some embodiments of the present disclosure. The image correction device 1500 may include a processor 1510, a memory 1520, and a bus 1530. The memory 1520 may store machine-readable instructions executable by the processor 1510. When the image correction device 1500 is operated, the processor 1510 may communicate with the memory 1520 through the bus 1530, and the processor 1510 may execute the machine-readable instructions to implement a process (e.g., process 200, process 400, process 500, process 700, process 800, process 900, process 1000, process 1200, process 1300) described elsewhere in the present disclosure.
In some embodiments, the present disclosure may also provide a storage medium storing a computer program. The computer program may be executed to implement a process (e.g., process 200, process 400, process 500, process 700, process 800, process 900, process 1000, process 1200, process 1300) described elsewhere in the present disclosure. The storage medium may be a general storage medium, such as a removable disk, a hard disk, etc.
Having thus described the basic concepts, it may be rather apparent to those skilled in the art after reading this detailed disclosure that the foregoing detailed disclosure is intended to be presented by way of example only and is not limiting. Various alterations, improvements, and modifications may occur and are intended to those skilled in the art, though not expressly stated herein. These alterations, improvements, and modifications are intended to be suggested by this disclosure, and are within the spirit and scope of the exemplary embodiments of this disclosure.
Moreover, certain terminology has been used to describe embodiments of the present disclosure. For example, the terms “one embodiment, ” “an embodiment, ” and/or “some embodiments” mean that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Therefore, it is emphasized and should be appreciated that two or more references to “an embodiment” or “one embodiment” or “an alternative embodiment” in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the present disclosure.
Further, it will be appreciated by one skilled in the art, aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or context including any new and useful process, machine, manufacture, or  composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely hardware, entirely software (including firmware, resident software, micro-code, etc. ) or combining software and hardware implementation that may all generally be referred to herein as a “unit, ” “module, ” or “system. ” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electro-magnetic, optical, or the like, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including wireless, wireline, optical fiber cable, RF, or the like, or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB. NET, Python or the like, conventional procedural programming languages, such as the "C" programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the  latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN) , or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a Software as a Service (SaaS) .
Furthermore, the recited order of processing elements or sequences, or the use of numbers, letters, or other designations therefore, is not intended to limit the claimed processes and methods to any order except as may be specified in the claims. Although the above disclosure discusses through various examples what is currently considered to be a variety of useful embodiments of the disclosure, it is to be understood that such detail is solely for that purpose, and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover modifications and equivalent arrangements that are within the spirit and scope of the disclosed embodiments. For example, although the implementation of various components described above may be embodied in a hardware device, it may also be implemented as a software only solution, e.g., an installation on an existing server or mobile device.
Similarly, it should be appreciated that in the foregoing description of embodiments of the present disclosure, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure aiding in the understanding of one or more of the various embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Rather, claimed subject matter may lie in less than all features of a single foregoing disclosed embodiment.

Claims (21)

  1. A system, comprising:
    at least one storage device including a set of instructions; and
    at least one processor in communication with the at least one storage device, wherein when executing the set of instructions, the at least one processor is directed to perform operations including:
    obtaining an image of an object;
    determining, based on the image, feature information of the object in the image by using a trained feature recognition model; and
    processing, based on the feature information, the image.
  2. The system of claim 1, wherein:
    the feature information includes positions of at least three key points of the object in the image; and
    the processing the image based on the feature information includes:
    obtaining reference positions of the at least three key points of the object; and
    processing the image based on the positions of the at least three key points of the object and the reference positions of the at least three key points of the object.
  3. The system of claim 2, wherein the at least three key points include corner points of the object.
  4. The system of claim 2 or claim 3, wherein the processing the image based on the positions of the at least three key points of the object and the reference positions of the at least three key points of the object includes:
    determining a transformation matrix based on the positions of at least three key points of the object in the image and the reference positions of the at least three key points of the object; and
    processing the image to obtain a processed image by transforming the image using the transformation matrix.
  5. The system of claim 1, wherein:
    the feature information includes direction information of the object, the direction information of the object including a deflection direction of the object relative to a reference line associated with the image; and
    the processing the image based on the feature information includes processing the image based on the direction information of the object.
  6. The system of claim 1, wherein:
    the feature information includes a confidence level, the confidence level indicating accuracy of the feature information of the object determined from the image; and
    the processing the image based on the feature information includes:
    determining whether the confidence level of the feature information satisfies a condition;
    in response to determining that the confidence level of the feature information satisfies the condition, processing the image based on the feature information.
  7. The system of any one of claims 1-6, wherein the processing the image includes:
    determining a target region of the object from the image by using an object recognition model; and
    processing the target region of the object in the image.
  8. The system of claim 7, wherein the determining the target region of the object from the image by using the object recognition model includes:
    inputting the feature information of the object into the object recognition model;  and
    determining the target region of the object from the image based on an output of the object recognition model.
  9. The system of claim 1, wherein:
    the trained feature recognition model includes a plurality of sub-models, each of the plurality of sub-models corresponding to a reference object type; and
    the determining, based on the image, the feature information of the object in the image by using the trained feature recognition model includes:
    determining an object type of the object;
    obtaining a sub-model from the plurality of sub-models based on the object type of the object and the reference object type corresponding to the sub-model; and
    determining the feature information of the object in the image by using the sub-model.
  10. The system of any one of claims 1-9, wherein the trained feature recognition model is obtained by a training process including:
    obtaining a plurality of training samples each of which includes an image obtained by performing an angle transformation on a sample image; and
    obtaining the trained feature recognition model by training a preliminary machine learning model based on the plurality of training samples.
  11. A method implemented on a computing device including at least one processor, at least one storage medium, and a communication platform connected to a network, the method comprising:
    obtaining an image of an object;
    determining, based on the image, feature information of the object in the image by using a trained feature recognition model; and
    processing, based on the feature information, the image.
  12. The method of claim 11, wherein:
    the feature information includes positions of at least three key points of the object in the image; and
    the processing the image based on the feature information includes:
    obtaining reference positions of the at least three key points of the object; and
    processing the image based on the positions of the at least three key points of the object and the reference positions of the at least three key points of the object.
  13. The method of claim 12, wherein the at least three key points include corner points of the object.
  14. The method of claim 12 or claim 13, wherein the processing the image based on the positions of the at least three key points of the object and the reference positions of the at least three key points of the object includes:
    determining a transformation matrix based on the positions of at least three key points of the object in the image and the reference positions of the at least three key points of the object; and
    processing the image to obtain a processed image by transforming the image using the transformation matrix.
  15. The method of claim 11, wherein:
    the feature information includes direction information of the object, the direction information of the object including a deflection direction of the object relative to a reference line associated with the image; and
    the processing the image based on the feature information includes processing the image based on the direction information of the object.
  16. The method of claim 11, wherein:
    the feature information includes a confidence level, the confidence level indicating accuracy of the feature information of the object determined from the image; and
    the processing the image based on the feature information includes:
    determining whether the confidence level of the feature information satisfies a condition;
    in response to determining that the confidence level of the feature information satisfies the condition, processing the image based on the feature information.
  17. The method of any one of claims 11-16, wherein the processing the image includes:
    determining a target region of the object from the image by using an object recognition model; and
    processing the target region of the object in the image.
  18. The method of claim 17, wherein the determining the target region of the object from the image by using the object recognition model includes:
    inputting the feature information of the object into the object recognition model; and
    determining the target region of the object from the image based on an output of the object recognition model.
  19. The method of claim 11, wherein:
    the trained feature recognition model includes a plurality of sub-models, each of the plurality of sub-models corresponding to a reference object type; and
    the determining, based on the image, the feature information of the object in the  image by using the trained feature recognition model includes:
    determining an object type of the object;
    obtaining a sub-model from the plurality of sub-models based on the object type of the object and the reference object type corresponding to the sub-model; and
    determining the feature information of the object in the image by using the sub-model.
  20. The method of any one of claims 11-19, wherein the trained feature recognition model is obtained by a training process including:
    obtaining a plurality of training samples each of which includes an image obtained by performing an angle transformation on a sample image; and
    obtaining the trained feature recognition model by training a preliminary machine learning model based on the plurality of training samples.
  21. A non-transitory computer readable medium, comprising executable instructions that, when executed by at least one processor, direct the at least one processor to perform a method, the method comprising:
    obtaining an image of an object;
    determining, based on the image, feature information of the object in the image by using a trained feature recognition model; and
    processing, based on the feature information, the image.
PCT/CN2020/122362 2019-10-24 2020-10-21 Systems and methods for image processing WO2021078133A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201911016421.2A CN111860527A (en) 2019-10-24 2019-10-24 Image correction method, image correction device, computer device, and storage medium
CN201911016421.2 2019-10-24
CN201911252892.3A CN111860489A (en) 2019-12-09 2019-12-09 Certificate image correction method, device, equipment and storage medium
CN201911252892.3 2019-12-09

Publications (1)

Publication Number Publication Date
WO2021078133A1 true WO2021078133A1 (en) 2021-04-29

Family

ID=75619621

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/122362 WO2021078133A1 (en) 2019-10-24 2020-10-21 Systems and methods for image processing

Country Status (1)

Country Link
WO (1) WO2021078133A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919899A (en) * 2017-01-18 2017-07-04 北京光年无限科技有限公司 The method and system for imitating human face expression output based on intelligent robot
CN108510435A (en) * 2018-03-28 2018-09-07 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN108776787A (en) * 2018-06-04 2018-11-09 北京京东金融科技控股有限公司 Image processing method and device, electronic equipment, storage medium
CN109102026A (en) * 2018-08-16 2018-12-28 新智数字科技有限公司 A kind of vehicle image detection method, apparatus and system

Similar Documents

Publication Publication Date Title
WO2019174130A1 (en) Bill recognition method, server, and computer readable storage medium
US11615643B2 (en) Methods, systems, and media for evaluating images
CN110781885A (en) Text detection method, device, medium and electronic equipment based on image processing
CN110163087B (en) Face gesture recognition method and system
CN112016438A (en) Method and system for identifying certificate based on graph neural network
WO2022105517A1 (en) Systems and methods for detecting traffic accidents
US11003730B2 (en) Systems and methods for parent-child relationship determination for points of interest
CN111860489A (en) Certificate image correction method, device, equipment and storage medium
WO2023005091A1 (en) Systems and methods for object detection
EP3783524A1 (en) Authentication method and apparatus, and electronic device, computer program, and storage medium
CN110222641B (en) Method and apparatus for recognizing image
EP4256517A1 (en) Systems and methods for temperature determination
WO2020125062A1 (en) Image fusion method and related device
CN111191644B (en) Identity recognition method, system and device
WO2021147938A1 (en) Systems and methods for image processing
EP4229540A1 (en) Systems and methods for image detection
CN112529827A (en) Training method and device for remote sensing image fusion model
US9087272B2 (en) Optical match character classification
WO2022247406A1 (en) Systems and methods for determining key frame images of video data
CN113313114B (en) Certificate information acquisition method, device, equipment and storage medium
US10866633B2 (en) Signing with your eyes
WO2024051593A1 (en) Systems and methods for image processing
WO2021078133A1 (en) Systems and methods for image processing
WO2019201141A1 (en) Methods and systems for image processing
WO2021223709A1 (en) Systems and methods for barcode decoding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20879655

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20879655

Country of ref document: EP

Kind code of ref document: A1