CN116092094A - Image text recognition method and device, computer readable medium and electronic equipment - Google Patents


Info

Publication number: CN116092094A
Authority: CN (China)
Prior art keywords: image, gray, layer, text, complaint
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202111307156.0A
Other languages: Chinese (zh)
Inventors: 夏磊豪, 陈萍
Current Assignee: Tencent Technology Shenzhen Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority: CN202111307156.0A
Priority: PCT/CN2022/118298 (published as WO2023077963A1)
Publication of CN116092094A
Priority: US18/354,726 (published as US20230360183A1)


Classifications

    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06T 5/30 Erosion or dilatation, e.g. thinning
    • G06T 5/70 Denoising; Smoothing
    • G06T 7/10 Segmentation; Edge detection
    • G06V 10/56 Extraction of image or video features relating to colour
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V 30/10 Character recognition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/153 Segmentation of character regions using recognition of characters or words
    • G06V 30/16 Image preprocessing
    • G06V 30/24 Character recognition characterised by the processing or recognition method

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the field of computer technology, and in particular to an image text recognition method and apparatus, a computer-readable medium, and an electronic device. The method comprises the following steps: converting an image to be processed into a grayscale image, and dividing the grayscale image into gray layers corresponding to respective layer intervals according to the layer interval to which the gray value of each pixel of the grayscale image belongs; performing image erosion on each gray layer to obtain a feature layer corresponding to each gray layer, the feature layer comprising a plurality of connected regions; superimposing the feature layers to obtain a superimposed feature layer, the superimposed feature layer comprising a plurality of connected regions; expanding each connected region on the superimposed feature layer in a preset direction to obtain text regions; and recognizing the text of each text region on the superimposed feature layer to obtain the recognized text corresponding to the image to be processed. The recognition accuracy of the connected regions of each layer is thereby improved, and the text of the image to be processed can be recognized accurately.

Description

Image text recognition method and device, computer readable medium and electronic equipment
Technical Field
The present application relates to the field of computer technology, and in particular to an image text recognition method and apparatus, a computer-readable medium, and an electronic device.
Background
With the development of computer science and technology, automated information processing capabilities have improved markedly. Converting picture documents into electronic form, an indispensable step in document digitization, has drawn the attention of researchers in the related art.
Text recognition methods in the related art rely on features and rules set manually for each scene of the picture document. They are strongly affected by subjective factors, generalize poorly, and work well only in the scenes for which the features and rules were designed. Once the scene changes, the originally designed features and rules are often no longer applicable, and the accuracy of text recognition is low.
It should be noted that the information disclosed in the foregoing background section is only for enhancing understanding of the background of the present application and may therefore include information that does not constitute prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
The present application aims to provide an image text recognition method, an image text recognition apparatus, a computer-readable medium, and an electronic device, which at least to some extent overcome the problem of low text recognition accuracy in the related art.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned in part by the practice of the application.
According to an aspect of the embodiments of the present application, there is provided an image text recognition method, including:
converting an image to be processed into a grayscale image, and dividing the grayscale image into gray layers corresponding to respective layer intervals according to the layer interval to which the gray value of each pixel of the grayscale image belongs, wherein a layer interval represents the range of gray values of the pixels in the corresponding gray layer;
performing image erosion on each gray layer to obtain a feature layer corresponding to each gray layer, wherein the feature layer comprises a plurality of connected regions, a connected region being a region composed of a plurality of pixels having a connectivity relationship;
superimposing the feature layers to obtain a superimposed feature layer, wherein the superimposed feature layer comprises a plurality of connected regions;
expanding each connected region on the superimposed feature layer in a preset direction to obtain text regions;
and recognizing the text of each text region on the superimposed feature layer to obtain the recognized text corresponding to the image to be processed.
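For concreteness, the following is a minimal sketch of how the five steps above might be wired together, assuming OpenCV and NumPy; the function names, kernel sizes, and the ocr_engine callable are illustrative assumptions rather than the patent's implementation:

```python
# Illustrative sketch of the claimed pipeline (S210-S250); all names are hypothetical.
import cv2
import numpy as np

def recognize_image_text(image_bgr, layer_intervals, ocr_engine):
    # S210: convert to grayscale, split into one gray layer per layer interval
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    layers = []
    for lo, hi in layer_intervals:
        mask = (gray >= lo) & (gray <= hi)
        layers.append(np.where(mask, gray, 0).astype(np.uint8))

    # S220: erode each gray layer to obtain a feature layer of connected regions
    kernel = np.ones((3, 3), np.uint8)
    feature_layers = [cv2.erode(layer, kernel, iterations=1) for layer in layers]

    # S230: superimpose the feature layers
    superimposed = np.maximum.reduce(feature_layers)

    # S240: expand connected regions along a preset (here horizontal) direction
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 1))
    text_mask = cv2.dilate((superimposed > 0).astype(np.uint8), h_kernel)

    # S250: recognize the text of each resulting text region
    num, labels, stats, _ = cv2.connectedComponentsWithStats(text_mask)
    texts = []
    for i in range(1, num):  # label 0 is the background
        x, y = stats[i, cv2.CC_STAT_LEFT], stats[i, cv2.CC_STAT_TOP]
        w, h = stats[i, cv2.CC_STAT_WIDTH], stats[i, cv2.CC_STAT_HEIGHT]
        texts.append(ocr_engine(gray[y:y + h, x:x + w]))
    return "\n".join(texts)
```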
According to an aspect of the embodiments of the present application, there is provided an image text recognition apparatus including:
a layer segmentation module configured to convert an image to be processed into a grayscale image, and to divide the grayscale image into gray layers corresponding to respective layer intervals according to the layer interval to which the gray value of each pixel of the grayscale image belongs, wherein a layer interval represents the range of gray values of the pixels in the corresponding gray layer;
an erosion module configured to perform image erosion on each gray layer to obtain a feature layer corresponding to each gray layer, wherein the feature layer comprises a plurality of connected regions, a connected region being a region composed of a plurality of pixels having a connectivity relationship;
a feature superposition module configured to superimpose the feature layers to obtain a superimposed feature layer, the superimposed feature layer comprising a plurality of connected regions;
a dilation module configured to expand each connected region on the superimposed feature layer in a preset direction to obtain text regions;
and a text recognition module configured to recognize the text of each text region on the superimposed feature layer to obtain the recognized text corresponding to the image to be processed.
In some embodiments of the present application, based on the above technical solutions, the image text recognition apparatus further includes:
a minimum determining unit configured to determine one or more local minima in the distribution frequencies of the gray values in the grayscale image according to the gray values of the pixels of the grayscale image;
a full-value-range determining unit configured to determine the lower end of the full value range according to the smallest gray value in the grayscale image, and to determine the upper end of the full value range according to the largest gray value in the grayscale image;
and a layer interval acquisition unit configured to divide the full value range into a plurality of layer intervals according to the gray values corresponding to the respective minima.
In some embodiments of the present application, based on the above technical solution, the layer interval obtaining unit includes:
a sorting subunit configured to sort the lower end of the full value range, the upper end of the full value range, and the gray values corresponding to the respective minima in ascending or descending order;
and a layer interval segmentation subunit configured to segment the full value range by taking each pair of adjacent gray values in the sorted sequence as the two endpoints of a layer interval, so as to obtain a plurality of end-to-end, non-overlapping layer intervals.
In some embodiments of the present application, based on the above technical solutions, the minimum value determining unit includes:
a distribution frequency determining subunit configured to calculate a distribution frequency of each gray value according to the gray value of each pixel point in the gray image;
a distribution function obtaining subunit configured to obtain a corresponding distribution function according to the distribution frequency of each gray value in the gray image;
a smooth curve obtaining subunit configured to perform function smoothing on the distribution function to obtain a smooth curve corresponding to the distribution function;
and the minimum value acquisition subunit is configured to identify each trough of the smooth curve, and take the value of the point corresponding to each trough as the minimum value in the distribution frequency of each gray value in the gray image.
In some embodiments of the present application, based on the above technical solutions, the erosion module includes:
a binary layer obtaining unit configured to determine a target threshold within the gray-value interval of a gray layer, map gray values in the gray layer that are greater than or equal to the target threshold to a first value, and map gray values smaller than the target threshold to a second value, so as to form a binary layer corresponding to the gray layer;
a marked connected region acquisition unit configured to perform image erosion on the binary layer to obtain marked connected regions composed of pixels whose value is the first value;
and an erosion unit configured to retain the pixel values in the gray layer located at the positions of the marked connected regions of the binary layer, and to discard the pixel values in the gray layer located outside those positions.
In some embodiments of the present application, based on the above technical solution, the preset direction is the horizontal or vertical direction, and the dilation module includes:
a circumscribed rectangle obtaining unit configured to obtain the circumscribed rectangle of a connected region and expand the connected region to fill it, the circumscribed rectangle being the rectangle that bounds the connected region along the preset direction;
a nearest connected region acquisition unit configured to acquire the nearest connected region of the connected region, i.e. the connected region at the shortest distance from it;
and a text region acquisition unit configured to expand the connected region toward the nearest connected region to obtain a text region when the nearest connected region lies in the preset direction relative to the connected region.
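A hedged sketch of this directional expansion follows, working on bounding rectangles rather than pixel masks; the gap measure, the vertical-alignment test, and the max_gap parameter are assumptions for illustration only:

```python
# Hypothetical merge of connected regions along the horizontal direction;
# boxes are (x, y, w, h) circumscribed rectangles.
def expand_regions(boxes, max_gap):
    merged = True
    while merged:
        merged = False
        for i, a in enumerate(boxes):
            best_j, best_gap = None, max_gap
            for j, b in enumerate(boxes):
                if i == j:
                    continue
                # horizontal gap between the two rectangles (negative if they overlap)
                gap = max(b[0] - (a[0] + a[2]), a[0] - (b[0] + b[2]))
                vertically_aligned = not (a[1] + a[3] < b[1] or b[1] + b[3] < a[1])
                if vertically_aligned and gap <= best_gap:
                    best_j, best_gap = j, gap
            if best_j is not None:
                # expand toward the nearest region: replace both boxes by their union
                b = boxes.pop(max(i, best_j))
                a = boxes.pop(min(i, best_j))
                x, y = min(a[0], b[0]), min(a[1], b[1])
                boxes.append((x, y,
                              max(a[0] + a[2], b[0] + b[2]) - x,
                              max(a[1] + a[3], b[1] + b[3]) - y))
                merged = True
                break
    return boxes
```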
In some embodiments of the present application, based on the above technical solutions, the text recognition module includes:
a text cutting unit configured to cut the text region into one or more single-character regions;
a character recognition unit configured to recognize the character in each single-character region to obtain the character information corresponding to that region;
a text information obtaining unit configured to combine the character information of the single-character regions according to their arrangement positions within the text region, so as to obtain the text information corresponding to the text region;
and a recognized text acquisition unit configured to obtain the recognized text of the image to be processed from the text information corresponding to the text regions.
In some embodiments of the present application, based on the above technical solutions, the text cutting unit includes:
a length-to-height ratio calculation subunit configured to calculate the length-to-height ratio of the text region, i.e. the ratio of the region's length to its height;
a character prediction subunit configured to calculate the predicted number of characters in the text region from the length-to-height ratio;
and a single-character region acquisition subunit configured to cut the text region uniformly along its length according to the predicted number, to obtain the predicted number of single-character regions.
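The following sketch illustrates this aspect-ratio cut under the assumption of roughly square characters, so the predicted character count is the region's length divided by its height; the names and the rounding rule are illustrative:

```python
import numpy as np

def cut_text_region(region):
    """region: 2-D array (H x W) for one text region; returns per-character crops."""
    height, length = region.shape
    predicted = max(1, round(length / height))   # predicted number of characters
    bounds = np.linspace(0, length, predicted + 1).astype(int)
    return [region[:, bounds[k]:bounds[k + 1]] for k in range(predicted)]
```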
In some embodiments of the present application, based on the above technical solution, the single word region acquiring subunit includes:
a pre-cut number acquisition subunit configured to acquire a pre-cut number according to the predicted number, the pre-cut number being greater than or equal to the predicted number;
a cut line uniform arrangement subunit configured to uniformly arrange candidate cut lines in a length direction on the text region according to the pre-cut number, the candidate cut lines being capable of uniformly cutting the text region in the length direction to obtain the pre-cut number of candidate regions;
a target cut line acquisition subunit configured to set, as a target cut line, a candidate cut line having adjacent cut lines on both sides;
a distance sum computation subunit configured to compute the sum of the distances between the target cut line and the candidate cut lines adjacent to it on both sides;
a target cut line retaining subunit configured to retain the target cut line when a ratio of the sum of distances to a height of the text region is greater than or equal to a preset ratio;
And a target cut line discarding subunit configured to discard the target cut line when a ratio of the sum of distances to the height of the text region is smaller than a preset ratio.
In some embodiments of the present application, based on the above technical solution, the feature stacking module includes:
the superimposed feature layer acquisition unit is configured to superimpose the feature layers to obtain superimposed feature layers;
a merged connected region acquisition unit configured to merge connected regions on the superimposed feature layer whose separation distance is smaller than a preset distance into a merged connected region;
an area-ratio calculation unit configured to calculate, for each feature layer, the area of its connected region within the merged connected region, and to calculate the corresponding area ratio, i.e. the ratio of the area of the connected region at the corresponding position of that feature layer to the area of the merged connected region;
and a connected region replacement unit configured to replace the merged connected region with the connected region at the corresponding position of the feature layer having the largest area ratio.
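A small sketch of this replacement rule, assuming one boolean mask per feature layer; all names are illustrative:

```python
import numpy as np

def resolve_merged_region(merged_mask, per_layer_masks):
    """merged_mask: union of nearby regions; per_layer_masks: one mask per feature layer."""
    merged_area = merged_mask.sum()
    # area ratio of each layer's contribution to the merged connected region
    ratios = [np.logical_and(merged_mask, m).sum() / merged_area
              for m in per_layer_masks]
    winner = int(np.argmax(ratios))  # layer with the largest area ratio
    return np.logical_and(merged_mask, per_layer_masks[winner]), winner
```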
In some embodiments of the present application, based on the above technical solutions, the method is applied to the automated processing of complaint tickets, and the image to be processed includes an image in a complaint ticket; the image text recognition apparatus further includes:
a label classification unit configured to input the recognized text corresponding to the image to be processed into a pre-trained neural network model to obtain the complaint effectiveness label and the complaint risk label corresponding to the complaint ticket containing the image to be processed;
and a complaint ticket database storage unit configured to store the complaint effectiveness label and the complaint risk label corresponding to the complaint ticket, together with the subject corresponding to the complaint ticket, in a complaint ticket database.
In some embodiments of the present application, based on the above technical solutions, the image text recognition apparatus further includes:
a transaction data acquisition unit configured to acquire the information flow data and funds flow data of a transaction order, the transaction order corresponding to a target subject;
a tag search unit configured to search the complaint ticket database for the target subject to acquire the target complaint ticket corresponding to the target subject, together with the complaint effectiveness label and complaint risk label corresponding to that target complaint ticket;
and a risk policy suggestion acquisition unit configured to input the information flow data and funds flow data of the transaction order, together with the complaint effectiveness label and complaint risk label corresponding to the target subject, into a pre-trained decision tree model to obtain a risk policy suggestion corresponding to the target subject, wherein the risk policy suggestion includes one or more of trusting the transaction order, limiting the transaction order amount, penalizing the transaction order, intercepting the transaction order, and issuing a transaction risk reminder.
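Purely for illustration, a scikit-learn decision tree can stand in for the pre-trained model described above; the feature layout, policy labels, and toy training data below are assumptions, not from the source:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

POLICIES = ["trust", "limit_amount", "penalize", "intercept", "remind_risk"]

rng = np.random.default_rng(0)
X_train = rng.random((200, 6))                      # toy rows: info flow, funds flow, labels
y_train = rng.integers(0, len(POLICIES), size=200)  # toy policy targets
model = DecisionTreeClassifier(max_depth=5).fit(X_train, y_train)

def suggest_policy(info_flow, funds_flow, efficacy_label, risk_label):
    # concatenate order data with the complaint labels of the target subject
    row = [*info_flow, *funds_flow, efficacy_label, risk_label]
    return POLICIES[int(model.predict([row])[0])]

print(suggest_policy([0.2, 0.7], [0.1, 0.9], 1, 0))
```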
According to an aspect of the embodiments of the present application, there is provided a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements an image text recognition method as in the above technical solution.
According to an aspect of the embodiments of the present application, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the image text recognition method as in the above technical solution via execution of the executable instructions.
According to an aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the image text recognition method as in the above technical solution.
In the technical solution provided by the embodiments of the present application, the grayscale image is divided into gray layers corresponding to respective layer intervals according to the layer interval to which the gray value of each pixel belongs; image erosion is performed on each gray layer; the feature layers are superimposed to obtain a superimposed feature layer; each connected region on the superimposed feature layer is expanded in a preset direction to obtain text regions; and the text of each text region on the superimposed feature layer is recognized to obtain the recognized text corresponding to the image to be processed. Because the grayscale image is divided into gray layers and each layer is eroded separately, the erosion effect on each layer is improved, omissions and misidentifications of connected regions are avoided, and the recognition accuracy of the connected regions is increased, so that the text of the image to be processed can be recognized accurately.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
Fig. 1 schematically shows a block diagram of an exemplary system architecture to which the technical solution of the present application is applied.
Fig. 2 schematically illustrates a flow chart of steps of an image text recognition method according to some embodiments of the present application.
Fig. 3 schematically illustrates a flowchart of the steps performed before dividing the grayscale image into gray layers corresponding to respective layer intervals according to the layer interval to which the gray value of each pixel belongs, in an embodiment of the present application.
Fig. 4 schematically illustrates a correspondence between a gray value and a distribution frequency of a gray image according to some embodiments of the present application.
Fig. 5 schematically illustrates a flowchart of the steps of dividing the full value range into a plurality of layer intervals according to the gray values corresponding to the respective minima, in an embodiment of the present application.
Fig. 6 schematically illustrates a flowchart of the steps for determining one or more minima in the distribution frequency of each gray value in a gray image according to the gray value of each pixel of the gray image in an embodiment of the present application.
Fig. 7 schematically illustrates a step flow chart of performing image erosion on each gray scale layer to obtain a feature layer corresponding to each gray scale layer, where the feature layer includes a plurality of connected regions in an embodiment of the present application.
Fig. 8 schematically illustrates a flowchart of steps for stacking feature layers to obtain a stacked feature layer in an embodiment of the present application.
Fig. 9 schematically illustrates a flowchart of a step of expanding each connected region on the superimposed feature layer according to a preset direction to obtain a text region in an embodiment of the present application.
Fig. 10 schematically illustrates a flowchart of a step of identifying a text of each text region on the feature layer to obtain an identified text corresponding to an image to be processed in an embodiment of the present application.
Fig. 11 schematically illustrates a flowchart of the steps for text cutting a text region to obtain one or more word regions in an embodiment of the present application.
Fig. 12 schematically illustrates a flowchart of steps for uniformly cutting text regions in a length direction according to a predicted number to obtain a predicted number of single-word regions in an embodiment of the present application.
Fig. 13 schematically illustrates a flowchart of steps after identifying text of each text region on the feature layer in an embodiment of the present application to obtain an identified text corresponding to an image to be processed.
Fig. 14 schematically illustrates a schematic diagram of a model internal structure of a first sub-neural network model according to an embodiment of the present application.
Fig. 15 schematically illustrates a schematic diagram of a model internal structure of the second sub-neural network model according to an embodiment of the present application.
FIG. 16 schematically illustrates a flowchart of steps after storing a complaint effectiveness label and a complaint risk label corresponding to a complaint ticket, and a subject corresponding to the complaint ticket, in a complaint ticket database, in an embodiment of the present application.
FIG. 17 schematically illustrates a specific process of inputting information flow data and fund flow data of a trade order, and complaint effectiveness labels and complaint risk labels corresponding to a target subject into a pre-trained decision tree model to obtain risk policy suggestions corresponding to the target subject in an embodiment of the present application.
Fig. 18 schematically shows a block diagram of an image text recognition apparatus provided in an embodiment of the present application.
Fig. 19 schematically shows a block diagram of a computer system for implementing an electronic device according to an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present application. One skilled in the relevant art will recognize, however, that the aspects of the application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines can perceive, reason, and make decisions.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is the science of studying how to make machines "see": replacing human eyes with cameras and computers to recognize, track, and measure targets, and further performing graphics processing so that the result is better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies the theories and techniques for building artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric techniques such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-domain interdisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behaviour to acquire new knowledge or skills, and how they can reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and advancement of artificial intelligence technology, it has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, and smart customer service. As the technology develops, artificial intelligence will be applied in more fields and deliver increasing value.
Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association by cryptographic methods, each block containing a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, and an application service layer.
The blockchain underlying platform may include processing modules for user management, basic services, smart contracts, and operation monitoring. The user management module is responsible for the identity management of all blockchain participants, including maintaining public/private key generation (account management), key management, and the correspondence between a user's real identity and blockchain address (authority management); with authorization, it can supervise and audit the transactions of certain real identities and provide rule configuration for risk control. The basic service module is deployed on all blockchain node devices to verify the validity of service requests and record valid requests to storage after consensus; for a new service request, the basic service first performs interface adaptation analysis and authentication, encrypts the service information through an identification algorithm (identification management), transmits it completely and consistently to the shared ledger (network communication), and records and stores it. The smart contract module is responsible for contract registration and issuance, contract triggering, and contract execution; a developer can define contract logic through a programming language, publish it to the blockchain (contract registration), and have it invoked by keys or other triggering events to complete the contract logic; the module also provides contract upgrade functions. The operation monitoring module is mainly responsible for deployment during product release, configuration modification, contract settings, cloud adaptation, and the visual output of real-time states during product operation, for example alarms, monitoring network conditions, and monitoring the health status of node devices.
The platform product service layer provides the basic capabilities and implementation framework of typical applications; developers can complete the blockchain implementation of their business logic based on these basic capabilities and the characteristics of the overlaid business. The application service layer provides blockchain-based application services for business participants to use.
The embodiments of the present application relate to computer vision technology and machine learning technology of artificial intelligence, and are specifically described by the following embodiments.
Fig. 1 schematically shows a block diagram of an exemplary system architecture to which the technical solution of the present application is applied.
As shown in fig. 1, system architecture 100 may include a terminal device 110, a network 120, and a server 130. Terminal device 110 may include various electronic devices such as smart phones, tablet computers, notebook computers, desktop computers, and the like. The server 130 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. Network 120 may be a communication medium of various connection types capable of providing a communication link between terminal device 110 and server 130, and may be, for example, a wired communication link or a wireless communication link.
The system architecture in the embodiments of the present application may have any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 130 may be a server group composed of a plurality of server devices. In addition, the technical solution provided in the embodiment of the present application may be applied to the terminal device 110, or may be applied to the server 130, or may be implemented by the terminal device 110 and the server 130 together, which is not limited in particular in this application.
For example, the image text recognition method of the embodiments of the present application may be deployed on the server 130, with the user interacting with the server 130 through a client on the terminal device 110. The grayscale image is divided into gray layers corresponding to respective layer intervals according to the layer interval to which the gray value of each pixel belongs; image erosion is performed on each gray layer; the feature layers are superimposed to obtain a superimposed feature layer; each connected region on the superimposed feature layer is expanded in a preset direction to obtain text regions; and the text of each text region on the superimposed feature layer is recognized to obtain the recognized text corresponding to the image to be processed. Because each gray layer is eroded separately, the erosion effect on each layer is improved, omissions and misidentifications of connected regions are avoided, and the recognition accuracy of the connected regions is increased, so that the text of the image to be processed can be recognized accurately.
Or, for example, the image text recognition method of the embodiments of the present application may be deployed on the server 130 and applied to the automated processing of complaint tickets. A user uploads a complaint ticket to the server 130 through a client on the terminal device 110; after the server performs text recognition on the complaint ticket using the image text recognition method of the present application, it inputs the recognized text corresponding to each text region into a pre-trained neural network model, obtains the complaint effectiveness label and complaint risk label corresponding to the complaint ticket, and stores these labels, together with the subject corresponding to the complaint ticket, in a complaint ticket database. This realizes automated processing of complaint tickets, saves labor, and improves processing efficiency.
The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data, and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, smart watch, smart voice interaction device, smart home appliance, or vehicle-mounted terminal. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited herein.
In the related art, text detection in an image may extract the text by means of edge detection. However, edge detection on an image with a complex background tends to produce excessive edges in the background portion (i.e., increased noise), while the edge information of the text portion is easily lost, giving poor results. If erosion or dilation is performed at this point, the background regions stick to the text regions and the results deteriorate further. In some scenarios, for example, the pictures in a complaint ticket may be chat screenshots, product page screenshots, and the like, where the page background is complex and the ability to recognize text in the image is poor.
In the present application, the grayscale image is divided into gray layers corresponding to respective layer intervals and image erosion is performed on each gray layer separately, so that the erosion effect on each layer is improved, omissions and misidentifications of connected regions are avoided, the recognition accuracy of the connected regions is increased, and the text of the image to be processed can be recognized accurately.
The image text recognition method provided by the application is described in detail below with reference to the specific embodiments.
Fig. 2 schematically illustrates a flow chart of steps of an image text recognition method according to some embodiments of the present application. The execution subject of the image text recognition method may be a terminal device, a server, or the like, and the application is not limited thereto. As shown in fig. 2, the image text recognition method may mainly include the following steps S210 to S250.
S210, converting the image to be processed into a grayscale image, and dividing the grayscale image into gray layers corresponding to respective layer intervals according to the layer interval to which the gray value of each pixel of the grayscale image belongs, wherein a layer interval represents the range of gray values of the pixels in the corresponding gray layer.
Specifically, the image to be processed may be a chat record picture, a transaction order interface, a document, an advertisement screenshot, and the like. The value ranges of the layer intervals may be preset, mutually non-overlapping value ranges.
In this way, the grayscale image can be divided into gray layers corresponding to the layer intervals, with pixels of similar gray values assigned to the same layer, so that the subsequent image erosion and connected region identification are performed on each layer separately. This improves the erosion effect on each layer and avoids omissions and misidentifications of connected regions.
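As one possible realization of step S210 (an assumption, not code from the patent), np.digitize can assign each pixel the index of its layer interval and split the grayscale image accordingly:

```python
import numpy as np

def split_into_layers(gray, edges):
    """gray: uint8 image; edges: sorted interval boundaries, e.g. [0, 72, 100, 120, 141, 255]."""
    # interval index per pixel: value <= edges[1] -> 0, edges[1] < value <= edges[2] -> 1, ...
    idx = np.digitize(gray, edges[1:-1], right=True)
    return [np.where(idx == k, gray, 0).astype(np.uint8)
            for k in range(len(edges) - 1)]
```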
Fig. 3 schematically illustrates a flowchart of steps before dividing a gray scale image into gray scale layers corresponding to respective layer sections according to layer sections to which gray scale values of respective pixels of the gray scale image belong in an embodiment of the present application. As shown in fig. 3, in addition to the above embodiment, the following steps S310 to S330 may be further included before dividing the grayscale image into grayscale layers corresponding to the respective layer sections according to the layer sections to which the grayscale values of the respective pixel points of the grayscale image belong in step S210.
S310, determining one or more local minima in the distribution frequencies of the gray values in the grayscale image according to the gray values of the pixels of the grayscale image.
S320, determining the lower end of the full value range according to the smallest gray value in the grayscale image, and determining the upper end of the full value range according to the largest gray value in the grayscale image.
S330, dividing the full value range into a plurality of layer intervals according to the gray values corresponding to the respective minima.
Fig. 4 schematically illustrates the correspondence between gray values and their distribution frequencies for a grayscale image according to some embodiments of the present application. For example, referring to fig. 4, six local minima in the distribution frequencies of the gray values can be identified from this correspondence: the minimum 0 at the point (48, 0), the minimum 8 at (72, 8), the minimum 150 at (100, 150), the minimum 95 at (120, 95), the minimum 14 at (141, 14), and the minimum 0 at (218, 0). The lower end of the full value range is then determined from the smallest gray value in the image, 49, either as the gray value 49 itself or as any gray value smaller than 49, for example 0, 1, or 5. Similarly, the upper end of the full value range is determined from the largest gray value in the image, 217, either as the gray value 217 itself or as any gray value larger than 217, for example 250, 254, or 255.
For example, if the lower end of the full value range is taken to be the smallest gray value 49 and the upper end to be the largest gray value 217, the full value range is divided according to the gray values corresponding to the minima into the layer intervals [49, 72], (72, 100], (100, 120], (120, 141], (141, 217].
Alternatively, if the lower end of the full value range is taken to be 0 (smaller than the smallest gray value 49) and the upper end to be 255 (larger than the largest gray value 217), then, after removing the smallest and largest of the gray values corresponding to the minima (48 and 218, which lie outside the image's gray range), the full value range is divided into the layer intervals [0, 72], (72, 100], (100, 120], (120, 141], (141, 255].
In some embodiments, the corresponding relationship between the gray value of the gray image and the occurrence probability of each gray value may be generated according to the gray value of each pixel of the gray image, one or more minima in the occurrence probability of each gray value in the gray image may be determined, and the full value range may be divided into a plurality of image layer intervals according to the gray value corresponding to each minima, which is similar to steps S310 to S330, and will not be described herein.
In this way, the full value range is divided into a plurality of layer intervals, and the grayscale image is divided into gray layers according to those intervals. Each layer is then eroded separately; because the gray values within a layer are similar, the erosion effect on the image is improved.
In some embodiments, before the full value range is divided into layer intervals in step S330, one or more local maxima in the distribution frequencies of the gray values may be determined from the gray values of the pixels, and the number of layer intervals may then be set equal to the number of maxima, with the value range of each layer interval containing the gray value of one corresponding maximum. Specifically, referring to fig. 4, five maxima can be identified: the maximum 254 at the point (60, 254), the maximum 610 at (94, 610), the maximum 270 at (106, 270), the maximum 305 at (130, 305), and the maximum 202 at (156, 202). The number of layer intervals is therefore set to 5, each interval containing one maximum, and, as described in the above embodiment, the full value range is divided into the 5 layer intervals [49, 72], (72, 100], (100, 120], (120, 141], (141, 217] according to the gray values corresponding to the minima.
Fig. 5 schematically illustrates a flowchart of the steps of dividing the full-value range into a plurality of layer sections according to the gray scale values corresponding to the respective minima in an embodiment of the present application. As shown in fig. 5, in addition to the above embodiment, the step S330 may further include the following steps S510 to S520, in which the full-value range is divided into a plurality of layer sections according to the gradation values corresponding to the respective minima.
S510, sorting the lower end of the full value range, the upper end of the full value range, and the gray values corresponding to the respective minima in ascending or descending order;
S520, taking each pair of adjacent gray values in the sorted sequence as the two endpoints of a layer interval and segmenting the full value range, to obtain a plurality of end-to-end, non-overlapping layer intervals.
For example, in the embodiment of fig. 4, the gray value 0 (smaller than the smallest gray value 49) is taken as the lower end of the full value range, and the gray value 255 (larger than the largest gray value 217) as the upper end. Sorting the lower end 0, the upper end 255, and the gray values 48, 72, 100, 120, 141, 218 corresponding to the minima, and removing the out-of-range values 48 and 218, yields the sequence 0, 72, 100, 120, 141, 255. Each pair of adjacent gray values is then taken as the two endpoints of a layer interval, dividing the full value range into the end-to-end, non-overlapping layer intervals [0, 72], (72, 100], (100, 120], (120, 141], (141, 255].
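A short sketch of steps S510 to S520 under the same assumptions as this running example; each pair of adjacent edges forms one interval, with the open/closed endpoint convention left informal:

```python
def build_layer_intervals(valley_grays, img_min, img_max, lo=0, hi=255):
    # keep only valleys strictly inside the image's observed gray range
    inner = [g for g in valley_grays if img_min < g < img_max]
    edges = sorted(set([lo] + inner + [hi]))
    # adjacent edges become the two endpoints of each layer interval
    return list(zip(edges[:-1], edges[1:]))

print(build_layer_intervals([48, 72, 100, 120, 141, 218], img_min=49, img_max=217))
# [(0, 72), (72, 100), (100, 120), (120, 141), (141, 255)]
```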
Fig. 6 schematically illustrates a flowchart of the steps for determining one or more minima in the distribution frequency of each gray value in a gray image according to the gray value of each pixel of the gray image in an embodiment of the present application. As shown in fig. 6, on the basis of the above embodiment, determining one or more minimum values in the distribution frequency of each gray value in the gray image according to the gray value of each pixel point of the gray image in step S310 may further include the following steps S610 to S640.
S610, calculating the distribution frequency of each gray value from the gray values of the pixels in the grayscale image;
S620, obtaining the corresponding distribution function from the distribution frequencies of the gray values in the grayscale image;
S630, smoothing the distribution function to obtain a smooth curve corresponding to it;
S640, identifying the troughs of the smooth curve and taking the value at each trough as a local minimum of the distribution frequencies of the gray values in the grayscale image.
Specifically, the smoothing of the distribution function may be kernel density estimation, which makes the distribution smooth and continuous so that clear troughs can be obtained and statistically more accurate minima found. The layer intervals can then be divided according to the clustering tendency of the gray values of the grayscale image, making the division more accurate: pixels with similar gray values are assigned to the same layer, which improves the recognition accuracy of the connected regions and hence of the text of the image to be processed.
In some embodiments, in addition to smoothing the distribution function by using the kernel density estimation method, filtering or other methods may be used to smooth the distribution function, which is not limited in this application.
In some embodiments, after step S630, each peak of the smooth curve may be further identified, the value of the point corresponding to each peak is taken as the maximum value in the distribution frequency of each gray value in the gray image, and then the number of layer intervals into which the full-value range is divided is determined according to the number of the maximum values, where the value range of each layer interval includes a corresponding one of the maximum values.
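A minimal Python sketch of steps S610 to S640 (and the optional peak detection above), assuming NumPy and SciPy are available; the kernel bandwidth, the 0-255 evaluation grid and the function name are illustrative choices.

```python
import numpy as np
from scipy.signal import argrelextrema
from scipy.stats import gaussian_kde

def gray_distribution_extrema(gray_img):
    """Smooth the gray-value distribution with kernel density
    estimation (S620-S630) and locate its troughs and peaks (S640)."""
    kde = gaussian_kde(gray_img.ravel().astype(float))
    xs = np.arange(256)
    density = kde(xs)                                   # smooth curve
    minima = xs[argrelextrema(density, np.less)[0]]     # troughs -> minima
    maxima = xs[argrelextrema(density, np.greater)[0]]  # peaks -> maxima
    return minima, maxima
```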
S220, performing image erosion on each gray layer to obtain a feature layer corresponding to each gray layer, wherein the feature layer includes a plurality of connected regions, and a connected region is a region composed of a plurality of pixel points having a connectivity relationship.
Specifically, the image erosion may be performed by scanning the pixels one by one with a convolution kernel, which is not limited in this application.
A connected region is a region composed of a plurality of pixel points having a connectivity relationship: each pixel point in the region is adjacent to at least one other pixel point of the region. The adjacency may include 4-adjacency, 8-adjacency, and so on.
Fig. 7 schematically illustrates a flowchart of the steps of performing image erosion on each gray layer to obtain a feature layer corresponding to each gray layer, where the feature layer includes a plurality of connected regions, in an embodiment of the present application. As shown in fig. 7, on the basis of the above embodiment, performing image erosion on each gray layer in step S220 to obtain a feature layer including a plurality of connected regions may further include the following steps S710 to S730.
S710, determining a target threshold within the gray value interval of the gray layer, mapping gray values in the gray layer that are greater than or equal to the target threshold to a first value and gray values smaller than the target threshold to a second value, so as to form a binary layer corresponding to the gray layer;
S720, performing image erosion on the binary layer to obtain marked connected regions composed of a plurality of pixel points whose gray value is the first value;
S730, retaining the pixel values in the gray layer located at the positions corresponding to the marked connected regions of the binary layer, and discarding the pixel values in the gray layer located outside those positions.
In this way, after the binary layer corresponding to the gray layer is determined, image erosion is performed on the binary layer to obtain the marked connected regions composed of pixel points whose gray value is the first value. The pixel values of the gray layer at the positions of the marked connected regions are then retained, and the pixel values outside those positions are discarded. The gray layer can thus be eroded without losing the multi-level gray values of its pixel points, and the connected regions in the layer can be identified while the tonal accuracy of the layer is preserved.
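For illustration, the following OpenCV sketch follows steps S710 to S730 under assumed parameters (255 and 0 as the first and second values, a 3 x 3 structuring element); it is a sketch of the idea, not the claimed implementation.

```python
import cv2
import numpy as np

def erode_gray_layer(gray_layer, target_threshold):
    """S710: binarize the gray layer at the target threshold.
    S720: erode the binary layer to get the marked connected regions.
    S730: keep the gray values inside those regions, drop the rest."""
    # Pixels >= target_threshold map to the first value (255), others to 0.
    binary = np.where(gray_layer >= target_threshold, 255, 0).astype(np.uint8)
    # Erosion scans the image with a small structuring element.
    kernel = np.ones((3, 3), np.uint8)
    eroded = cv2.erode(binary, kernel)
    # The multi-level gray values of the layer are preserved inside the
    # marked connected regions; everything outside them is discarded.
    return np.where(eroded == 255, gray_layer, 0)
```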
S230, superimposing the feature layers to obtain a superimposed feature layer, wherein the superimposed feature layer includes a plurality of connected regions.
Fig. 8 schematically illustrates a flowchart of the steps of superimposing the feature layers to obtain a superimposed feature layer in an embodiment of the present application. As shown in fig. 8, on the basis of the above embodiment, superimposing the feature layers in step S230 to obtain a superimposed feature layer may further include the following steps S810 to S840.
S810, superimposing the feature layers to obtain a superimposed feature layer;
S820, merging connected regions on the superimposed feature layer whose separation distance is smaller than a preset distance into a merged connected region;
S830, calculating, within the merged connected region, the area of the connected region contributed by each feature layer, and calculating the area ratio corresponding to each feature layer, wherein the area ratio is the ratio of the area of the feature layer's connected region at the corresponding position to the area of the merged connected region;
S840, replacing the merged connected region with the connected region, at the corresponding position, of the feature layer having the largest area ratio.
In this way, the feature layers are superimposed to obtain a superimposed feature layer, and connected regions on the superimposed feature layer whose separation distance is smaller than the preset distance are merged into a merged connected region, so that connected regions that were originally joined or close across layers become associated; the association between layers is thereby strengthened, which improves the recognition accuracy for the image to be processed. The merged connected region is then replaced by the connected region, at the corresponding position, of the feature layer with the largest area ratio, so that only the region of the layer contributing most to the merged connected region is retained. Recognition of the merged connected region thus focuses on the feature layer with the greatest contribution, which improves the recognition accuracy of the connected regions and, in turn, the text recognition accuracy of the image to be processed.
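A sketch of the area-ratio rule of steps S830 to S840 for a single merged connected region; representing regions as boolean NumPy masks and the helper name are assumptions made for illustration.

```python
import numpy as np

def resolve_merged_region(merged_mask, layer_masks):
    """Keep only the connected region of the feature layer that
    contributes the largest area ratio to the merged region (S830-S840).

    merged_mask: boolean mask of the merged connected region.
    layer_masks: one boolean mask per feature layer, aligned with it.
    """
    merged_area = merged_mask.sum()
    # S830: area ratio = layer's region area at this position / merged area.
    ratios = [(merged_mask & m).sum() / merged_area for m in layer_masks]
    winner = int(np.argmax(ratios))
    # S840: replace the merged region with the winning layer's region.
    return merged_mask & layer_masks[winner], winner
```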
S240, expanding each connected region on the superimposed feature layer in a preset direction to obtain text regions.
Specifically, the preset direction may be a horizontal direction, a vertical direction, a 30-degree, 45-degree or 60-degree inclined direction, a curved direction with a certain curvature, and so on; different preset directions may be adopted according to the application scenario.
Fig. 9 schematically illustrates a flowchart of the steps of expanding each connected region on the superimposed feature layer in a preset direction to obtain a text region in an embodiment of the present application. As shown in fig. 9, on the basis of the above embodiment, the preset direction is the horizontal direction or the vertical direction, and expanding each connected region on the superimposed feature layer in the preset direction in step S240 to obtain a text region may further include the following steps S910 to S930.
S910, obtaining a circumscribed rectangle of the connected region and expanding the connected region until it fills the circumscribed rectangle, wherein the circumscribed rectangle is a rectangle circumscribing the connected region in the preset direction;
S920, obtaining the nearest neighboring connected region of the connected region, wherein the nearest neighboring connected region is the connected region with the shortest separation distance from the connected region;
S930, when the direction of the nearest neighboring connected region relative to the connected region is the preset direction, expanding the connected region toward the nearest neighboring connected region to obtain the text region.
In this way, expansion in the preset direction between a connected region and its nearest neighboring connected region can be achieved to obtain the text region. It will be appreciated that Chinese characters such as "小", "旦", "八" and "元" are identified in a layer not as one connected region but as several, because the interior of such characters is not fully connected; the separate strokes form incomplete parts of the character. By expanding in the preset direction between a connected region and its nearest neighboring connected region, connected regions containing incomplete characters or single characters can be joined into a text region, so a text region may contain multiple characters. During expansion, incomplete character parts are wrapped into the expanded area, which avoids missing characters or recognizing incomplete parts in isolation, and thus improves the text recognition capability for the image to be processed.
In some embodiments, when the direction of the nearest neighboring connected region relative to the connected region is the preset direction, the connected region is expanded toward the nearest neighboring connected region, the preset direction being the horizontal direction. Since, in line with common reading habits, the text of most images is typeset horizontally, this improves the text recognition accuracy for most images to be processed.
In some embodiments, when the direction of the nearest neighboring connected region relative to the connected region is the preset direction, the nearest neighboring connected region is also triggered to expand in the opposite direction, so that the two regions expand toward each other to obtain the text region. The expansion is thereby more uniform, and a more accurate text region can be obtained.
In some embodiments, when the direction of the nearest neighboring connected region relative to the connected region is the preset direction and the separation distance between the two regions is smaller than a first preset distance, the connected region is expanded toward the nearest neighboring connected region to obtain the text region. In this way, no expansion occurs when the separation distance between the two regions is too large, which avoids expanding unrelated connected regions into one text region and improves the accuracy of text region identification.
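The following sketch combines S910 to S930 with the first-preset-distance check above for the horizontal direction; representing regions by bounding boxes (x0, y0, x1, y1) and the one-sided growth are illustrative simplifications.

```python
def expand_toward_neighbor(box, neighbor, max_gap):
    """Expand a region's bounding box horizontally toward its nearest
    neighbor when the neighbor lies in the preset (horizontal) direction
    and the gap is below the first preset distance."""
    x0, y0, x1, y1 = box
    nx0, ny0, nx1, ny1 = neighbor
    rows_overlap = not (y1 < ny0 or ny1 < y0)  # neighbor is roughly beside us
    if not rows_overlap:
        return box
    if nx0 >= x1 and nx0 - x1 < max_gap:       # neighbor to the right
        x1 = nx0
    elif nx1 <= x0 and x0 - nx1 < max_gap:     # neighbor to the left
        x0 = nx1
    return (x0, y0, x1, y1)
```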
S250, recognizing the text of each text region on the superimposed feature layer to obtain a recognized text corresponding to the image to be processed.
Specifically, each text region on the superimposed feature layer may be input into a pre-trained machine learning model to obtain the recognized text corresponding to the image to be processed. The pre-trained machine learning model may be built from a CNN (Convolutional Neural Network) model, a CNN+LSTM (Long Short-Term Memory network) model, Faster RCNN, and the like. Training data may be constructed first: sample images may be built as 48 x 48 grayscale images, each containing a single character, as training data for the machine learning model. To ensure the sufficiency of the training data, 45 fonts of different styles, such as Song, bold, regular script and nonstandard handwriting, may be collected, so that the common printed fonts are comprehensively covered and the character recognition capability of the machine learning model is improved.
In some embodiments, the fonts of different styles may each include pictures of different font sizes, with several pictures per font size, which improves the diversity of the training data and the comprehensiveness of its coverage.
In some embodiments, a predetermined proportion of random artifacts such as 5%, 6%, 7%, 8%, 9%, or 10% may be added to each sample image, thereby enhancing the generalization ability of the machine learning model.
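As an illustration of how such training samples might be produced, the following Pillow sketch renders a single character into a 48 x 48 grayscale image and adds a preset proportion of random noise; the font path, text offset and noise model are assumptions, not details from the source.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_training_sample(char, font_path, noise_ratio=0.05, size=48):
    """Render one size x size grayscale sample containing a single
    character, then flip a preset proportion of its pixels to random
    gray values as noise."""
    img = Image.new("L", (size, size), color=255)
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, int(size * 0.8))
    draw.text((size // 10, size // 10), char, fill=0, font=font)
    arr = np.array(img)
    # Flip ~noise_ratio of the pixels to random gray values.
    n = int(noise_ratio * size * size)
    ys = np.random.randint(0, size, n)
    xs = np.random.randint(0, size, n)
    arr[ys, xs] = np.random.randint(0, 256, n, dtype=arr.dtype)
    return arr
```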
Fig. 10 schematically illustrates a flowchart of the steps of recognizing the text of each text region on the superimposed feature layer to obtain the recognized text corresponding to the image to be processed in an embodiment of the present application. As shown in fig. 10, on the basis of the above embodiment, recognizing the text of each text region on the superimposed feature layer in step S250 to obtain the recognized text corresponding to the image to be processed may further include the following steps S1010 to S1040.
S1010, performing text cutting on the text region to obtain one or more single-word regions;
S1020, identifying the characters of each single-word region to obtain the character information corresponding to each single-word region;
S1030, combining the character information corresponding to each single-word region according to the arrangement positions of the single-word regions in the text region, to obtain the text information corresponding to the text region;
S1040, acquiring the recognized text of the image to be processed according to the text information corresponding to the text regions.
Specifically, the recognized text of the image to be processed may be obtained from the text information corresponding to the text regions. For example, text regions that are close in position and distributed line by line may be spliced line by line according to their positions in the image to be processed, so as to obtain the recognized text of the image to be processed.
In this way, after the text region is cut into single-word regions, the characters of each single-word region are recognized. Because the recognized objects are single-word regions, recognition is simpler and more accurate than recognizing the whole text region directly. For example, a recognition model for single-word regions is easier to construct and train than a model that recognizes whole text regions, and better training results can be achieved with a smaller amount of training data.
Fig. 11 schematically illustrates a flowchart of the steps of performing text cutting on a text region to obtain one or more single-word regions in an embodiment of the present application. As shown in fig. 11, on the basis of the above embodiment, performing text cutting on the text region in step S1010 to obtain one or more single-word regions may further include the following steps S1110 to S1130.
S1110, calculating the length-to-height ratio of the text region, wherein the length-to-height ratio is the ratio of the length of the text region to the height of the text region;
S1120, calculating the expected number of characters of the text region according to the length-to-height ratio;
S1130, uniformly cutting the text region in the length direction according to the expected number, to obtain the expected number of single-word regions.
It will be appreciated that the characters of a given language generally have a relatively fixed aspect ratio. The expected number of characters in a text region can therefore be estimated approximately from the region's length-to-height ratio, so that the text region can be cut accurately and the single-word regions can be recognized accurately.
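A minimal sketch of steps S1110 to S1130, assuming the text region is axis-aligned and that the character width-to-height ratio of the script is a known constant (roughly 1.0 for Chinese):

```python
def uniform_char_cut(x0, x1, height, char_aspect=1.0):
    """S1110-S1130: estimate the expected character count from the
    length-to-height ratio and cut the text region evenly along its
    length. char_aspect (character width / height) is an assumed
    per-script constant."""
    length = x1 - x0
    expected = max(1, round(length / (height * char_aspect)))
    step = length / expected
    # The x-spans of the expected number of single-word regions.
    return [(x0 + i * step, x0 + (i + 1) * step) for i in range(expected)]
```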
Fig. 12 schematically illustrates a flowchart of the steps of uniformly cutting the text region in the length direction according to the expected number to obtain the expected number of single-word regions in an embodiment of the present application. As shown in fig. 12, on the basis of the above embodiment, uniformly cutting the text region in the length direction according to the expected number in step S1130 to obtain the expected number of single-word regions may further include the following steps S1210 to S1260.
S1210, acquiring a pre-cut number according to the expected number, wherein the pre-cut number is greater than or equal to the expected number;
S1220, uniformly arranging candidate cutting lines on the text region in the length direction according to the pre-cut number, wherein the candidate cutting lines can uniformly cut the text region in the length direction into the pre-cut number of candidate regions;
S1230, taking each candidate cutting line that has adjacent cutting lines on both sides as a target cutting line;
S1240, detecting the sum of the distances between the target cutting line and the candidate cutting lines adjacent to it on both sides;
S1250, retaining the target cutting line when the ratio of the sum of the distances to the height of the text region is greater than or equal to a preset ratio;
S1260, discarding the target cutting line when the ratio of the sum of the distances to the height of the text region is smaller than the preset ratio.
In this way, considering that the gap between two characters is generally the smallest gap in a text line, the empirical value of the ratio between that minimum gap and the height of the text line is used as the preset ratio. Performing steps S1210 to S1260 thus screens the candidate cutting lines, which improves the cutting accuracy of the single-word regions and, in turn, the character recognition accuracy.
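The following sketch shows one literal reading of the screening rule in steps S1230 to S1260; the candidate lines are x-coordinates, and the default preset ratio of 0.3 is an assumed empirical value, not taken from the source.

```python
def screen_cut_lines(candidates, height, preset_ratio=0.3):
    """A candidate cutting line is a target line when it has neighbors
    on both sides; it is kept only when the sum of its distances to
    those neighbors, divided by the text-line height, reaches the
    preset ratio (S1230-S1260)."""
    kept = [candidates[0]]  # outermost lines have no two-sided neighbors
    for prev_x, x, next_x in zip(candidates, candidates[1:], candidates[2:]):
        if ((x - prev_x) + (next_x - x)) / height >= preset_ratio:
            kept.append(x)
    kept.append(candidates[-1])
    return kept
```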
Fig. 13 schematically illustrates a flowchart of the steps following the recognition of the text of each text region on the superimposed feature layer to obtain the recognized text corresponding to the image to be processed in an embodiment of the present application. As shown in fig. 13, on the basis of the above embodiment, the method is applied to the automated processing of a complaint sheet, and the image to be processed includes an image in the complaint sheet; after recognizing the text of each text region on the superimposed feature layer in step S250 to obtain the recognized text corresponding to the image to be processed, the method may further include the following steps S1310 to S1320.
S1310, inputting the recognized text corresponding to the image to be processed into a pre-trained neural network model to obtain a complaint effectiveness label and a complaint risk label corresponding to the complaint sheet in which the image to be processed is located;
S1320, storing the complaint effectiveness label and the complaint risk label corresponding to the complaint sheet, together with the subject corresponding to the complaint sheet, into a complaint sheet database.
The complaint effectiveness labels may include a complaint-valid label and a complaint-invalid label. The complaint risk labels may include an empty classification label, a fraud risk label, a transaction dispute risk label, and the like.
The neural network model may include a first sub-neural network model and a second sub-neural network model. The first sub-neural network model may be a pre-trained model such as BERT (Bidirectional Encoder Representations from Transformers), which performs semantic understanding and text classification on the recognized text corresponding to the image to be processed to obtain the complaint effectiveness label of the recognized text. The second sub-neural network model may be a classification model such as a CRF (Conditional Random Field), which performs semantic understanding, information extraction and text classification on the recognized text corresponding to the image to be processed to obtain the complaint risk label of the recognized text.
In some embodiments, the recognized text corresponding to the image to be processed may first be cleaned and denoised before being input into the pre-trained neural network model. Specifically, the data cleaning may include removing illegal characters, stop words and emoticons from the recognized text, and then performing typo correction and symbol cleaning on the text.
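A minimal sketch of such cleaning, assuming Python's re module; the emoticon range, the "illegal character" class and the stop-word list are illustrative placeholders:

```python
import re

# Illustrative placeholders: a real deployment would use domain
# stop-word lists and a broader emoji/illegal-character table.
EMOJI = re.compile('[\U0001F300-\U0001FAFF\u2600-\u27BF]')
STOPWORDS = {"um", "uh", "hmm"}

def clean_recognized_text(text):
    """Strip emoticons and illegal symbols, then drop stop words,
    before the text is fed to the neural network model."""
    text = EMOJI.sub("", text)
    text = re.sub(r"[^\w\s.,!?;:'\"()-]", "", text)
    return " ".join(t for t in text.split() if t.lower() not in STOPWORDS)
```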
In some embodiments, the pre-trained neural network model may be deployed on a quasi-real-time platform, outputting the complaint effectiveness label and the complaint risk label corresponding to a complaint sheet at the hour level, and storing the complaint effectiveness label and the complaint risk label corresponding to the complaint sheet, together with the subject corresponding to the complaint sheet, in the complaint sheet database.
Fig. 14 schematically illustrates the internal structure of the first sub-neural network model according to an embodiment of the present application. Specifically, the recognized text corresponding to the image to be processed is segmented into words and then input into the first sub-neural network model. For example, suppose the recognized text is "Hello, I am called Zhang San." After word segmentation it becomes "[CLS] / Hello / , / I / am called / Zhang San / . / [SEP]". Let X1 = "Hello", X2 = ",", X3 = "I", X4 = "am called", X5 = "Zhang San", X6 = ".", ..., XN = "[SEP]", and input the tokens into the first sub-neural network model as shown in fig. 14. X[CLS] is embedded and encoded to obtain E[CLS], X1 is embedded and encoded to obtain E1, and so on until XN yields EN. Then E[CLS] and E1 ... EN are input into a Transformer neural network to obtain the corresponding text features C and T1 ... TN, and the complaint effectiveness label of the recognized text "Hello, I am called Zhang San." is obtained from the text features C and T1 ... TN.
Fig. 15 schematically illustrates the internal structure of the second sub-neural network model according to an embodiment of the present application. For example, suppose the recognized text corresponding to the image to be processed is "I come from City A." After word segmentation it becomes "I / come from / City A / .". Let X1 = "I", X2 = "come from", X3 = "City A", X4 = ".", and input the tokens into the second sub-neural network model as shown in fig. 15. X1 is embedded and encoded to obtain E1, X2 is embedded and encoded to obtain E2, and so on until XN yields EN. Then E1 and E2 ... EN are input into a Transformer neural network to obtain the corresponding text features T1 and T2 ... TN, and the text features are input into a neural network composed of several LSTMs to obtain the corresponding type features C1 and C2 ... CN. Finally, the complaint risk label of the recognized text "I come from City A." is obtained from the type features C1 and C2 ... CN.
In this way, automated processing of the complaint sheet is realized by performing text recognition on the images in the complaint sheet and inputting the recognized text corresponding to the image to be processed into the pre-trained neural network model to obtain the complaint effectiveness label and the complaint risk label. This saves the labor cost of manually reviewing complaint sheets, and the automated processing improves the processing efficiency of complaint sheets, so that highly harmful complaint orders are stopped in time.
It can be understood that the characters contained in the images of a complaint sheet are likely to be transaction content or pre-transaction communication. The embodiments of the present application can therefore effectively identify the maliciousness of merchants and the transaction categories of merchants, obtain the complaint effectiveness label and the complaint risk label of the recognized text corresponding to the image to be processed, and realize automated processing of the complaint sheet.
In addition, since the text of the image to be processed can be accurately recognized, the loss of effective information in the complaint pictures is reduced, and the accuracy and soundness of the automated processing of complaint sheets are improved.
In one application scenario, online payment may be targeted by organized fraud, and obtaining effective information to identify and act against abnormal merchants is a major challenge. Users usually file complaints when they notice an abnormality in a transaction, and the complaint pictures in the complaint sheets they submit can contain a large amount of text information. In this application scenario, the embodiments of the present application can therefore effectively identify the maliciousness of merchants and the transaction categories of merchants, obtain the complaint effectiveness label and the complaint risk label of the recognized text corresponding to the image to be processed, realize automated processing of complaint sheets, and help strike such fraud accurately, timely and comprehensively.
FIG. 16 schematically illustrates a flowchart of the steps following the storing of the complaint effectiveness label and the complaint risk label corresponding to the complaint sheet, together with the subject corresponding to the complaint sheet, in the complaint sheet database in an embodiment of the present application. As shown in fig. 16, after the complaint effectiveness label and the complaint risk label corresponding to the complaint sheet and the subject corresponding to the complaint sheet are stored in the complaint sheet database in step S1320, the method may further include the following steps S1610 to S1630.
S1610, obtaining information flow data and funds flow data of a transaction order, wherein the transaction order corresponds to a target subject;
S1620, searching the complaint sheet database according to the target subject to obtain the target complaint sheet corresponding to the target subject, and the complaint effectiveness label and the complaint risk label of the target complaint sheet;
S1630, inputting the information flow data and the funds flow data of the transaction order, and the complaint effectiveness label and the complaint risk label corresponding to the target subject, into a pre-trained decision tree model to obtain a risk policy suggestion corresponding to the target subject, wherein the risk policy suggestion includes one or more of trusting the transaction order, limiting the transaction order amount, punishing the transaction order, intercepting the transaction order, and warning of transaction risk.
FIG. 17 schematically illustrates a specific process of inputting the information flow data and funds flow data of a transaction order, and the complaint effectiveness label and complaint risk label corresponding to the target subject, into a pre-trained decision tree model to obtain a risk policy suggestion corresponding to the target subject in an embodiment of the present application. As shown in fig. 17, after a complaint sheet is acquired and text recognition is performed on the images in the complaint sheet, the recognized text corresponding to the image to be processed is input into the first sub-neural network model to obtain its complaint effectiveness label, and into the second sub-neural network model to obtain its complaint risk label. The complaint effectiveness label and the complaint risk label corresponding to the complaint sheet, together with the subject corresponding to the complaint sheet, are then stored in the complaint sheet database. A real-time policy engine can acquire the information flow data and funds flow data of a transaction order in real time, search the complaint sheet database according to the target subject corresponding to the transaction order, and obtain the target complaint sheet corresponding to the target subject as well as its complaint effectiveness label and complaint risk label. Finally, the information flow data and funds flow data of the transaction order, and the complaint effectiveness label and complaint risk label corresponding to the target subject, are input into a pre-trained decision tree model or scorecard model included in the real-time policy engine, to obtain the risk policy suggestion corresponding to the target subject, wherein the risk policy suggestion includes one or more of trusting the transaction order, limiting the transaction order amount, punishing the transaction order, intercepting the transaction order, and warning of transaction risk.
Specifically, automated punishments of different gradients can be applied according to the different types of risk labels of the target subject corresponding to a transaction order: severe processing strategies, such as closing payment permissions and confiscating funds, for merchants with many complaint-valid labels; amount limits for merchants with fewer complaint-valid labels; or mild processing strategies, such as interception reminders, for abnormal orders of a merchant. Risk control of real-time transactions is thereby realized.
In this way, the complaint effectiveness label and complaint risk label corresponding to a complaint sheet, together with the subject corresponding to the complaint sheet, are stored in the complaint sheet database, so that the database can be searched by target subject to obtain the target complaint sheet and its complaint effectiveness label and complaint risk label. The information flow data and funds flow data of the transaction order, and the complaint effectiveness label and complaint risk label corresponding to the target subject, are then input into the pre-trained decision tree model to obtain the risk policy suggestion corresponding to the target subject. An automated processing strategy can thus be generated based on the multi-category risk labels, on whether the complaints are valid, and on the merchant's other transaction information, which facilitates establishing a gradient punishment system for abnormal merchants and realizes automated processing of abnormal transaction orders.
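For illustration, the following scikit-learn sketch trains a small decision tree that maps transaction features and complaint labels to a policy suggestion; the feature set, the toy training rows and the label names are assumptions, not the production policy engine.

```python
from sklearn.tree import DecisionTreeClassifier

# Illustrative features for one transaction order: [order amount,
# orders per hour, valid-complaint count, fraud-risk-complaint count].
X_train = [
    [ 20.0,  1,  0, 0],
    [999.0, 40, 12, 5],
    [ 50.0,  3,  1, 0],
    [500.0, 25,  6, 3],
]
y_train = ["trust", "intercept", "remind_risk", "punish"]

policy_tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print(policy_tree.predict([[300.0, 10, 4, 1]]))  # -> a policy suggestion
```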
It should be noted that although the steps of the methods in the present application are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
The following describes an embodiment of an apparatus of the present application, which may be used to perform the image text recognition method in the above-described embodiments of the present application. Fig. 18 schematically shows a block diagram of an image text recognition apparatus provided in an embodiment of the present application. As shown in fig. 18, the image text recognition apparatus 1800 includes:
the layer segmentation module 1810 is configured to convert the image to be processed into a gray image, and segment the gray image into gray layers corresponding to each layer interval according to the layer interval to which the gray value of each pixel point of the gray image belongs, where the layer interval is used for representing the value range of the gray value of the pixel point in the corresponding gray layer;
the erosion module 1820 is configured to perform image erosion on each gray layer to obtain a feature layer corresponding to each gray layer, where the feature layer includes a plurality of connected regions, and a connected region is a region composed of a plurality of pixel points having a connectivity relationship;
The feature stacking module 1830 is configured to superimpose the feature layers to obtain a superimposed feature layer, where the superimposed feature layer includes a plurality of connected regions;
the expansion module 1840 is configured to expand each connected region on the superimposed feature layer in a preset direction to obtain text regions;
the text recognition module 1850 is configured to recognize the text of each text region on the superimposed feature layer to obtain the recognized text corresponding to the image to be processed.
In some embodiments of the present application, based on the above embodiments, the image text recognition apparatus further includes:
a minimum value determining unit configured to determine one or more minimum values in a distribution frequency of each gray value in the gray image, based on the gray values of each pixel point of the gray image;
a full-value range determining unit configured to determine the minimum value of the full value range according to the minimum gray value among the gray values of the gray image, and to determine the maximum value of the full value range according to the maximum gray value among the gray values of the gray image;
and the layer interval acquisition unit is configured to divide the full-value range into a plurality of layer intervals according to the gray values corresponding to the minimum values.
In some embodiments of the present application, based on the above embodiments, the layer interval acquisition unit includes:
a sorting subunit configured to sort the minimum value of the full value range, the maximum value of the full value range and the gray values corresponding to the respective minima in order from small to large or from large to small;
and the layer interval segmentation subunit is configured to segment the full value range by taking two gray values which are adjacent in sequence as two interval endpoints corresponding to the layer interval, so as to obtain a plurality of layer intervals which are connected end to end and do not overlap.
In some embodiments of the present application, based on the above embodiments, the minimum value determination unit includes:
a distribution frequency determining subunit configured to calculate a distribution frequency of each gray value according to the gray value of each pixel point in the gray image;
the distribution function acquisition subunit is configured to obtain a corresponding distribution function according to the distribution frequency of each gray value in the gray image;
the smooth curve acquisition subunit is configured to perform function smoothing on the distribution function to obtain a smooth curve corresponding to the distribution function;
and the minimum value acquisition subunit is configured to identify each trough of the smooth curve, and take the value of the point corresponding to each trough as the minimum value in the distribution frequency of each gray value in the gray image.
In some embodiments of the present application, based on the above embodiments, the erosion module includes:
a binary layer acquisition unit configured to determine a target threshold within the gray value interval of the gray layer, map gray values in the gray layer that are greater than or equal to the target threshold to a first value and gray values smaller than the target threshold to a second value, and form a binary layer corresponding to the gray layer;
a marked connected region acquisition unit configured to perform image erosion on the binary layer to obtain marked connected regions composed of a plurality of pixel points whose gray value is the first value;
an erosion unit configured to retain the pixel values in the gray layer located at the positions corresponding to the marked connected regions of the binary layer, and discard the pixel values in the gray layer located outside those positions.
In some embodiments of the present application, based on the above embodiments, the preset direction is a horizontal direction or a vertical direction, and the expansion module includes:
a circumscribed rectangle acquisition unit configured to obtain a circumscribed rectangle of the connected region and expand the connected region until it fills the circumscribed rectangle, the circumscribed rectangle being a rectangle circumscribing the connected region in the preset direction;
a nearest neighboring connected region acquisition unit configured to obtain the nearest neighboring connected region of the connected region, the nearest neighboring connected region being the connected region with the shortest separation distance from the connected region;
a text region acquisition unit configured to expand the connected region toward the nearest neighboring connected region to obtain the text region when the direction of the nearest neighboring connected region relative to the connected region is the preset direction.
In some embodiments of the present application, based on the above embodiments, the text recognition module includes:
the text cutting unit is configured to cut texts of the text areas to obtain one or more single-word areas;
the character recognition unit is configured to recognize characters of each single character area to obtain character information corresponding to each single character area;
the text information acquisition unit is configured to combine the character information corresponding to each single word area according to the arrangement positions of each single word area in the text area to obtain text information corresponding to the text area;
and the identification text acquisition unit is configured to acquire identification texts of the images to be processed according to the text information corresponding to the text areas.
In some embodiments of the present application, based on the above embodiments, the text cutting unit includes:
A length-to-height ratio calculation subunit configured to calculate a length-to-height ratio of the text region, the length-to-height ratio being a ratio of a length of the text region to a height of the text region;
a character prediction subunit configured to calculate a predicted number of characters of the text region according to the aspect ratio;
and the single-word region acquisition subunit is configured to uniformly cut the text region in the length direction according to the expected number to obtain the expected number of single-word regions.
In some embodiments of the present application, based on the above embodiments, the single word region acquisition subunit includes:
a pre-cut number acquisition subunit configured to acquire a pre-cut number according to the predicted number, the pre-cut number being greater than or equal to the predicted number;
the cutting line uniform arrangement subunit is configured to uniformly arrange candidate cutting lines in the length direction on the text region according to the pre-cutting number, wherein the candidate cutting lines can uniformly cut the text region in the length direction to obtain the pre-cutting number of candidate regions;
a target cut line acquisition subunit configured to set, as a target cut line, a candidate cut line having adjacent cut lines on both sides;
a distance sum calculating subunit configured to detect a distance sum of distances between the target cut line and the candidate cut lines adjacent to both sides;
A target cut line retaining subunit configured to retain the target cut line when a ratio of the sum of distances to the height of the text region is greater than or equal to a preset ratio;
and a target cut line discarding subunit configured to discard the target cut line when the ratio of the sum of distances to the height of the text region is smaller than a preset ratio.
In some embodiments of the present application, based on the above embodiments, the feature stacking module includes:
the superimposed feature layer acquisition unit is configured to superimpose each feature layer to obtain superimposed feature layers;
a merged connected region acquisition unit configured to merge connected regions on the superimposed feature layer whose separation distance is smaller than a preset distance into a merged connected region;
an area ratio calculation unit configured to calculate, within the merged connected region, the area of the connected region contributed by each feature layer, and calculate the area ratio corresponding to each feature layer, the area ratio being the ratio of the area of the feature layer's connected region at the corresponding position to the area of the merged connected region;
a connected region replacement unit configured to replace the merged connected region with the connected region, at the corresponding position, of the feature layer having the largest area ratio.
In some embodiments of the present application, based on the above embodiments, the method is applied to automated processing of a complaint sheet, the image to be processed comprising an image in the complaint sheet; the image text recognition device further includes:
the label classification unit is configured to input the recognized text corresponding to the image to be processed into the pre-trained neural network model to obtain a complaint effectiveness label and a complaint risk label corresponding to the complaint sheet in which the image to be processed is located;
and the complaint sheet database storage unit is configured to store the complaint effectiveness label and the complaint risk label corresponding to the complaint sheet, together with the subject corresponding to the complaint sheet, into the complaint sheet database.
In some embodiments of the present application, based on the above embodiments, the image text recognition apparatus further includes:
a transaction data acquisition unit configured to acquire information flow data and funds flow data of a transaction order, the transaction order corresponding to a target subject;
a tag search unit configured to search the complaint sheet database according to the target subject to obtain the target complaint sheet corresponding to the target subject, and the complaint effectiveness label and the complaint risk label of the target complaint sheet;
and the risk policy suggestion acquisition unit is configured to input the information flow data and funds flow data of the transaction order, and the complaint effectiveness label and complaint risk label corresponding to the target subject, into the pre-trained decision tree model to obtain the risk policy suggestion corresponding to the target subject, wherein the risk policy suggestion includes one or more of trusting the transaction order, limiting the transaction order amount, punishing the transaction order, intercepting the transaction order, and warning of transaction risk.
Specific details of the image text recognition device provided in each embodiment of the present application have been described in detail in the corresponding method embodiments, and are not described herein again.
Fig. 19 schematically shows a block diagram of a computer system for implementing an electronic device according to an embodiment of the present application.
It should be noted that, the computer system 1900 of the electronic device shown in fig. 19 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 19, the computer system 1900 includes a central processing unit 1901 (Central Processing Unit, CPU), which can execute various appropriate actions and processes according to a program stored in a read-only memory 1902 (Read-Only Memory, ROM) or a program loaded from a storage section 1908 into a random access memory 1903 (Random Access Memory, RAM). Various programs and data required for system operation are also stored in the random access memory 1903. The central processing unit 1901, the read-only memory 1902 and the random access memory 1903 are connected to one another via a bus 1904. An input/output interface 1905 (i.e., an I/O interface) is also connected to the bus 1904.
The following components are connected to the input/output interface 1905: an input section 1906 including a keyboard, a mouse, and the like; an output portion 1907 including a Cathode Ray Tube (CRT), a liquid crystal display (Liquid Crystal Display, LCD), and a speaker; a storage section 1908 including a hard disk or the like; and a communication section 1909 including a network interface card such as a local area network card, a modem, and the like. The communication section 1909 performs communication processing via a network such as the internet. The driver 1910 is also connected to the input/output interface 1905 as needed. A removable medium 1911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 1910, so that a computer program read out therefrom is installed into the storage portion 1908 as needed.
In particular, according to embodiments of the present application, the processes described in the various method flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from the network via the communication portion 1909, and/or installed from the removable media 1911. The computer programs, when executed by the central processor 1901, perform the various functions defined in the system of the present application.
It should be noted that, the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer-readable signal medium may include a data signal that propagates in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, in accordance with embodiments of the present application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a usb disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (15)

1. An image text recognition method, comprising:
converting an image to be processed into a gray image, and dividing the gray image into gray layers corresponding to respective layer intervals according to the layer interval to which the gray value of each pixel point of the gray image belongs, wherein the layer interval is used for representing the value range of the gray values of the pixel points in the corresponding gray layer;
performing image erosion on each gray layer to obtain a feature layer corresponding to each gray layer, wherein the feature layer comprises a plurality of connected regions, and a connected region is a region comprising a plurality of pixel points having a connectivity relationship;
superimposing the feature layers to obtain a superimposed feature layer, wherein the superimposed feature layer comprises a plurality of connected regions;
expanding each connected region on the superimposed feature layer according to a preset direction to obtain a text region;
and recognizing the text of each text region on the superimposed feature layer to obtain a recognized text corresponding to the image to be processed.
2. The method according to claim 1, wherein before the dividing of the gray image into the gray layers corresponding to the respective layer intervals according to the layer interval to which the gray value of each pixel point of the gray image belongs, the method further comprises:
determining one or more minimum values in the distribution frequency of each gray value in the gray image according to the gray value of each pixel point of the gray image;
determining the minimum value of the full value range according to the minimum gray value among the gray values of the gray image; determining the maximum value of the full value range according to the maximum gray value among the gray values of the gray image;
and dividing the full-value range into a plurality of layer intervals according to the gray value corresponding to each minimum value.
3. The method according to claim 2, wherein the dividing of the full value range into the plurality of layer intervals according to the gray values corresponding to the minimum values includes:
sorting the minimum value of the full value range, the maximum value of the full value range and the gray value corresponding to each minimum value in order from small to large or from large to small;
and taking the two gray values which are adjacent in sequence as two interval endpoints corresponding to the layer interval, and dividing the full value range to obtain a plurality of layer intervals which are connected end to end and are not overlapped.
4. The method of claim 2, wherein determining one or more minima in the distribution frequency of each gray value in the gray image from the gray value of each pixel of the gray image comprises:
Calculating the distribution frequency of each gray value according to the gray value of each pixel point in the gray image;
obtaining a corresponding distribution function according to the distribution frequency of each gray value in the gray image;
performing function smoothing on the distribution function to obtain a smooth curve corresponding to the distribution function;
and identifying each trough of the smooth curve, and taking the value of the point corresponding to each trough as the minimum value in the distribution frequency of each gray value in the gray image.
5. The method of claim 1, wherein the performing image erosion on each gray layer to obtain a feature layer corresponding to each gray layer, the feature layer including a plurality of connected regions, includes:
determining a target threshold within the gray value interval of the gray layer, mapping gray values in the gray layer that are greater than or equal to the target threshold to a first value and gray values smaller than the target threshold to a second value, to form a binary layer corresponding to the gray layer;
performing image erosion on the binary layer to obtain marked connected regions composed of a plurality of pixel points whose gray value is the first value;
and retaining the pixel values in the gray layer located at the positions corresponding to the marked connected regions of the binary layer, and discarding the pixel values in the gray layer located outside those positions.
6. The method according to claim 1, wherein the preset direction is a horizontal direction or a vertical direction, and the expanding each connected region on the superimposed feature layer according to the preset direction to obtain a text region comprises:
obtaining a circumscribed rectangle of the connected region and expanding the connected region until the circumscribed rectangle is filled, wherein the circumscribed rectangle is a rectangle that circumscribes the connected region and is aligned with the preset direction;
acquiring the nearest neighboring connected region of the connected region, wherein the nearest neighboring connected region is the connected region separated from it by the shortest distance;
and when the nearest neighboring connected region lies in the preset direction relative to the connected region, expanding the connected region toward the nearest neighboring connected region to obtain the text region.
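Once each connected region is filled to its circumscribed rectangle, the expansion of claim 6 reduces to merging axis-aligned boxes along the preset direction. The sketch below assumes the horizontal case and a hypothetical `max_gap` joining threshold, which the claim leaves unspecified.

```python
def expand_horizontally(boxes: list, max_gap: int) -> list:
    """Merge (x0, y0, x1, y1) bounding boxes toward their nearest right-hand neighbor."""
    if not boxes:
        return []
    boxes = sorted(boxes)                               # left to right by x0
    merged = [list(boxes[0])]
    for x0, y0, x1, y1 in boxes[1:]:
        last = merged[-1]
        same_row = not (y1 < last[1] or y0 > last[3])   # vertical overlap
        if same_row and x0 - last[2] <= max_gap:        # neighbor lies in the preset direction
            last[2] = max(last[2], x1)                  # expand toward it
            last[1], last[3] = min(last[1], y0), max(last[3], y1)
        else:
            merged.append([x0, y0, x1, y1])
    return [tuple(b) for b in merged]
```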
7. The method according to claim 1, wherein the recognizing the text of each text region on the superimposed feature layer to obtain the recognized text corresponding to the image to be processed comprises:
performing text cutting on the text region to obtain one or more single-character regions;
recognizing the character in each single-character region to obtain character information corresponding to each single-character region;
combining the character information corresponding to the single-character regions according to the arrangement positions of the single-character regions in the text region to obtain text information corresponding to the text region;
and obtaining the recognized text of the image to be processed according to the text information corresponding to the text regions.
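The combination step of claim 7 is essentially a positional sort; a tiny sketch, assuming each recognized character is paired with the x-offset of its single-character region (a hypothetical representation):

```python
def assemble_text(char_results: list) -> str:
    """Combine (x_offset, character) pairs into the text of one region."""
    return "".join(ch for _, ch in sorted(char_results))

# e.g. assemble_text([(40, "b"), (10, "a"), (75, "c")]) == "abc"
```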
8. The method of claim 7, wherein the performing text cutting on the text region to obtain one or more single-character regions comprises:
calculating the length-to-height ratio of the text region, wherein the length-to-height ratio is the ratio of the length of the text region to its height;
calculating the estimated number of characters in the text region according to the length-to-height ratio;
and uniformly cutting the text region in the length direction according to the estimated number to obtain that number of single-character regions.
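A sketch of the uniform cut in claim 8. Rounding the length-to-height ratio to the nearest integer as the estimated character count assumes roughly square characters; the claim does not fix the estimation formula.

```python
def uniform_cut(x0: int, x1: int, height: int) -> list:
    """Cut a text region into equal-width single-character slices."""
    length = x1 - x0
    n = max(1, round(length / height))   # estimated number of characters
    step = length / n
    return [(int(x0 + i * step), int(x0 + (i + 1) * step)) for i in range(n)]
```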
9. The method of claim 8, wherein the uniformly cutting the text region in the length direction according to the estimated number to obtain that number of single-character regions comprises:
acquiring a pre-cutting number according to the estimated number, wherein the pre-cutting number is greater than or equal to the estimated number;
uniformly arranging candidate cutting lines on the text region in the length direction according to the pre-cutting number, so that the candidate cutting lines uniformly cut the text region in the length direction into the pre-cutting number of candidate regions;
taking each candidate cutting line that has adjacent candidate cutting lines on both sides as a target cutting line;
detecting the sum of the distances between the target cutting line and the candidate cutting lines adjacent to it on both sides;
retaining the target cutting line when the ratio of the sum of the distances to the height of the text region is greater than or equal to a preset ratio;
and discarding the target cutting line when the ratio of the sum of the distances to the height of the text region is less than the preset ratio.
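Read literally, claim 9 keeps an interior candidate line only if its spacing to its two neighbors is large enough relative to the region height. A sketch under that reading; the preset ratio of 0.8 and the retention of the outermost candidates (which have no neighbor on one side) are assumptions.

```python
def filter_cut_lines(x0: float, x1: float, height: float,
                     pre_cut: int, preset_ratio: float = 0.8) -> list:
    step = (x1 - x0) / pre_cut
    # pre_cut candidate regions => pre_cut - 1 interior candidate lines
    candidates = [x0 + i * step for i in range(1, pre_cut)]
    kept = []
    for i, line in enumerate(candidates):
        if i == 0 or i == len(candidates) - 1:
            kept.append(line)          # assumed: non-target lines are kept as-is
            continue
        dist_sum = (line - candidates[i - 1]) + (candidates[i + 1] - line)
        if dist_sum / height >= preset_ratio:
            kept.append(line)          # spacing is wide enough: retain the line
    return kept
```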
10. The method of claim 1, wherein the superimposing the feature layers to obtain a superimposed feature layer comprises:
superimposing the feature layers to obtain a superimposed feature layer;
merging connected regions on the superimposed feature layer whose separation distance is less than a preset distance into a merged connected region;
calculating, for each feature layer, the area of the connected region it contributes to the merged connected region, and calculating the area ratio corresponding to each feature layer, wherein the area ratio is the ratio of the area of the connected region at the corresponding position of the feature layer to the area of the merged connected region;
and replacing the merged connected region with the connected region at the corresponding position of the feature layer having the largest area ratio.
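A sketch of the superposition and replacement logic in claim 10, using OpenCV connected-component labeling. Joining nearby regions via a small dilation before labeling is an assumed stand-in for the unspecified distance-based merging rule.

```python
import cv2
import numpy as np

def superimpose(feature_masks: list) -> np.ndarray:
    """Superimpose binary feature layers; resolve each merged region by area ratio."""
    union = np.zeros_like(feature_masks[0])
    for m in feature_masks:
        union = cv2.bitwise_or(union, m)
    joined = cv2.dilate(union, np.ones((3, 3), np.uint8))  # merge nearby regions
    n_labels, labels = cv2.connectedComponents(joined)
    out = np.zeros_like(union)
    for region in range(1, n_labels):                      # label 0 is background
        in_region = labels == region
        # area each layer contributes inside this merged region
        areas = [np.count_nonzero(m[in_region]) for m in feature_masks]
        winner = feature_masks[int(np.argmax(areas))]
        out[in_region] = winner[in_region]                 # keep only the winning layer
    return out
```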
11. The method according to any one of claims 1-10, wherein the method is applied to automated processing of complaint tickets, and the image to be processed comprises an image in a complaint ticket; after the recognizing the text of each text region on the superimposed feature layer to obtain the recognized text corresponding to the image to be processed, the method further comprises:
inputting the recognized text corresponding to the image to be processed into a pre-trained neural network model to obtain a complaint effectiveness label and a complaint risk label corresponding to the complaint ticket in which the image to be processed is located;
and storing the complaint effectiveness label and the complaint risk label corresponding to the complaint ticket, together with the subject corresponding to the complaint ticket, in a complaint ticket database.
12. The method of claim 11, wherein after the storing of the complaint effectiveness label and the complaint risk label of the complaint ticket and the subject of the complaint ticket in the complaint ticket database, the method further comprises:
acquiring information flow data and fund flow data of a trade order, wherein the trade order corresponds to a target subject;
searching the complaint ticket database according to the target subject to obtain a target complaint ticket corresponding to the target subject, together with the complaint effectiveness label and the complaint risk label of the target complaint ticket;
and inputting the information flow data and the fund flow data of the trade order, together with the complaint effectiveness label and the complaint risk label corresponding to the target subject, into a pre-trained decision tree model to obtain a risk strategy suggestion corresponding to the target subject, wherein the risk strategy suggestion comprises one or more of trusting the trade order, limiting the trade order amount, penalizing the trade order, intercepting the trade order, and issuing a trade risk reminder.
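As an illustration of the decision stage in claim 12, a scikit-learn decision tree could map order and complaint features to a risk action. Everything below (feature layout, label set, training rows) is an invented toy example, not the patent's model.

```python
from sklearn.tree import DecisionTreeClassifier

ACTIONS = ["trust", "limit_amount", "penalize", "intercept", "remind"]

# hypothetical features: [order_amount, order_count, effectiveness_label, risk_label]
X_train = [[120.0, 3, 0, 0], [999.0, 40, 1, 1], [50.0, 1, 0, 1], [700.0, 25, 1, 0]]
y_train = [0, 3, 4, 1]  # indices into ACTIONS

model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print(ACTIONS[model.predict([[300.0, 10, 1, 1]])[0]])
```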
13. An image text recognition apparatus, comprising:
a layer segmentation module configured to convert an image to be processed into a gray-scale image and divide the gray-scale image into gray-scale layers corresponding to the respective layer intervals according to the layer intervals to which the gray values of the pixel points of the gray-scale image belong, wherein the layer intervals represent the value ranges of the gray values of the pixel points in the corresponding gray-scale layers;
an erosion module configured to perform image erosion on each gray-scale layer to obtain a feature layer corresponding to each gray-scale layer, wherein the feature layer comprises a plurality of connected regions, a connected region being a region comprising a plurality of pixel points having a connectivity relationship;
a feature superposition module configured to superimpose the feature layers to obtain a superimposed feature layer, the superimposed feature layer comprising a plurality of connected regions;
an expansion module configured to expand each connected region on the superimposed feature layer according to a preset direction to obtain a text region;
and a text recognition module configured to recognize the text of each text region on the superimposed feature layer to obtain recognized text corresponding to the image to be processed.
14. A computer readable medium having stored thereon a computer program which, when executed by a processor, implements the image text recognition method of any one of claims 1 to 12.
15. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the image text recognition method of any one of claims 1 to 12 via execution of the executable instructions.
CN202111307156.0A 2021-11-05 2021-11-05 Image text recognition method and device, computer readable medium and electronic equipment Pending CN116092094A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202111307156.0A CN116092094A (en) 2021-11-05 2021-11-05 Image text recognition method and device, computer readable medium and electronic equipment
PCT/CN2022/118298 WO2023077963A1 (en) 2021-11-05 2022-09-13 Image text recognition method and apparatus, computer readable medium, and electronic device
US18/354,726 US20230360183A1 (en) 2021-11-05 2023-07-19 Method, computer-readable medium, and electronic device for image text recognition

Publications (1)

Publication Number Publication Date
CN116092094A true CN116092094A (en) 2023-05-09

Family

ID=86210694

Country Status (3)

Country Link
US (1) US20230360183A1 (en)
CN (1) CN116092094A (en)
WO (1) WO2023077963A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117934517A (en) * 2024-03-19 2024-04-26 西北工业大学 Single-example self-evolution target detection segmentation method based on divergence clustering

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002279344A (en) * 2001-03-16 2002-09-27 Ricoh Co Ltd Character recognition device and method, and recording medium
CN104156706A (en) * 2014-08-12 2014-11-19 华北电力大学句容研究中心 Chinese character recognition method based on optical character recognition technology
CN108985324A (en) * 2018-06-04 2018-12-11 平安科技(深圳)有限公司 Handwritten word training sample acquisition methods, device, equipment and medium
CN109034147B (en) * 2018-09-11 2020-08-11 上海唯识律简信息科技有限公司 Optical character recognition optimization method and system based on deep learning and natural language
CN109255499B (en) * 2018-10-25 2021-12-07 创新先进技术有限公司 Complaint and complaint case processing method, device and equipment

Also Published As

Publication number Publication date
US20230360183A1 (en) 2023-11-09
WO2023077963A1 (en) 2023-05-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40086088)