CN116434218A - Check identification method, device, equipment and medium suitable for mobile terminal - Google Patents

Check identification method, device, equipment and medium suitable for mobile terminal

Info

Publication number
CN116434218A
Authority
CN
China
Prior art keywords: feature, image, network, check, characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310479560.9A
Other languages
Chinese (zh)
Inventor
崔魏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202310479560.9A
Publication of CN116434218A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63Scene text, e.g. street names
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image
    • G06V30/1801Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1918Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a check identification method suitable for mobile terminals. It relates to the field of artificial intelligence and can be applied in the field of financial technology. The method comprises the following steps: extracting image features of a check image by using a text region detection model, detecting text regions in the check image based on the image features, and then identifying text information in the text regions by using a text recognition model. When the image features of the check image are extracted, a plurality of network layers in a backbone feature extraction network extract the image features in series, wherein a first feature image output by a predetermined network layer among the plurality of network layers is enhanced by a feature enhancement module into a second feature image, and the second feature image is then passed back to the backbone feature extraction network for continued processing. The present disclosure also provides a check identification apparatus, device, storage medium and program product suitable for use on a mobile terminal.

Description

Check identification method, device, equipment and medium suitable for mobile terminal
Technical Field
The present disclosure relates to the field of artificial intelligence technology, may be applied in the field of financial technology, and more particularly relates to a check identification method, apparatus, device, medium, and program product suitable for a mobile terminal.
Background
Check business is one of the common businesses of banks. With the rapid development of the mobile internet, mobile banking clients have also opened check deposit services. When conducting a check transaction (e.g., a check deposit) through a mobile banking client, a user typically photographs the check with a mobile phone and uploads the picture to a background server for image information processing; the server then returns the recognition result to the client, completing the check deposit flow. However, uploading the check image to the background server depends on the network rate, and check images are becoming larger as the cameras on mobile phones improve. Under weak network conditions, the upload may take too long or even fail, which degrades the user experience.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a check identification method, apparatus, device, medium, and program product suitable for use on a mobile terminal.
According to a first aspect of the present disclosure, a check identification method suitable for use at a mobile terminal is provided. The method comprises the following steps: acquiring a check image to be identified; extracting image features of the check image by using a text region detection model, and detecting text regions in the check image based on the image features; identifying text information in the text regions by using a text recognition model; and uploading the text information to a server. The extracting the image features of the check image using the text region detection model comprises: extracting image features of the check image in series by using a plurality of network layers in a backbone feature extraction network, wherein a first feature image output by a predetermined network layer among the plurality of network layers is enhanced by a feature enhancement module into a second feature image, and the second feature image is then passed back to the backbone feature extraction network for continued processing.
According to an embodiment of the present disclosure, enhancing the first feature image by the feature enhancement module comprises: performing feature extraction at n size levels on the first feature image to obtain n first feature subgraphs, wherein n is an integer greater than or equal to 2, and the n size levels comprise a global size level at which features are extracted over a 1x1 grid and at least one local size level at which features are extracted over a grid other than 1x1; performing feature fusion on the n first feature subgraphs to obtain a second feature subgraph; and processing the second feature subgraph according to the image parameters that the backbone feature extraction network requires for continued processing, so as to obtain the second feature image.
According to an embodiment of the disclosure, performing feature extraction at n size levels on the first feature image to obtain n first feature subgraphs comprises: pooling the first feature image with n pooling layers respectively corresponding to the n size levels; and convolving the outputs of the n pooling layers with n first convolution layers to compress the channel number of the image, so as to obtain the n first feature subgraphs.
According to an embodiment of the present disclosure, the convolution kernel size of the first convolution layer is 1x1.
According to an embodiment of the present disclosure, performing feature fusion on the n first feature subgraphs to obtain a second feature subgraph comprises: taking the first feature subgraph corresponding to the global size level as an initial sampling object, and repeatedly updating the sampling object in the following manner until the n feature subgraphs have been fused, whereupon the sampling object is output: upsampling the sampling object to the image resolution corresponding to the size level adjacent to that of the sampling object, so as to obtain a feature subgraph to be fused; and performing feature fusion on the feature subgraph to be fused and the first feature subgraph corresponding to that adjacent size level, so as to obtain a new sampling object.
According to an embodiment of the present disclosure, performing feature fusion on the n first feature subgraphs to obtain a second feature subgraph further comprises: performing feature fusion on the sampling object output after the n feature subgraphs have been fused and the first feature image, so as to obtain the second feature subgraph.
According to an embodiment of the present disclosure, processing the second feature subgraph according to the image parameters that the backbone feature extraction network requires for continued processing, so as to obtain the second feature image, comprises: when the predetermined network layer is an intermediate network layer among the plurality of network layers, convolving the second feature subgraph according to the image parameters expected as input by the network layer following the predetermined network layer, so as to obtain the second feature image; and when the predetermined network layer is the last network layer among the plurality of network layers, convolving the second feature subgraph according to the image parameters output by the backbone feature extraction network, so as to obtain the second feature image.
In a second aspect of the disclosed embodiments, a check identification apparatus suitable for use at a mobile terminal is provided. The apparatus comprises an acquisition module, a text region detection model, a text recognition model and an uploading module. The acquisition module is used for acquiring a check image to be identified. The text region detection model is used for extracting image features of the check image and detecting text regions in the check image based on the image features. The text recognition model is used for recognizing text information in the text regions. The uploading module is used for uploading the text information to a server. The text region detection model comprises a backbone feature extraction network and a feature enhancement module. The backbone feature extraction network is used for extracting image features of the check image in series through a plurality of network layers. The feature enhancement module is used for enhancing a first feature image output by a predetermined network layer among the plurality of network layers into a second feature image, which is then passed back to the backbone feature extraction network for continued processing.
According to an embodiment of the disclosure, the feature enhancement module is specifically configured to: perform feature extraction at n size levels on the first feature image to obtain n first feature subgraphs, wherein n is an integer greater than or equal to 2, and the n size levels comprise a global size level at which features are extracted over a 1x1 grid and at least one local size level at which features are extracted over a grid other than 1x1; perform feature fusion on the n first feature subgraphs to obtain a second feature subgraph; and process the second feature subgraph according to the image parameters that the backbone feature extraction network requires for continued processing, so as to obtain the second feature image.
In a third aspect of the disclosed embodiments, an electronic device is provided. The electronic device includes one or more processors and memory. The memory is configured to store one or more programs that, when executed by the one or more processors, cause the one or more processors to perform the above-described method.
In a fourth aspect of the disclosed embodiments, there is also provided a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described method.
In a fifth aspect of the disclosed embodiments, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements the above method.
One or more of the above embodiments have the following advantages or benefits: the problems that, in a weak network environment, uploading a check picture from the mobile terminal to the server takes too long or even fails can be at least partially avoided. By adding a feature enhancement module to the lightweight backbone feature extraction network used at the mobile terminal, the image features output by at least one network layer undergo further feature extraction and enhancement, which improves the precision of feature extraction and hence the accuracy of text region detection, making the method suitable for the business requirements of the mobile terminal.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of a check identification method, apparatus, device, medium and program product suitable for use on a mobile terminal according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a check identification method applicable to a mobile terminal according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart for detecting text regions in a check image using a text detection model according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a process flow of enhancing a first feature image by a feature enhancement module in a check identification method according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart for identifying a check image using a backbone feature extraction network and a feature enhancement module in a check identification method according to another embodiment of the present disclosure;
FIG. 6 schematically illustrates a block diagram of a check identification apparatus adapted for use on a mobile terminal in accordance with an embodiment of the present disclosure; and
fig. 7 schematically illustrates a block diagram of an electronic device adapted to implement a check identification method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression like "at least one of A, B and C, etc." is used, it should generally be interpreted in accordance with the meaning as commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.). The terms "first," "second," and the like are used herein solely for distinguishing, not as limitations, and any number of elements in the figures is used for illustration, not as a limitation.
The check recognition process can be divided into text detection and text recognition. Text detection locates the text regions in the check image and is the step preceding text recognition. The accuracy of text detection directly affects the accuracy of text recognition.
The model used at the server for identifying check images is usually relatively large, mainly because the model for text detection is large, while the data processing capability of a mobile terminal (such as a mobile phone or an iPad) is limited, so the server-side model is difficult to port directly to the mobile terminal. Therefore, the inventor provides a lightweight network model based on a feature enhancement module; used as the text region detection model and combined with a corresponding text recognition model, it can meet the check recognition requirements of a mobile terminal.
Specifically, the check recognition method, apparatus, device, medium and program product suitable for the mobile terminal provided by the embodiments of the disclosure can perform text detection using a text region detection model that is installed on the mobile terminal and includes a feature enhancement module, then perform text recognition using a text recognition model installed on the mobile terminal, and then upload the text recognition result to the server to perform the check transaction. The feature enhancement module further extracts and enhances the image features output by at least one network layer of the backbone feature extraction network in the text region detection model, which improves the precision of feature extraction and hence the accuracy of text region detection. The text region detection model can therefore be designed as a lightweight model without losing recognition accuracy, which suits the business requirements of the mobile terminal.
FIG. 1 schematically illustrates an application scenario diagram of a check identification method, apparatus, device, medium and program product suitable for use on a mobile terminal according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, an application scenario 100 according to this embodiment may include a mobile terminal 101, a network 102, and a server 103, the network 102 serving as a medium for providing a communication link between the mobile terminal 101 and the server 103. Network 102 may include various connection types such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 103 through the network 102 using the mobile terminal 101, to receive or send messages and the like. The mobile terminal 101 may have various communication client applications installed thereon, such as shopping applications, mobile banking applications, social platform software, etc. (as examples only). The mobile terminal 101 may be any of a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets such as the iPad, and the like.
The server 103 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by the user using the mobile terminal 101. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that, the check identification method applicable to the mobile terminal provided in the embodiments of the present disclosure may be executed by the mobile terminal 101. Accordingly, the check identifying apparatus, device, medium and program product applicable to the mobile terminal provided in the embodiments of the present disclosure may also be provided in the mobile terminal 101.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
A check recognition method applicable to a mobile terminal according to an embodiment of the present disclosure will be described in detail with reference to fig. 2 to 5 based on the scenario described in fig. 1. It should be noted that the sequence numbers of the respective operations in the following methods are merely representative of the operations for the purpose of description, and should not be construed as representing the order of execution of the respective operations. The method need not be performed in the exact order shown unless explicitly stated.
FIG. 2 schematically illustrates a flow chart of a check identification method suitable for use on a mobile terminal in accordance with an embodiment of the present disclosure.
As shown in fig. 2, the check recognition method according to the embodiment may include operations S210 to S240.
First, in operation S210, a check image to be recognized is acquired. For example, the user may open the mobile banking client on the mobile terminal 101 and trigger the camera to capture the check image. Alternatively, the user may receive a check image sent by another user through a communication client (e.g., WeChat or email) on the mobile terminal 101, and then provide the check image to the mobile banking application on the mobile terminal 101.
Then, in operation S220, a text region in the check image is detected using the text region detection model.
The text region detection model comprises a backbone feature extraction network and a feature enhancement module. The backbone feature extraction network may include multiple network layers (e.g., convolution layers, pooling layers, etc.). The feature enhancement module may be located between two predetermined network layers or after the last network layer of the backbone feature extraction network. The text region detection model may contain one or more feature enhancement modules. To fit the lightweight network architecture of the mobile terminal, in one embodiment the backbone feature extraction network may employ a MobileNet network.
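As a minimal sketch only (the patent publishes no code, so the class, argument, and variable names below are illustrative assumptions), the serial backbone with a feature enhancement module re-inserted after a predetermined layer could be wired in PyTorch as follows:

    import torch.nn as nn

    class TextRegionDetectionBackbone(nn.Module):
        # Hypothetical wiring: backbone layers run in series; the feature
        # enhancement module processes the first feature image produced by
        # the predetermined layer and hands the second feature image back.
        def __init__(self, backbone_layers, enhancer, enhance_after):
            super().__init__()
            self.layers = nn.ModuleList(backbone_layers)
            self.enhancer = enhancer            # feature enhancement module
            self.enhance_after = enhance_after  # index of predetermined layer

        def forward(self, x):
            for i, layer in enumerate(self.layers):
                x = layer(x)                    # serial feature extraction
                if i == self.enhance_after:     # first feature image
                    x = self.enhancer(x)        # becomes second feature image
            return x

A MobileNet-style layer list would supply backbone_layers here; one possible enhancer is sketched later in this description.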
FIG. 3 schematically illustrates a flow chart of detecting text regions in a check image using a text detection model in operation S220, according to an embodiment of the disclosure.
Referring to fig. 3, operation S220 may include operations S221 to S222.
First, in operation S221, image features of the check image are extracted in series by the plurality of network layers in the backbone feature extraction network, wherein a first feature image output by a predetermined network layer among the plurality of network layers is enhanced by the feature enhancement module into a second feature image, which is then passed back to the backbone feature extraction network for continued processing.
Then, in operation S222, text regions in the check image are detected based on the image features. A check mainly contains table lines, logos, seals and text. Once trained, the text detection model can analyze, based on the extracted image features, the coordinate relationships or distribution characteristics of pixels with different features, and thereby locate the text regions.
Adding the feature enhancement module to the backbone feature extraction network increases the network depth, improves the precision of feature extraction, and thereby improves the accuracy of text region detection.
Next, in operation S230, the text information in the text regions is recognized using the text recognition model. The text information may include text position information, text semantic information, and the like. The text recognition model may be an artificial intelligence model based on optical character recognition (OCR) technology.
Finally, in operation S240, the text information is uploaded to the server. For example, a check transaction is performed by the server 103 based on the received text information.
The text detection model and the text recognition model are artificial intelligence models and need to be trained in advance. For example, a large number of check images may be gathered on the server 103 or on another server or server cluster, and the text detection model and the text recognition model trained there. Once the recognition accuracy of the two models meets the requirement, they can be packaged together and integrated into the mobile banking client for the user to download to the mobile terminal 101.
The text detection model and the text recognition model may be trained jointly, i.e., with the same set of training sample data and with the output of the text detection model used as the input of the text recognition model during training; alternatively, the two models may be trained independently, with the collected check images preprocessed and labeled separately according to the training requirements of each model to form per-model training sample data. A minimal sketch of the joint scheme is given below.
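Purely as an illustration of the joint scheme (every model, criterion, and tensor name here is a placeholder rather than anything the patent specifies), one training step that chains the detector's output into the recognizer might look like:

    def joint_training_step(det_model, rec_model, det_loss_fn, rec_loss_fn,
                            optimizer, image, region_labels, text_labels):
        region_pred = det_model(image)      # text region detection stage
        rec_pred = rec_model(region_pred)   # detector output feeds the recognizer
        loss = (det_loss_fn(region_pred, region_labels)
                + rec_loss_fn(rec_pred, text_labels))  # shared objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

In practice the recognizer would consume crops of the detected regions rather than the raw detection map; the sketch only shows the shared-objective idea.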
According to the embodiments of the disclosure, check identification can be migrated from the server to the mobile terminal, so that check images can be processed effectively even under weak network conditions, improving the user experience. Moreover, the risk of leaking user information while the mobile terminal uploads the check picture to the server is avoided, improving the security of user information.
FIG. 4 schematically illustrates a process flow of enhancing a first feature image by a feature enhancement module in a check identification method according to an embodiment of the present disclosure.
As shown in fig. 4, the process by which the feature enhancement module enhances a first feature image output by a network layer of the backbone feature extraction network, according to an embodiment of the present disclosure, may include operations S401 to S403.
First, in operation S401, feature extraction at n size levels is performed on the first feature image to obtain n first feature subgraphs, where n is an integer greater than or equal to 2.
The n size levels include a global size level at which features are extracted over a 1×1 grid, and at least one local size level at which features are extracted over a grid other than 1×1 (such as 2×2, 4×4 or 8×8). When features are extracted over a given grid, the features of the pixels within the region delimited by each grid cell are averaged with weights to become the feature of one pixel in the output image; after each extraction, the grid is moved across the input image by a preset step length, and the feature extraction is iterated in this way.
Extracting features from the first feature image over a 1×1 grid division yields the global features of the first feature image. Extracting features over a grid other than 1×1 yields local features within the range of the corresponding grid cells of the first feature image.
In one embodiment, the at least one local size level may include a plurality of size levels, and the largest grid may be half the size of the first feature image output by the predetermined network layer. For example, when the first feature image is 16×16, the largest grid among the n size levels may be 8×8. The features extracted at the n size levels thus span multiple levels, from global features to local features at several different scales, which increases the hierarchy and richness of the features compared with the first feature image alone.
In a specific implementation, in one embodiment, n pooling layers corresponding to the n size levels may be used to pool the first feature image respectively, and n first convolution layers may then convolve the outputs of the n pooling layers to compress the channel number of the image, so as to obtain the n first feature subgraphs.
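For concreteness, a sketch of this pooling-plus-compression stage under assumed sizes (the grids 1, 2, 4, 8 and the fourfold channel reduction are taken from the worked example later in this description; the class and variable names are illustrative):

    import torch
    import torch.nn as nn

    class PyramidBranches(nn.Module):
        # n pooling levels: a 1x1 grid gives the global level, larger grids
        # give local levels; each branch ends in a 1x1 convolution that
        # compresses the channel count.
        def __init__(self, in_ch=32, grids=(1, 2, 4, 8), reduction=4):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Sequential(
                    nn.AdaptiveAvgPool2d(g),                       # pool to g x g
                    nn.Conv2d(in_ch, in_ch // reduction, kernel_size=1),
                )
                for g in grids
            )

        def forward(self, x):
            return [b(x) for b in self.branches]  # n first feature subgraphs

    subs = PyramidBranches()(torch.randn(1, 32, 16, 16))
    print([tuple(s.shape[1:]) for s in subs])
    # [(8, 1, 1), (8, 2, 2), (8, 4, 4), (8, 8, 8)]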
A convolution layer extracts image features through convolution operations. The operation of a pooling layer is essentially similar, but its main purpose is feature dimension reduction, which effectively shrinks the image parameter matrix. A pooling layer normally does not mix information between channels; it reduces dimensions within each channel separately, whereas a convolution layer can interact across channels.
In one embodiment, to reduce the number of model parameters, the convolution kernel size of the n first convolution layers may be chosen as 1×1. In image processing, a convolution computes each pixel of the output image as a weighted average of the pixels in a small region of the input image, where the weights are defined by a function called the convolution kernel. The parameters of a convolution include the kernel size, the step length (stride) and the padding. The kernel size determines the size of the receptive field in the network; a 1×1 convolution kernel, for example, means that only the pixels in a 1×1 region are averaged at a time to become the corresponding pixel of the output image. The step length, i.e., the length spanned by each convolution step, determines the granularity of the extraction.
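A quick check of the parameter saving (an illustrative calculation, not from the patent): a convolution layer stores kernel-size squared times input channels times output channels weights, so with 32 input and 8 output channels a 1×1 kernel needs 256 weights where a 3×3 kernel would need 2304.

    import torch.nn as nn

    c1 = nn.Conv2d(32, 8, kernel_size=1, bias=False)  # 1*1*32*8 = 256 weights
    c3 = nn.Conv2d(32, 8, kernel_size=3, bias=False)  # 3*3*32*8 = 2304 weights
    print(sum(p.numel() for p in c1.parameters()))    # 256
    print(sum(p.numel() for p in c3.parameters()))    # 2304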
It can be seen that the n first feature subgraphs extracted in operation S401 contain features at multiple levels, from global to local, and the receptive field of the feature extraction is enlarged, which enriches the features that can be extracted.
Next, in operation S402, feature fusion is performed on the n first feature subgraphs, so as to obtain a second feature subgraph.
During feature fusion, among the n first feature subgraphs, a subgraph obtained with a smaller grid can be upsampled and then fused with the first feature subgraph of the adjacent size level. Through upsampling, the subgraph extracted with the smaller grid is brought to the feature extraction scale of the first feature subgraph of the adjacent size level, so that the two can be fused.
Specifically, the first feature subgraph corresponding to the global size level may be taken as the initial sampling object, and the sampling object is then updated repeatedly in the following manner until the n feature subgraphs have been fused, whereupon the sampling object is output: first, the sampling object is upsampled to the image resolution corresponding to the size level adjacent to its own, yielding a feature subgraph to be fused; this subgraph is then fused with the first feature subgraph corresponding to that adjacent size level, yielding a new sampling object.
In one embodiment, the sampling object output after the n first feature subgraphs have been fused step by step as described above may be taken as the second feature subgraph. In another embodiment, that sampling object may additionally be fused with the first feature image to obtain the second feature subgraph, so that the features of the original first feature image are not lost and the second feature subgraph is guaranteed to be richer in features than the first feature image passed into the feature enhancement module. Fusing features step by step and iteratively in this feature pyramid manner improves the robustness of the text detection model to text of different scales. A sketch of this fusion loop is given below.
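The sketch assumes (as the worked example later suggests) that "feature fusion" is channel-wise concatenation and that upsampling is nearest-neighbor interpolation; both operators are assumptions, since the patent does not fix them:

    import torch
    import torch.nn.functional as F

    def fuse_subgraphs(subgraphs, first_feature):
        # subgraphs are ordered from the global (1x1) level to the largest
        # local level; the first one is the initial sampling object.
        sampled = subgraphs[0]
        for nxt in subgraphs[1:]:
            up = F.interpolate(sampled, size=nxt.shape[-2:])  # to adjacent level
            sampled = torch.cat([up, nxt], dim=1)             # feature fusion
        # optional final fusion with the original first feature image
        up = F.interpolate(sampled, size=first_feature.shape[-2:])
        return torch.cat([up, first_feature], dim=1)          # second feature subgraph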
Then, in operation S403, the second feature subgraph is processed according to the image parameters that the backbone feature extraction network requires for continued processing, so as to obtain the second feature image. This ensures that the backbone feature extraction network keeps running smoothly and avoids excessive intervention in, or modification of, that network.
Specifically, when the predetermined network layer is an intermediate network layer among the plurality of network layers, the second feature subgraph is convolved according to the image parameters expected as input by the network layer following the predetermined network layer, so as to obtain the second feature image. When the predetermined network layer is the last network layer, the second feature subgraph is convolved according to the image parameters output by the backbone feature extraction network, so as to obtain the second feature image.
In the embodiments of the disclosure, adding the feature enhancement module to the lightweight backbone feature extraction network used at the mobile terminal increases the network depth while keeping the parameter count small, which improves both the speed and the accuracy of text detection and suits the related business requirements of the mobile terminal.
Fig. 5 schematically illustrates a flowchart of identifying a check image using a backbone feature extraction network and a feature enhancement module in a check identification method according to another embodiment of the present disclosure. Those skilled in the art will appreciate that the illustration in fig. 5 is merely exemplary and is not meant to limit the present disclosure.
As shown in fig. 5, in this embodiment, when a check image is recognized, the feature extraction process may include the following steps 1 to 7.
Step 1: extract bottom-up features from the acquired check image using the backbone feature extraction network, take out a feature layer (namely the first feature image), and input it into the feature enhancement module.
Step 2: process the feature layer input into the feature enhancement module through pooling layers. Specifically, assuming the first feature image has size 16×16×32, 4 independent pooling layers may be used to perform average pooling according to grid divisions of sizes 1×1, 2×2, 4×4 and 8×8, respectively. These four grid sizes correspond to 4 size levels.
In one embodiment, the pooling then yields outputs of 1×1×32, 2×2×32, 4×4×32 and 8×8×32.
Step 3: pass the 4 pooling-layer outputs through 4 first convolution layers, respectively, reducing the channel number of each first feature subgraph to 1/4 of the original. The convolution kernel size of the 4 first convolution layers is 1×1. Their outputs are 1×1×8, 2×2×8, 4×4×8 and 8×8×8, respectively.
Step 4: enter the feature fusion stage. For each pair of adjacent size levels, the first feature subgraph obtained with the smaller grid is upsampled and then fused with the first feature subgraph of the other size level.
Specifically, the 1×1×8 first feature subgraph is upsampled to a 2×2×8 map and fused with the 2×2×8 first feature subgraph, giving a 2×2×16 map. This map is upsampled to 4×4×16 and fused with the 4×4×8 first feature subgraph to obtain a 4×4×24 map. The 4×4×24 map is then upsampled to 8×8×24 and fused with the 8×8×8 first feature subgraph, and a map of 8×8×32 is output.
Step 5: upsample the output 8×8×32 map to 16×16×32 and fuse it with the 16×16×32 first feature image that the backbone feature extraction network fed into the feature enhancement module, obtaining a 16×16×64 map.
Step 6: reduce the channel number of the 16×16×64 map output in step 5 with a convolution layer whose kernel is 1×1, then fuse information with a convolution layer whose kernel is 3×3, obtaining the second feature image. Using a 1×1 convolution kernel reduces the number of feature extraction parameters and the computational load of the model.
Step 7: pass the second feature image back to the backbone feature extraction network. The backbone feature extraction network performs further top-down feature fusion and processing through several feature fusion layers, and then detects the text regions. The information of the detected text regions is then passed to the text recognition model, which recognizes text information such as text position information and semantic information. The shape arithmetic of steps 2 to 6 is reproduced in the sketch below.
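The sizes published in steps 2 to 6 can be verified with a few lines (layer names and the final channel count of 32 are assumptions made for illustration; the patent leaves the output channel count of step 6 implicit):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    x = torch.randn(1, 32, 16, 16)                          # first feature image (step 1)
    subs = [nn.Sequential(nn.AdaptiveAvgPool2d(g),          # step 2: average pooling
                          nn.Conv2d(32, 8, kernel_size=1))  # step 3: channels 32 -> 8
            (x) for g in (1, 2, 4, 8)]

    fused = subs[0]                                         # step 4: progressive fusion
    for nxt in subs[1:]:
        up = F.interpolate(fused, size=nxt.shape[-2:])
        fused = torch.cat([up, nxt], dim=1)                 # 2x2x16 -> 4x4x24 -> 8x8x32

    up = F.interpolate(fused, size=x.shape[-2:])            # step 5: upsample to 16x16
    fused = torch.cat([up, x], dim=1)                       # 16x16x64

    second = nn.Conv2d(32, 32, 3, padding=1)(nn.Conv2d(64, 32, 1)(fused))  # step 6
    print(second.shape)                                     # torch.Size([1, 32, 16, 16])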
Embodiments of the present disclosure may employ MobileNet as the backbone feature extraction network, into which the feature enhancement module is added. The convolution layers use depthwise separable convolution blocks (sketched below), which deepen the network, reduce the model parameters and enlarge the receptive field; in the feature fusion stage, a feature pyramid model improves robustness to text of different scales. The detected text regions are input into the text recognition model to obtain the recognition result. By reducing the parameter count, the model speeds up text detection, and by deepening the network it improves detection accuracy, which suits the business requirements of a mobile terminal.
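For reference, the depthwise separable convolution block used by MobileNet (a standard construction, shown here as background rather than as the patent's exact layer configuration):

    import torch.nn as nn

    class DepthwiseSeparableConv(nn.Module):
        # A 3x3 depthwise convolution (one filter per channel, groups=in_ch)
        # followed by a 1x1 pointwise convolution that mixes channels; this
        # needs far fewer parameters than a dense 3x3 convolution.
        def __init__(self, in_ch, out_ch, stride=1):
            super().__init__()
            self.block = nn.Sequential(
                nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False),
                nn.BatchNorm2d(in_ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_ch, out_ch, 1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )

        def forward(self, x):
            return self.block(x)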
Based on the check identification method suitable for the mobile terminal in the above embodiments, the present disclosure further provides a check identification device suitable for the mobile terminal. The device will be described in detail below in connection with fig. 6.
Fig. 6 schematically illustrates a block diagram of a check identification apparatus 600 adapted for use on a mobile terminal according to an embodiment of the present disclosure.
As shown in FIG. 6, the check identification apparatus 600 may include an acquisition module 610, a text region detection model 620, a text recognition model 630, and an upload module 640. The text region detection model 620 includes a backbone feature extraction network 621 and a feature enhancement module 622. The check identification apparatus 600 may be provided in the mobile terminal 101 and may perform the methods described with reference to the foregoing figs. 2 to 5.
The acquisition module 610 is configured to acquire a check image to be identified. In one embodiment, the acquisition module 610 may perform the aforementioned operation S210.
Text region detection model 620 is used to extract image features of the check image and detect text regions in the check image based on the image features. In one embodiment, text region detection model 620 may perform operation S220 described previously.
Specifically, the backbone feature extraction network 621 is configured to extract image features of the check image in series through a plurality of network layers. The feature enhancement module 622 is configured to enhance a first feature image output by a predetermined network layer among the plurality of network layers into a second feature image, and then pass the second feature image back to the backbone feature extraction network for continued processing.
In some embodiments, the feature enhancement module 622 is specifically configured to: first perform feature extraction at n size levels on the first feature image to obtain n first feature subgraphs, where n is an integer greater than or equal to 2 and the n size levels comprise a global size level at which features are extracted over a 1x1 grid and at least one local size level at which features are extracted over a grid other than 1x1; then perform feature fusion on the n first feature subgraphs to obtain a second feature subgraph; and then process the second feature subgraph according to the image parameters that the backbone feature extraction network requires for continued processing, to obtain the second feature image.
The text recognition model 630 is used to recognize the text information in the text regions. In one embodiment, the text recognition model 630 may perform operation S230 described previously.
The uploading module 640 is configured to upload the text information to the server. In one embodiment, the upload module 640 may perform the aforementioned operation S240.
Any of the acquisition module 610, the text region detection model 620, the text recognition model 630, the upload module 640, the backbone feature extraction network 621, and the feature enhancement module 622 may be combined and implemented in one module, or any one of them may be split into multiple modules, according to embodiments of the present disclosure. Alternatively, at least some of the functionality of one or more of these modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the acquisition module 610, the text region detection model 620, the text recognition model 630, the upload module 640, the backbone feature extraction network 621, and the feature enhancement module 622 may be implemented at least in part as hardware circuitry, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system-on-chip, a system-on-substrate, a system-in-package, or an application specific integrated circuit (ASIC), or as hardware or firmware in any other reasonable manner of integrating or packaging circuitry, or as any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, at least one of these modules may be at least partially implemented as computer program modules that, when executed, perform the corresponding functions.
Fig. 7 schematically illustrates a block diagram of an electronic device adapted to implement a check identification method according to an embodiment of the present disclosure.
As shown in fig. 7, an electronic device 700 according to an embodiment of the present disclosure includes a processor 701 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The processor 701 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 701 may also include on-board memory for caching purposes. The processor 701 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flows according to embodiments of the disclosure.
In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are stored. The processor 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. The processor 701 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 702 and/or the RAM 703. Note that the program may be stored in one or more memories other than the ROM 702 and the RAM 703. The processor 701 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 700 may further include an input/output (I/O) interface 705, which is also connected to the bus 704. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 708 including a hard disk or the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read therefrom is installed into the storage section 708 as necessary.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 702 and/or RAM 703 and/or one or more memories other than ROM 702 and RAM 703 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to perform the methods provided by embodiments of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 701. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of signals over a network medium, and downloaded and installed via the communication section 709 and/or installed from the removable medium 711. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to wireless or wired media, or any suitable combination of the foregoing.
According to embodiments of the present disclosure, the program code of the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, "C" and similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined in various ways, even if such combinations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be combined in various ways without departing from the spirit and teachings of the present disclosure. All such combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (11)

1. A check identification method suitable for a mobile terminal, comprising:
acquiring a check image to be identified;
extracting image features of the check image by using a text region detection model, and detecting text regions in the check image based on the image features;
identifying text information in the text regions by using a text recognition model; and
uploading the text information to a server;
wherein the extracting the image features of the check image using the text region detection model comprises: extracting image features of the check image in series by using a plurality of network layers in a backbone feature extraction network, wherein a first feature image output by a predetermined network layer among the plurality of network layers is enhanced by a feature enhancement module into a second feature image, and the second feature image is then passed back to the backbone feature extraction network for continued processing.
2. The method of claim 1, wherein enhancing the first feature image by the feature enhancement module comprises:
performing feature extraction at n size levels on the first feature image to obtain n first feature sub-maps, wherein n is an integer greater than or equal to 2, and the n size levels comprise a global size level at which features are extracted over a 1x1 grid and at least one local size level at which features are extracted over a grid other than the 1x1 grid;
fusing the n first feature sub-maps to obtain a second feature sub-map; and
processing the second feature sub-map according to the image parameters required by the backbone feature extraction network for its continued processing, so as to obtain the second feature image.
3. The method of claim 2, wherein the performing feature extraction at n size levels on the first feature image to obtain n first feature sub-maps comprises:
pooling the first feature image with n pooling layers corresponding respectively to the n size levels; and
convolving the outputs of the n pooling layers with n first convolution layers to compress the number of image channels, so as to obtain the n first feature sub-maps.
4. The method of claim 3, wherein the convolution kernel size of each first convolution layer is 1x1.
5. The method of claim 2, wherein the fusing the n first feature sub-maps to obtain a second feature sub-map comprises:
taking the first feature sub-map corresponding to the global size level as an initial sampling object, and repeatedly updating the sampling object as follows until all n feature sub-maps have been fused, whereupon the sampling object is output:
up-sampling the sampling object to the image resolution corresponding to the size level adjacent to that of the sampling object, to obtain a feature sub-map to be fused; and
fusing the feature sub-map to be fused with the first feature sub-map corresponding to the size level adjacent to that of the sampling object, to obtain a new sampling object.
6. The method of claim 5, wherein the fusing the n first feature sub-maps to obtain a second feature sub-map further comprises:
fusing the sampling object output after all n feature sub-maps have been fused with the first feature image, to obtain the second feature sub-map.
7. The method of claim 2, wherein the processing the second feature sub-map according to the image parameters required by the backbone feature extraction network for its continued processing to obtain the second feature image comprises:
when the preset network layer is an intermediate network layer among the plurality of network layers, convolving the second feature sub-map according to the image parameters expected as input by the network layer following the preset network layer, to obtain the second feature image; and
when the preset network layer is the last network layer among the plurality of network layers, convolving the second feature sub-map according to the image parameters output by the backbone feature extraction network, to obtain the second feature image.
8. A check identification apparatus suitable for a mobile terminal, wherein the apparatus comprises:
an acquisition module for acquiring a check image to be identified;
a text region detection model for extracting image features of the check image and detecting text regions in the check image based on the image features;
a text recognition model for recognizing text information in the text regions; and
an uploading module for uploading the text information to a server;
wherein the text region detection model comprises a backbone feature extraction network and a feature enhancement module:
the backbone feature extraction network is configured to extract the image features of the check image through a plurality of network layers connected in series; and
the feature enhancement module is configured to enhance a first feature image output by a preset network layer among the plurality of network layers into a second feature image, and to pass the second feature image back to the backbone feature extraction network for continued processing.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of any one of claims 1 to 7.
11. A computer program product comprising computer program instructions which, when executed by a processor, implement the method of any one of claims 1 to 7.
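
The claims above lend themselves to a brief implementation-oriented illustration. The following minimal Python sketch traces the flow of claim 1: text regions are detected on the device, text is recognized on the device, and only the recognized text is uploaded. The detector and recognizer callables, the field name, and the server endpoint are placeholders introduced here for illustration; none of them are components named by the disclosure.

    import requests  # assumed HTTP transport; the claims only require uploading to a server

    def recognize_check(check_image, detector, recognizer, server_url):
        # Claim 1, step by step: detect text regions in the check image,
        # recognize the text in each region, then upload the text, not the image.
        regions = detector(check_image)                      # text region detection model
        texts = [recognizer(region) for region in regions]   # text recognition model
        requests.post(server_url, json={"fields": texts})    # upload the text information
        return texts

Keeping the image on the device and uploading only text is what makes the method suitable for a mobile terminal with limited bandwidth.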
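Claims 2 to 7 describe the feature enhancement module in a pyramid-pooling style: pooling at n size levels (a global 1x1 grid plus local grids), 1x1 convolutions that compress channels, coarse-to-fine up-sampling with fusion at each level, and a final convolution matched to the parameters the backbone expects. The PyTorch sketch below is one plausible reading under stated assumptions: the grid sizes (1, 2, 3, 6), per-branch compression to in_channels/n, elementwise addition as the inter-level fusion, and concatenation as the final fusion with the input are all choices made here, not values fixed by the claims.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FeatureEnhancementModule(nn.Module):
        # One possible reading of claims 2-7; grids and fusion operators are assumptions.
        def __init__(self, in_channels, out_channels, grids=(1, 2, 3, 6)):
            super().__init__()
            branch = in_channels // len(grids)
            # Claim 3: n pooling layers, one per size level (grids[0] == 1 is the global level).
            self.pools = nn.ModuleList([nn.AdaptiveAvgPool2d(g) for g in grids])
            # Claims 3-4: 1x1 first convolution layers that compress the channel count.
            self.compress = nn.ModuleList(
                [nn.Conv2d(in_channels, branch, kernel_size=1) for _ in grids]
            )
            # Claim 7: a convolution matching the image parameters (here, channels)
            # expected by the backbone layer that consumes the result.
            self.project = nn.Conv2d(in_channels + branch, out_channels,
                                     kernel_size=3, padding=1)

        def forward(self, x):
            # x is the first feature image tapped from the preset backbone layer.
            subs = [conv(pool(x)) for pool, conv in zip(self.pools, self.compress)]
            # Claim 5: start from the global-level sub-map and repeatedly up-sample
            # to the adjacent, finer level's resolution, fusing at each step.
            fused = subs[0]
            for finer in subs[1:]:
                fused = F.interpolate(fused, size=finer.shape[-2:],
                                      mode="bilinear", align_corners=False)
                fused = fused + finer  # fusion assumed to be elementwise addition
            # Claim 6: fuse the final sampling object with the first feature image.
            fused = F.interpolate(fused, size=x.shape[-2:],
                                  mode="bilinear", align_corners=False)
            second_sub_map = torch.cat([x, fused], dim=1)
            # Claim 7: convolve to obtain the second feature image fed back to the backbone.
            return self.project(second_sub_map)

A hypothetical wiring, tapping a 256-channel backbone stage:

    enhance = FeatureEnhancementModule(in_channels=256, out_channels=256)
    first = torch.randn(1, 256, 48, 160)   # first feature image from the preset layer
    second = enhance(first)                # second feature image, same spatial size

Whether the preset layer is an intermediate layer or the last backbone layer (claim 7) changes only which parameters out_channels must match.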
CN202310479560.9A 2023-04-28 2023-04-28 Check identification method, device, equipment and medium suitable for mobile terminal Pending CN116434218A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310479560.9A CN116434218A (en) 2023-04-28 2023-04-28 Check identification method, device, equipment and medium suitable for mobile terminal

Publications (1)

Publication Number Publication Date
CN116434218A true CN116434218A (en) 2023-07-14

Family

ID=87085386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310479560.9A Pending CN116434218A (en) 2023-04-28 2023-04-28 Check identification method, device, equipment and medium suitable for mobile terminal

Country Status (1)

Country Link
CN (1) CN116434218A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116597387A (en) * 2023-07-17 2023-08-15 建信金融科技有限责任公司 Abnormality processing method, abnormality processing device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination