CN112883953A

CN112883953A - Card recognition device and method based on joint learning

Info

Publication number: CN112883953A
Application number: CN202110196711.0A
Authority: CN
Inventors: 张雷; 杜姗; 蔡为彬; 罗樋
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2021-02-22
Filing date: 2021-02-22
Publication date: 2021-06-01
Anticipated expiration: 2041-02-22
Also published as: CN112883953B

Abstract

A card recognition device and method based on joint learning, suitable for the big data processing field, can be used in finance field and other fields, the said device includes: the device comprises an image scanning module, a data preprocessing module, a model generating module and a result output module; the image scanning module is used for acquiring electronic image data of a preset card; the data preprocessing module is used for labeling the corresponding image area in the electronic image data according to a preset labeling rule to generate training image data; the model generation module is used for extracting structural information and visual information of image noise in training image data; analyzing structural information and visual information through an embedded layer in a preset neural network model to obtain image characteristics, semantic characteristics, text vector characteristics and position characteristics; training a neural network model by using the characteristics to obtain a card identification model; and the result output module is used for analyzing the electronic image data of the card to be identified through the card identification model to obtain an identification result.

Description

Card recognition device and method based on joint learning

Technical Field

The invention relates to the field of artificial intelligence, can be applied to the fields of finance and image recognition, and particularly relates to a card recognition device and method based on joint learning.

Background

At present, cards are widely used in banking industries, and cover all branches of banking business. The definition of the card is not limited to the bank card used by the client for transferring accounts and storing cash, but also includes the communication tools such as business cards used by business personnel in marketing work. However, the bank card of the customer or the business card used for paying for the business personnel is very easy to be invaded by stains and abrasion, and has very weak resistance to external adverse factors. In the past, the card making cost of the bank is increased, the card image identification quality is reduced, the application effect and the scene landing of a subsequent model of the bank are influenced, and potential residual damage and benefit loss are brought to the bank.

To solve such problems, many experts and scholars have concentrated on solving the noise problem of images and proposed many excellent solutions, such as: gaussian filtering and median filtering, but the traditional denoising methods often have great limitations, and only adapt to solve the image noise of a specific category, and although these methods can bring performance improvement to a certain extent for the noise of the specific category, the generality is not satisfactory. In recent years, image denoising methods based on deep learning are also proposed and developed, such as convolutional neural networks, multi-layer perceptrons and the like, but the models cannot well utilize visual information and structural information of image noise to assist in image noise recognition, so that the accuracy of image recognition still has a larger space for improvement.

Disclosure of Invention

The invention aims to provide a card recognition device and method based on joint learning, which overcome the defects of the traditional digital image processing denoising method such as scene limitation, image blurring and the like and overcome the defects of the traditional image noise recognition algorithm.

In order to achieve the above object, the present invention provides a card recognition device based on joint learning, the device comprising: the device comprises an image scanning module, a data preprocessing module, a model generating module and a result output module; the image scanning module is used for acquiring electronic image data of a preset card; the data preprocessing module is used for labeling the corresponding image area in the electronic image data according to a preset labeling rule to generate training image data; the model generation module is used for extracting structural information and visual information of image noise in the training image data; analyzing the structural information and the visual information through an embedded layer in a preset neural network model to obtain image characteristics, semantic characteristics, text vector characteristics and position characteristics; training the neural network model by using the image features, the semantic features, the text vector features and the position features to obtain a card identification model; and the result output module is used for analyzing the electronic image data of the card to be identified through the card identification model to obtain an identification result.

In the above-described card recognition device based on joint learning, preferably, the model generation module includes an image feature extraction unit, a semantic feature extraction unit, a text feature extraction unit, and a position feature extraction unit; the image feature extraction unit is used for extracting image information in a text frame in the training image data, setting a weight coefficient according to noise features in the image information, and screening out noise images in the image information through the weight coefficient to obtain image features; the semantic feature extraction unit is used for acquiring corresponding text semantic features according to text contents in the image information; the text feature extraction unit is used for dividing the text content into characters, converting each character into a vector and generating text vector features; the position feature extraction unit is used for generating position features according to coordinate information of texts in the image information.

In the above-described card recognition device based on joint learning, it is preferable that the image feature extraction unit further includes: acquiring the noise characteristics according to the visual characteristics and the attribute characteristics in the image information; and respectively setting weight coefficients for the noise characteristics and the text contents by calculating the pixel matrix difference of the noise characteristics and the text contents in the image information.

In the above-mentioned card recognition device based on joint learning, preferably, the intuitive feature includes one or more combinations of size, color, font, granularity and shape of the image noise; the attribute characteristics comprise one or more of drip type, infection type, breakage type, pollution type.

In the above card recognition device based on joint learning, preferably, the model generation module includes a training unit, and the training unit is configured to train the neural network model through a predetermined number of samples according to a preset iteration cycle; and when the neural network model completes the iterative round of training and the identification accuracy is higher than or equal to a preset baseline, obtaining a card identification model according to the neural network model.

The invention also provides a card identification method based on joint learning, which specifically comprises the following steps: collecting electronic image data of a preset card; marking corresponding image areas in the electronic image data according to a preset marking rule to generate training image data; extracting structural information and visual information of image noise in the training image data; analyzing the structural information and the visual information through an embedded layer in a preset neural network model to obtain image characteristics, semantic characteristics, text vector characteristics and position characteristics; training the neural network model by using the image features, the semantic features, the text vector features and the position features to obtain a card identification model; and analyzing the electronic image data of the card to be identified through the card identification model to obtain an identification result.

In the above card recognition method based on joint learning, preferably, the structural information and the visual information are analyzed through an embedded layer in a preset neural network model, and the obtained image features, semantic features, text vector features and position features include: extracting image information in a text frame in the training image data, setting a weight coefficient according to noise features in the image information, and screening out noise images in the image information through the weight coefficient to obtain image features; obtaining corresponding text semantic features according to text contents in the image information; dividing the text content into characters, and converting each character into a vector to generate text vector characteristics; and generating position characteristics according to the coordinate information of the text in the image information.

In the above method for card recognition based on joint learning, preferably, the setting of the weight coefficient according to the noise feature in the image information includes: acquiring the noise characteristics according to the visual characteristics and the attribute characteristics in the image information; and respectively setting weight coefficients for the noise characteristics and the text contents by calculating the pixel matrix difference of the noise characteristics and the text contents in the image information.

In the above card recognition method based on joint learning, preferably, the visual features include one or more combinations of sizes, colors, fonts, granularities and shapes of image noises; the attribute characteristics comprise one or more of drip type, infection type, breakage type, pollution type.

In the above card recognition method based on joint learning, preferably, the training the neural network model using the image feature, the semantic feature, the text vector feature, and the position feature to obtain a card recognition model includes: training the neural network model through a preset number of samples according to a preset iteration turn; and when the neural network model completes the iterative round of training and the identification accuracy is higher than or equal to a preset baseline, obtaining a card identification model according to the neural network model.

The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method when executing the computer program.

The present invention also provides a computer-readable storage medium storing a computer program for executing the above method.

The joint learning-based card identification device and method provided by the invention are based on a deep learning technology, comprehensively utilize image noise visual information and a structural relationship, and construct an innovative neural network model; model training is carried out based on the model, so that the accuracy of card identification is improved, the card making cost of a bank is saved, and more powerful accuracy guarantee is provided for subsequent application of an identification result (including but not limited to identification of card face information by an APP (application) of a mobile bank, reading of business card information by a social APP and the like), and the excellent credit and public praise of the bank are favorably maintained; the model is different from the existing Gaussian filtering and median filtering methods, has good universality, can train various types of image noise, and can accurately identify the content influenced by various types of noise; through the image embedding layer, the semantic embedding layer, the text embedding layer and the position embedding layer which are innovatively added in the neural network model, the model can more fully learn the characteristics of noise, and the accuracy of character recognition is improved.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:

fig. 1 is a schematic structural diagram of a card recognition device based on joint learning according to an embodiment of the present invention;

fig. 2 is a schematic application flow diagram of an image scanning module according to an embodiment of the present invention;

fig. 3 is a schematic application flow diagram of a data preprocessing module according to an embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating an application flow of a model building module according to an embodiment of the present invention;

FIG. 5 is a schematic diagram illustrating an application flow of a model training module according to an embodiment of the present invention;

fig. 6 is a schematic application flow diagram of a result output module according to an embodiment of the present invention;

FIG. 7 is a diagram illustrating various features of image noise according to an embodiment of the present invention;

FIG. 8 is a diagram illustrating a sample card key-value pair distribution provided in accordance with an embodiment of the present invention;

fig. 9 is a flowchart illustrating a card recognition method based on joint learning according to an embodiment of the present invention;

FIG. 10 is a schematic diagram illustrating a feature acquisition process according to an embodiment of the present invention;

fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The following detailed description of the embodiments of the present invention will be provided with reference to the drawings and examples, so that how to apply the technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented. It should be noted that, unless otherwise specified, the embodiments and features of the embodiments of the present invention may be combined with each other, and the technical solutions formed are within the scope of the present invention.

Additionally, the steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions and, although a logical order is illustrated in the flow charts, in some cases, the steps illustrated or described may be performed in an order different than here.

The electronic image data are processed and modeled by combining the characteristic data such as the structural information and the visual information of the card and comprehensively utilizing the technologies such as the deep learning pre-training technology and the machine learning algorithm, so that the electronic image content of the card with image noise is accurately identified, and the influence of the image noise on the card identification is reduced; on one hand, the experience of the customer on the card identification function is improved, and the service life of various cards of the customer is prolonged; on the other hand, the business processing efficiency of bank personnel can be improved, and the labor cost and the card making cost are saved.

Referring to fig. 1, the card recognition device based on joint learning provided by the present invention includes: the system comprises an image scanning module 1, a data preprocessing module 2, a model generating module (a model constructing module 3 and a model training module 4 in the figure 1) and a result output module 5; the image scanning module 1 is used for collecting electronic image data of a preset card; the data preprocessing module 2 is used for labeling the corresponding image area in the electronic image data according to a preset labeling rule to generate training image data; the model generation module is used for extracting structural information and visual information of image noise in the training image data; analyzing the structural information and the visual information through an embedded layer in a preset neural network model to obtain image characteristics, semantic characteristics, text vector characteristics and position characteristics; training the neural network model by using the image features, the semantic features, the text vector features and the position features to obtain a card identification model; the result output module 5 is used for analyzing the electronic image data of the card to be identified through the card identification model to obtain an identification result.

In actual work, the image scanning module 1 is connected with the data preprocessing module 2, the data preprocessing module 2 is connected with the model building module 3, the model building module 3 is connected with the model training module 4, and the model training module 4 is connected with the result output module 5, specifically:

the image scanning module 1 refers to an electronic device for obtaining a complete electronic image of a card, and the device includes, but is not limited to, a mobile phone, a digital camera, a scanner, and the like. After passing through the image scanning module 1, we can obtain a complete electronic image of the card.

The data preprocessing module 2 refers to a process of performing data annotation on the card electronic image data obtained by the image scanning module 1 by using a data annotation tool, which includes, but is not limited to, label, labelImg, yolo _ mark.

The model building module 3 is a process for building a neural network model according to specific requirements. The novel card recognition model is used for modeling structural information and visual information of image noise, and effectively combining the structural information and the visual information of the image noise by constructing neural network models with an image embedding layer, a semantic embedding layer, a text embedding layer, a position embedding layer and the like, so that the effect of the original model is effectively improved, and the influence of the image noise on card electronic image recognition is reduced.

The model training module 4 is a process of reading a picture file, setting parameters such as iteration turns, sample number, learning rate and the like of a model, dividing the existing data set into a training set and a test set according to a certain proportion, and loading the neural network model constructed in the model construction module 3 for training. And after the training is finished, evaluating the quality of the model. Finally, the model file is saved.

The result output module 5 outputs a correct recognition result by applying the model trained in the model training module 4 and using the new card image as input data. The result can be used as an input parameter for downstream applications.

Referring to fig. 2, in practical operation, the image scanning module 1 provided by the present invention has the following steps in the using process:

step S101: preparing the card material object.

Step S102: the customer scans the card desired to be identified through the image scanner device.

Step S103: obtaining the electronic image of the card.

Step S104: and placing the scanned electronic image of the card in a specified catalogue.

Referring to fig. 3, the data preprocessing module 2 in fig. 1 is used as follows:

step S201: the image data acquired from the image scanning module 1 is subjected to data cleaning, and electronic image data with quality problems such as missing values and duplication are removed.

Step S202: a labeling specification is made and used as a reference in step S204.

Step S203: and establishing an image marking tool using environment and installing the image marking tool.

Step S204: and (5) opening the data set picture to be labeled for labeling by using the image labeling tool in the step (S203).

Step S205: and storing the marked information after marking, wherein the marked information on the picture is stored in a json file format, and the content of the json file includes but is not limited to the path, the tag name and the like of the file.

In an embodiment of the present invention, the model generation module includes an image feature extraction unit, a semantic feature extraction unit, a text feature extraction unit, and a position feature extraction unit; the image feature extraction unit is used for extracting image information in a text frame in the training image data, setting a weight coefficient according to noise features in the image information, and screening out noise images in the image information through the weight coefficient to obtain image features; the semantic feature extraction unit is used for acquiring corresponding text semantic features according to text contents in the image information; the text feature extraction unit is used for dividing the text content into characters, converting each character into a vector and generating text vector features; the position feature extraction unit is used for generating position features according to coordinate information of texts in the image information. Wherein the image feature extraction unit further includes: acquiring the noise characteristics according to the visual characteristics and the attribute characteristics in the image information; and respectively setting weight coefficients for the noise characteristics and the text contents by calculating the pixel matrix difference of the noise characteristics and the text contents in the image information. The visual characteristics comprise one or more combinations of size, color, font, granularity and shape of the image noise; the attribute characteristics comprise one or more of drip type, infection type, breakage type, pollution type.

In actual work, the model generation module can be divided into a model construction module and a model training module, as shown in fig. 4, the implementation steps of the model construction module 3 provided by the present invention are as follows:

step S301: the card image data and the card text content are read in batch.

Step S302: and constructing a neural network model. The neural network model of the inventive method consists of four main embedded layers: the system comprises an image embedding layer, a semantic embedding layer, a text embedding layer and a position embedding layer.

Wherein the image embedding layer: the image features in the text border responsible for recording the electronic image data include, but are not limited to, visual features and attribute features of image noise, and the image features can provide more information for extracting card information. As shown in fig. 7, the intuitive features include, but are not limited to, image noise: size, color, font, particle size, shape, and attribute characteristics include, but are not limited to, water droplet type, infestation type, breakage type, contamination type. The image text and the image noise are different in color, shape and other characteristics, different weights are given to the image noise and the text content by calculating pixel matrix difference of the image noise and the text content, and the text and the noise image can be judged better by an auxiliary model. The learning of the weight coefficient is completed by training the visual feature and the attribute feature.

Semantic embedding layer: in charge of recording the text semantics of the electronic image data, the card usually presents the content in the form of key value pairs, as shown in fig. 8. Typically, key-value pairs are arranged in a particular relationship, such as left-right, up-down, and so forth. Through the semantic embedding layer and the position embedding layer, semantic information and position information are recorded, structural information naturally aligned with a text is learned, and when noise with high disguise is faced, the method can help to further identify text and image noise of the card, and the method comprises the following steps: scenes such as non-numeric characters appear in the numeric area.

Text embedding layer: the text is segmented into characters, and each character is converted into a vector, so that image noise can be conveniently processed subsequently.

The position embedding layer can be divided into four sub-embedding layers: x0, Y0, X1, Y1. X0, Y0, X1 and Y1 refer to specific positions of the text in the card image, wherein (X0 and Y0) are vertex coordinates of the upper left corner of the text border, and (X1 and Y1) are vertex coordinates of the lower right corner of the text border. The representation of the coordinates corresponding to the four sub-layers X, Y, w and h is calculated by converting the physical coordinates X0, Y0, X1 and Y1 into virtual coordinates, wherein X, Y, w and h refer to the coordinates (X, Y), width, height and position of the center point of the text border, and the embedded layer is a combination of the four sub-layers.

In an embodiment of the present invention, the model generation module includes a training unit, and the training unit is configured to train the neural network model through a predetermined number of samples according to a preset iteration number; and when the neural network model completes the iterative round of training and the identification accuracy is higher than or equal to a preset baseline, obtaining a card identification model according to the neural network model. Referring to fig. 5, a training unit provided by the present invention can be shown as the model training module 4 in fig. 5, and includes the following specific steps:

step S401: and setting the iteration turns of model training.

Step S402: the number of samples to be obtained in batches is set.

Step S403: the learning rate is set.

Step S404: reading the card electronic image data set, and dividing the training set and the test set according to a certain proportion.

Step S405: and loading the neural network model constructed in the step S402 for training.

Step S406: after the iteration of the specified round of step S401 is completed, the model is evaluated.

Step S407: and if the model identification accuracy reaches the baseline, saving the model file.

Therefore, as shown in fig. 6, on the basis of the above, the image recognition module 5 provided by the present invention can perform the following steps:

step S501: the model file saved in step S507 is loaded.

Step S502: and inputting the picture file for identification.

Referring to fig. 9, the present invention further provides a card identification method based on joint learning, which specifically includes:

s901, collecting electronic image data of a preset card; marking corresponding image areas in the electronic image data according to a preset marking rule to generate training image data;

s902, extracting structural information and visual information of image noise in the training image data;

s903 analyzes the structural information and the visual information through an embedded layer in a preset neural network model to obtain image characteristics, semantic characteristics, text vector characteristics and position characteristics;

s904, training the neural network model by using the image features, the semantic features, the text vector features and the position features to obtain a card recognition model;

s905 analyzes the electronic image data of the card to be identified through the card identification model to obtain an identification result.

As shown in fig. 10, in the above embodiment, analyzing the structural information and the visual information through an embedded layer in a preset neural network model, and obtaining image features, semantic features, text vector features, and position features includes:

s1001, extracting image information in a text frame in the training image data, setting a weight coefficient according to noise features in the image information, and screening out noise images in the image information through the weight coefficient to obtain image features;

s1002, acquiring corresponding text semantic features according to text contents in the image information; dividing the text content into characters, and converting each character into a vector to generate text vector characteristics;

s1003 generates position characteristics according to the coordinate information of the text in the image information.

Wherein, setting the weight coefficient according to the noise characteristics in the image information comprises: acquiring the noise characteristics according to the visual characteristics and the attribute characteristics in the image information; and respectively setting weight coefficients for the noise characteristics and the text contents by calculating the pixel matrix difference of the noise characteristics and the text contents in the image information. Further, the visual features comprise one or more combinations of size, color, font, granularity and shape of the image noise; the attribute characteristics comprise one or more of drip type, infection type, breakage type, pollution type.

In another embodiment of the present invention, training the neural network model using the image features, the semantic features, the text vector features, and the location features to obtain a card recognition model includes: training the neural network model through a preset number of samples according to a preset iteration turn; and when the neural network model completes the iterative round of training and the identification accuracy is higher than or equal to a preset baseline, obtaining a card identification model according to the neural network model.

As shown in fig. 11, the electronic device 600 may further include: communication module 110, input unit 120, audio processing unit 130, display 160, power supply 170. It is noted that the electronic device 600 does not necessarily include all of the components shown in FIG. 11; furthermore, the electronic device 600 may also comprise components not shown in fig. 11, which may be referred to in the prior art.

As shown in fig. 11, the central processor 100, sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device, the central processor 100 receiving input and controlling the operation of the various components of the electronic device 600.

The memory 140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable media, a volatile memory, a non-volatile memory, or other suitable device. The information relating to the failure may be stored, and a program for executing the information may be stored. And the central processing unit 100 may execute the program stored in the memory 140 to realize information storage or processing, etc.

The input unit 120 provides input to the cpu 100. The input unit 120 is, for example, a key or a touch input device. The power supply 170 is used to provide power to the electronic device 600. The display 160 is used to display an object to be displayed, such as an image or a character. The display may be, for example, an LCD display, but is not limited thereto.

The memory 140 may be a solid state memory such as Read Only Memory (ROM), Random Access Memory (RAM), a SIM card, or the like. There may also be a memory that holds information even when power is off, can be selectively erased, and is provided with more data, an example of which is sometimes called an EPROM or the like. The memory 140 may also be some other type of device. Memory 140 includes buffer memory 141 (sometimes referred to as a buffer). The memory 140 may include an application/function storage section 142, and the application/function storage section 142 is used to store application programs and function programs or a flow for executing the operation of the electronic device 600 by the central processing unit 100.

The memory 140 may also include a data store 143, the data store 143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage portion 144 of the memory 140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging application, address book application, etc.).

The communication module 110 is a transmitter/receiver 110 that transmits and receives signals via an antenna 111. The communication module (transmitter/receiver) 110 is coupled to the central processor 100 to provide an input signal and receive an output signal, which may be the same as in the case of a conventional mobile communication terminal.

Based on different communication technologies, a plurality of communication modules 110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 110 is also coupled to a speaker 131 and a microphone 132 via an audio processor 130 to provide audio output via the speaker 131 and receive audio input from the microphone 132 to implement general telecommunications functions. Audio processor 130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, an audio processor 130 is also coupled to the central processor 100, so that recording on the local can be enabled through a microphone 132, and so that sound stored on the local can be played through a speaker 131.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A joint learning-based card recognition apparatus, comprising: the device comprises an image scanning module, a data preprocessing module, a model generating module and a result output module;

the image scanning module is used for acquiring electronic image data of a preset card;

the data preprocessing module is used for labeling the corresponding image area in the electronic image data according to a preset labeling rule to generate training image data;

the model generation module is used for extracting structural information and visual information of image noise in the training image data; analyzing the structural information and the visual information through an embedded layer in a preset neural network model to obtain image characteristics, semantic characteristics, text vector characteristics and position characteristics; training the neural network model by using the image features, the semantic features, the text vector features and the position features to obtain a card identification model;

and the result output module is used for analyzing the electronic image data of the card to be identified through the card identification model to obtain an identification result.

2. The joint learning-based card recognition device according to claim 1, wherein the model generation module includes an image feature extraction unit, a semantic feature extraction unit, a text feature extraction unit, and a position feature extraction unit;

the image feature extraction unit is used for extracting image information in a text frame in the training image data, setting a weight coefficient according to noise features in the image information, and screening out noise images in the image information through the weight coefficient to obtain image features;

the semantic feature extraction unit is used for acquiring corresponding text semantic features according to text contents in the image information;

the text feature extraction unit is used for dividing the text content into characters, converting each character into a vector and generating text vector features;

the position feature extraction unit is used for generating position features according to coordinate information of texts in the image information.

3. The joint learning-based card recognition device according to claim 2, wherein the image feature extraction unit further includes:

acquiring the noise characteristics according to the visual characteristics and the attribute characteristics in the image information;

and respectively setting weight coefficients for the noise characteristics and the text contents by calculating the pixel matrix difference of the noise characteristics and the text contents in the image information.

4. The joint learning-based card recognition device according to claim 3, wherein the intuitive features include a combination of one or more of a size, a color, a font, a granularity, and a shape of an image noise; the attribute characteristics comprise one or more of drip type, infection type, breakage type, pollution type.

5. The joint learning-based card recognition device according to claim 1, wherein the model generation module comprises a training unit for training the neural network model through a predetermined number of samples according to a preset iteration cycle; and when the neural network model completes the iterative round of training and the identification accuracy is higher than or equal to a preset baseline, obtaining a card identification model according to the neural network model.

6. A method for card recognition based on joint learning, the method comprising:

collecting electronic image data of a preset card;

marking corresponding image areas in the electronic image data according to a preset marking rule to generate training image data;

extracting structural information and visual information of image noise in the training image data;

analyzing the structural information and the visual information through an embedded layer in a preset neural network model to obtain image characteristics, semantic characteristics, text vector characteristics and position characteristics;

training the neural network model by using the image features, the semantic features, the text vector features and the position features to obtain a card identification model;

and analyzing the electronic image data of the card to be identified through the card identification model to obtain an identification result.

7. The joint learning-based card recognition method according to claim 6, wherein analyzing the structural information and the visual information through an embedded layer in a preset neural network model to obtain image features, semantic features, text vector features and position features comprises:

extracting image information in a text frame in the training image data, setting a weight coefficient according to noise features in the image information, and screening out noise images in the image information through the weight coefficient to obtain image features;

obtaining corresponding text semantic features according to text contents in the image information;

dividing the text content into characters, and converting each character into a vector to generate text vector characteristics;

and generating position characteristics according to the coordinate information of the text in the image information.

8. The joint learning-based card recognition method according to claim 7, wherein setting a weight coefficient according to the noise features in the image information comprises:

9. The joint learning-based card recognition method according to claim 8, wherein the visual features include a combination of one or more of size, color, font, granularity, shape of image noise; the attribute characteristics comprise one or more of drip type, infection type, breakage type, pollution type.

10. The joint learning-based card recognition method of claim 6, wherein training the neural network model using the image features, the semantic features, the text vector features and the location features to obtain a card recognition model comprises:

training the neural network model through a preset number of samples according to a preset iteration turn;

and when the neural network model completes the iterative round of training and the identification accuracy is higher than or equal to a preset baseline, obtaining a card identification model according to the neural network model.

11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 6 to 10 when executing the computer program.

12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 6 to 10.