CN114529917A - Zero-sample Chinese single character recognition method, system, device and storage medium


Info

Publication number
CN114529917A
Authority
CN
China
Legal status
Granted
Application number
CN202210095194.2A
Other languages
Chinese (zh)
Other versions
CN114529917B (en)
Inventor
黄宇浩
毛慧芸
周伟英
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Application filed by South China University of Technology SCUT
Priority to CN202210095194.2A
Publication of CN114529917A
Application granted
Publication of CN114529917B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/002 Image coding using neural networks

Abstract

The invention discloses a zero-sample Chinese single character recognition method, system, device and storage medium. The method comprises the following steps: extracting visual features from a Chinese single character image; performing learnable class coding on the Chinese single character classes, in which the component structure of each character is decomposed and a learnable class code is computed; mapping the class codes of the Chinese single characters into the visual space, and constraining the semantic consistency of the class codes before and after mapping with a reconstruction loss function; and matching the class codes with the visual features of the image through a Transformer-based decoder, which extracts the features related to each class code from the visual features and finally decodes and outputs the recognition result of the Chinese single character. By means of learnable class coding, the invention achieves zero-sample Chinese single character recognition and removes the dependence of traditional Chinese single character recognition methods on large amounts of labeled data. The invention can be widely applied in the technical fields of pattern recognition and artificial intelligence.

Description

Zero-sample Chinese single character recognition method, system, device and storage medium
Technical Field
The invention relates to the technical field of pattern recognition and artificial intelligence, in particular to a zero-sample Chinese single character recognition method, a system, a device and a storage medium.
Background
Chinese characters are among the oldest writing systems in the world, with a history of several thousand years. Research on Chinese character recognition is therefore of great value for the digital preservation of ancient books and documents. Traditional Chinese single character recognition relies mainly on data-driven deep learning, which requires a large number of labeled training samples. However, the number of Chinese character classes is enormous: the GB18030-2005 standard contains 70,224 Chinese characters, and labeling enough data for every character is difficult and time-consuming. Recent work has tried to address this problem with zero-sample recognition methods based on either component decoding or component encoding. However, component-decoding methods require long decoding times and post-processing, while component-encoding methods rely on hand-designed codes that cannot be adjusted flexibly to different data.
Disclosure of Invention
To solve at least one of the technical problems in the prior art to a certain extent, the present invention provides a zero-sample Chinese character recognition method, system, device and storage medium.
The technical scheme adopted by the invention is as follows:
A zero-sample Chinese single character recognition method comprises the following steps:
extracting visual features of a Chinese single character image;
performing learnable class coding on the Chinese single character classes, in which a depth-first search algorithm decomposes the component structure of each Chinese single character and a learnable class code is computed;
mapping the class codes of the Chinese single characters into the visual space through a mapping module based on a fully connected layer, which makes the dimension of the class codes equal to that of the visual space, and constraining the semantic consistency of the class codes before and after mapping through a reconstruction loss function;
matching the class codes of the Chinese single characters with the visual features of the image through a Transformer-based decoder, extracting the features related to each class code from the visual features, and finally decoding and outputting the recognition result of the Chinese single character.
Further, extracting the visual features of the Chinese single character image comprises:
extracting the visual features of the Chinese single character image with an image encoder based on a densely connected convolutional neural network.
Further, the image encoder uses a DenseNet121 model as the backbone network for extracting the visual features of the image;
the backbone network downsamples by a factor of 8, and, so that the output visual features better match the class codes, the final output activation layer and the global average pooling layer are removed from the backbone network.
Further, performing learnable class coding on the Chinese single character classes, decomposing the component structure of each Chinese single character with a depth-first search algorithm and computing the learnable class code comprises:
obtaining the component sequence of each decomposed Chinese single character through a depth-first search algorithm according to a Chinese ideograph sequence dictionary, wherein the component sequence is represented as a tree data structure from which the depth information and relative position information of each component are obtained; the depth information indicates the depth of the component in the tree, and the relative position information indicates the position of the component relative to its parent node;
computing the learnable class code corresponding to each Chinese single character, as expressed by formula (1):
[Formula (1), published as an image]
where $i$ denotes a component in the component sequence $R$, $l_i$ denotes the depth information of the component, $\gamma_i$ denotes its relative position information, $\alpha$ and $\beta$ are learnable parameters, and $y_i$ is the one-hot encoding of the component;
concatenating the computed learnable class code with the depth information and the relative position information of each component along the feature dimension to obtain the final learnable class code, as expressed by formula (2):
[Formula (2), published as an image]
where the formula uses the normalized depth information, the normalized relative position information and a concatenation operator.
Further, the mapping module based on a fully connected layer consists of a single fully connected layer, each of whose output elements is obtained by a linear operation on the input elements;
the fully connected layer maps the class codes of the Chinese single characters into the visual space, so that the dimension of the class codes equals that of the visual space.
Further, the reconstruction loss function computes the mean squared error between the class codes before and after mapping, as expressed by formula (3):
[Formula (3), published as an image]
where $L_{re}$ is the reconstruction loss; the remaining terms denote the mapped class code, the class code before mapping $\phi(y_i)$, the bias $b$ of the fully connected layer and the transpose $w^T$ of its weight; and $N$ is the number of class codes.
Further, the specific operations of the Transformer-based decoder comprise:
matching the class codes of the Chinese single characters with the visual features of the image through a multi-head attention mechanism, and extracting the features related to the class codes from the visual features, as expressed by formula (4):
$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)W^O$$
$$\mathrm{head}_i=\mathrm{Attention}(QW_i^Q,KW_i^K,VW_i^V)$$
$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
where $\mathrm{MultiHead}(Q,K,V)$ computes multi-head attention and $\mathrm{Attention}(Q,K,V)$ computes attention; $Q$ denotes the class codes of the Chinese single characters, $K$ and $V$ denote the visual features of the image, $W_i^Q$, $W_i^K$ and $W_i^V$ are learnable projection matrices, $d_k$ denotes the dimension of $Q$, $K$ and $V$, and $W^O$ is a learnable parameter of the multi-head attention;
after the features related to the class codes are obtained, decoding them with a feed-forward neural network consisting of three fully connected layers, and finally outputting the recognition result of the Chinese single character; in the training stage, a cross-entropy loss function is used as the optimization objective of the network, expressed as:
$$L = -\sum_{i=1}^{k} p_i \log q_i$$
where $p_i$ is the label probability of class $i$, $q_i$ is the predicted probability of class $i$, and $k$ is the total number of classes.
The other technical scheme adopted by the invention is as follows:
A zero-sample Chinese single character recognition system, comprising:
a feature extraction module, configured to extract visual features of a Chinese single character image;
a class coding module, configured to perform learnable class coding on the Chinese single character classes, decomposing the component structure of each Chinese single character with a depth-first search algorithm and computing the learnable class code;
an information mapping module, configured to map the class codes of the Chinese single characters into the visual space through a mapping module based on a fully connected layer, so that the dimension of the class codes equals that of the visual space, and to constrain the semantic consistency of the class codes before and after mapping through a reconstruction loss function;
and an information matching module, configured to match the class codes of the Chinese single characters with the visual features of the image through a Transformer-based decoder, extract the features related to the class codes from the visual features, and finally decode and output the recognition result of the Chinese single character.
The other technical scheme adopted by the invention is as follows:
A zero-sample Chinese single character recognition device, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the method described above.
The other technical scheme adopted by the invention is as follows:
A computer-readable storage medium storing a processor-executable program which, when executed by a processor, performs the method described above.
The beneficial effects of the invention are as follows: by means of learnable class coding, the invention achieves zero-sample Chinese single character recognition and overcomes the dependence of traditional Chinese single character recognition methods on large amounts of labeled data, which are time-consuming and costly to produce.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description is made on the drawings of the embodiments of the present invention or the related technical solutions in the prior art, and it should be understood that the drawings in the following description are only for convenience and clarity of describing some embodiments in the technical solutions of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of the steps of a zero-sample Chinese single character recognition method based on learnable class codes according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an image encoder in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a learnable class code in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a fully-connected layer based mapping module in an embodiment of the invention;
FIG. 5 is a schematic diagram of a Transformer-based decoder according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, front, rear, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, "several" means one or more and "a plurality of" means two or more; terms such as "greater than", "less than" and "exceeding" are understood as excluding the stated number, while terms such as "above", "below" and "within" are understood as including the stated number. Where "first" and "second" are used, they serve only to distinguish technical features and are not to be understood as indicating or implying relative importance, the number of the technical features indicated, or their order of precedence.
In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.
As shown in FIG. 1, this embodiment provides a zero-sample Chinese single character recognition method based on learnable class codes, which replaces hand-designed codes with learnable class codes so that the coding can be adjusted flexibly to different data. The method comprises the following steps:
S1: extract the visual features of the Chinese single character image with an image encoder based on a densely connected convolutional neural network, specifically:
A DenseNet121 model is used as the backbone network to extract the visual features of the Chinese single character image. As shown in FIG. 2, the image encoder takes the RGB three-channel image of a single character as input and outputs a visual feature map downsampled by a factor of 8. The DenseNet121 model consists of dense blocks and transition modules; so that the output visual features better match the class codes, the final output activation layer and the global average pooling layer are removed from the backbone network.
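The text does not say how the downsampled feature map is handed to the decoder as K and V in step S4. A common convention, assumed here rather than taken from the patent, is to treat each spatial position of the C × H/8 × W/8 map as one C-dimensional visual token. The pure-Python sketch below uses hypothetical helper names (`downsampled_size`, `flatten_feature_map`) and a hypothetical 64 × 64 input. It does not implement DenseNet121 itself.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def downsampled_size(height: int, width: int, stride: int = 8) -> Tuple[int, int]:
    """Spatial size of the feature map after an overall stride of `stride` (assumes exact divisibility)."""
    return height // stride, width // stride


def flatten_feature_map(fmap: List[Matrix]) -> Matrix:
    """Turn a C x H x W feature map into H*W tokens of length C, visiting positions row by row."""
    channels = len(fmap)
    height = len(fmap[0])
    width = len(fmap[0][0])
    tokens = []
    for r in range(height):
        for c in range(width):
            # One visual token gathers every channel at spatial position (r, c).
            tokens.append([fmap[ch][r][c] for ch in range(channels)])
    return tokens


if __name__ == "__main__":
    h, w = downsampled_size(64, 64)
    print(h, w)  # 8 8
    # Toy 2-channel, 2 x 3 feature map.
    fmap = [
        [[0, 1, 2], [3, 4, 5]],
        [[10, 11, 12], [13, 14, 15]],
    ]
    tokens = flatten_feature_map(fmap)
    print(len(tokens), tokens[0], tokens[-1])  # 6 [0, 10] [5, 15]
```

With this convention, the decoder's keys and values are simply the list of tokens, one row per spatial position of the feature map.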
S2: perform learnable class coding on the Chinese single character classes, decomposing the component structure of each character with a depth-first search algorithm and computing the learnable class code, specifically:
First, the component sequence of each decomposed Chinese single character is obtained through a depth-first search algorithm according to a Chinese ideograph sequence dictionary. As shown in FIG. 3, the component sequence can be represented as a tree data structure, from which the depth information and relative position information of each component are obtained. The depth information represents the depth of the component in the tree, and the relative position information represents the position of the component relative to its parent node. Then, the learnable class code corresponding to each Chinese single character is computed, as expressed by formula (1):
[Formula (1), published as an image]
where $i$ denotes a component in the component sequence $R$, $l_i$ denotes the depth information of the component, $\gamma_i$ denotes its relative position information, $\alpha$ and $\beta$ are learnable parameters with initial values of 0.5 and 0.001, respectively, and $y_i$ is the one-hot encoding of the component.
By setting the two learnable parameters $\alpha$ and $\beta$, the network can continuously adjust the class codes during training so that they better match the visual features. Finally, the learnable class code is concatenated with the depth information and relative position information of each component along the feature dimension to obtain the final learnable class code, as expressed by formula (2):
[Formula (2), published as an image]
where the formula uses the normalized depth information, the normalized relative position information and a concatenation operator. After the depth information and relative position information are concatenated, the class code represents not only which components a character contains but also the depth and relative position of each component, so the code carries richer information and improves the recognition accuracy of the network.
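The decomposition in step S2 can be illustrated with a short depth-first traversal. The sketch makes three assumptions that the text does not state: each dictionary entry is an ideographic description sequence (IDS) in prefix form, where structure operators such as ⿰ (left-right) and ⿱ (top-bottom) take two operands and ⿲/⿳ take three; the relative position of a node is its child index under its parent; and structure operators are kept in the output sequence. The toy dictionary `IDS_DICT` and the function `decompose` are hypothetical. Formula (1) itself is not implemented.

```python
from typing import Dict, List, Tuple

# Operand counts for the Unicode ideographic description characters U+2FF0..U+2FFB.
ARITY = {"⿰": 2, "⿱": 2, "⿲": 3, "⿳": 3, "⿴": 2, "⿵": 2,
         "⿶": 2, "⿷": 2, "⿸": 2, "⿹": 2, "⿺": 2, "⿻": 2}

# Hypothetical toy dictionary: character -> IDS in prefix notation.
IDS_DICT: Dict[str, str] = {
    "林": "⿰木木",
    "森": "⿱木⿰木木",
}


def decompose(ids: str) -> List[Tuple[str, int, int]]:
    """Depth-first parse of a prefix IDS.

    Returns (symbol, depth, relative_position) triples in pre-order. Depth is the
    node's depth in the structure tree (root = 0), and relative_position is the
    node's child index under its parent (the root gets 0).
    """
    result: List[Tuple[str, int, int]] = []
    pos = 0  # index of the next unread symbol in `ids`

    def visit(depth: int, rel_pos: int) -> None:
        nonlocal pos
        symbol = ids[pos]
        pos += 1
        result.append((symbol, depth, rel_pos))
        # Structure operators recurse into their operands; plain components are leaves.
        for child in range(ARITY.get(symbol, 0)):
            visit(depth + 1, child)

    visit(0, 0)
    if pos != len(ids):
        raise ValueError(f"malformed IDS: {ids!r}")
    return result


if __name__ == "__main__":
    for char, ids in IDS_DICT.items():
        print(char, decompose(ids))
```

For 森, the traversal yields ⿱ at depth 0, then 木 at depth 1 in position 0, ⿰ at depth 1 in position 1, and two 木 leaves at depth 2 in positions 0 and 1. These are the kind of depth and relative position values that formulas (1) and (2) consume.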
S3: map the class codes of the Chinese single characters into the visual space with a mapping module based on a fully connected layer, specifically:
As shown in FIG. 4, the mapping module consists of a single fully connected layer, each of whose output elements is obtained by a linear operation on the input elements. The mapping module converts the class codes to the same dimension as the visual features of the image and fuses the depth information and relative position information. To keep the class codes semantically consistent before and after the mapping module, a reconstruction loss function is used as a constraint. This loss computes the mean squared error between the class codes before and after mapping, as expressed by formula (3):
[Formula (3), published as an image]
where $L_{re}$ is the reconstruction loss; the remaining terms denote the mapped class code, the class code before mapping $\phi(y_i)$, the bias $b$ of the fully connected layer and the transpose $w^T$ of its weight.
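Formula (3) is available only as an image. The surrounding text does state that a single fully connected layer maps each class code into the visual space, and that the reconstruction loss is a mean squared error involving the bias $b$ and the transposed weight $w^T$. The sketch below assumes a tied-weight reconstruction $w^T(h - b)$ of the mapped code $h$. That reconstruction rule, the per-element averaging and all function names are assumptions, not the patent's exact formula.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (out_dim, in_dim)


def linear(weight: Matrix, bias: Vector, x: Vector) -> Vector:
    """Fully connected layer: each output is a linear combination of the inputs plus a bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def reconstruct(weight: Matrix, bias: Vector, h: Vector) -> Vector:
    """Assumed tied-weight inverse W^T (h - b), mapping back to the class-code dimension."""
    centered = [hv - b for hv, b in zip(h, bias)]
    in_dim = len(weight[0])
    return [sum(weight[r][c] * centered[r] for r in range(len(weight))) for c in range(in_dim)]


def reconstruction_loss(weight: Matrix, bias: Vector, codes: List[Vector]) -> float:
    """Squared error averaged over the elements of each code, then over the N codes."""
    total = 0.0
    for code in codes:
        rec = reconstruct(weight, bias, linear(weight, bias, code))
        total += sum((a - r) ** 2 for a, r in zip(code, rec)) / len(code)
    return total / len(codes)
```

If the columns of the weight matrix are orthonormal, the reconstruction is exact and the loss is zero. Minimizing this term therefore pushes the mapping to keep the information in the class code recoverable, which is one way to read the "semantic consistency" constraint.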
S4: match the class codes of the Chinese single characters with the visual features of the image using a Transformer-based decoder, and decode and output the recognition result of the Chinese single character, specifically:
As shown in FIG. 5, the Transformer-based decoder matches the class codes of the Chinese single characters with the visual features of the image through a multi-head attention mechanism and extracts the features related to the class codes from the visual features, as expressed by formula (4):
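$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)W^O$$
$$\mathrm{head}_i=\mathrm{Attention}(QW_i^Q,KW_i^K,VW_i^V)$$
$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$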
where $\mathrm{MultiHead}(Q,K,V)$ computes multi-head attention and $\mathrm{Attention}(Q,K,V)$ computes attention; $Q$ denotes the class codes of the Chinese single characters, $K$ and $V$ denote the visual features of the image, $W_i^Q$, $W_i^K$ and $W_i^V$ are learnable projection matrices, $d_k$ denotes the dimension of $Q$, $K$ and $V$, and $\mathrm{head}_i$ denotes the attention output of the $i$-th head. In computing $\mathrm{Attention}(Q,K,V)$, the matrix product of $Q$ (the class codes) and $K$ (the image features) amounts to matching the class codes against the visual features. A softmax function then yields the attention matrix, which is finally multiplied by $V$ (the visual features) to obtain the features related to the class codes.
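The following is a minimal single-head sketch of $\mathrm{Attention}(Q,K,V)$ from formula (4), written in pure Python. It assumes that $Q$, $K$ and $V$ have already been projected by $W_i^Q$, $W_i^K$ and $W_i^V$, and it omits the per-head projections, the concatenation and $W^O$. Each query row stands for one class code, and each key/value row stands for one visual token. The helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for one head.

    q: one row per class code (queries); k, v: one row per visual token.
    Returns one feature row per class code, a weighted mix of the value rows.
    """
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # class-code / visual-token similarity
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)
```

As a quick sanity check, if all keys are identical then the softmax weights are uniform and each output row is the plain average of the value rows.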
After the features related to the class codes are obtained, they are decoded by a feed-forward neural network consisting of three fully connected layers, together with residual connections and layer normalization, and the recognition result of the Chinese single character is finally output. In the training stage, so that the network's predictions fit the distribution of the ground-truth labels, a cross-entropy loss function is used as the optimization objective of the network, as shown in formula (5):
$$L = -\sum_{i=1}^{K} p_i \log q_i$$
where $p_i$ is the label probability of class $i$, $q_i$ is the predicted probability of class $i$, and $K$ is the total number of classes.
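The following sketch expresses formula (5) in code. The function name and the `eps` guard against $\log 0$ are illustrative additions. With a one-hot label, the loss reduces to the negative log of the probability predicted for the true class.

```python
import math
from typing import Sequence


def cross_entropy(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """-sum_i p_i * log(q_i) over the K classes; eps avoids log(0)."""
    if len(p) != len(q):
        raise ValueError("p and q must cover the same K classes")
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))
```

For example, take a one-hot label on the second of three classes with predictions (0.2, 0.5, 0.3). The loss is then $-\log 0.5 \approx 0.693$.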
By means of learnable class coding, the invention achieves zero-sample Chinese single character recognition, and the class codes can be adjusted adaptively to different data. In addition, the invention is simple and flexible to implement and can be ported to mainstream text recognition frameworks. In summary, compared with the prior art, the method provided by the embodiment of the present invention has at least the following beneficial effects:
(1) The invention designs a zero-sample recognition model for Chinese single characters. The model removes the dependence of existing Chinese single character recognition methods on large amounts of labeled data, which are time-consuming and costly to produce, and gives the recognition model better generalization ability. It is also simple and flexible to implement and can be ported to mainstream text recognition frameworks.
(2) The invention focuses on zero-sample recognition of Chinese single characters. Compared with existing zero-sample Chinese single character recognition methods, it adopts learnable class coding instead of hand-designed codes, so the coding can be adjusted flexibly to different data.
(3) The invention adopts a Transformer-based decoder that decodes quickly without post-processing, so it can be readily applied in practical scenarios.
This embodiment further provides a zero-sample Chinese single character recognition system, comprising:
a feature extraction module, configured to extract visual features of a Chinese single character image;
a class coding module, configured to perform learnable class coding on the Chinese single character classes, decomposing the component structure of each Chinese single character with a depth-first search algorithm and computing the learnable class code;
an information mapping module, configured to map the class codes of the Chinese single characters into the visual space through a mapping module based on a fully connected layer, so that the dimension of the class codes equals that of the visual space, and to constrain the semantic consistency of the class codes before and after mapping through a reconstruction loss function;
and an information matching module, configured to match the class codes of the Chinese single characters with the visual features of the image through a Transformer-based decoder, extract the features related to the class codes from the visual features, and finally decode and output the recognition result of the Chinese single character.
The zero-sample Chinese single character recognition system can execute the zero-sample Chinese single character recognition method provided by the method embodiments of the present invention. It can also execute any combination of the steps of the method embodiments, and it has the corresponding functions and beneficial effects of the method.
This embodiment also provides a zero-sample Chinese single character recognition device, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the method shown in FIG. 1.
The zero-sample Chinese single character recognition device can execute the zero-sample Chinese single character recognition method provided by the method embodiments of the present invention. It can also execute any combination of the steps of the method embodiments, and it has the corresponding functions and beneficial effects of the method.
The embodiment of the application also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.
This embodiment also provides a storage medium storing instructions or a program capable of executing the zero-sample Chinese single character recognition method provided by the method embodiments of the present invention. When the instructions or program are run, any combination of the steps of the method embodiments can be executed, with the corresponding functions and beneficial effects of the method.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Further, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the foregoing description of the specification, reference to the description of "one embodiment/example," "another embodiment/example," or "certain embodiments/examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A zero-sample Chinese single character recognition method, characterized by comprising the following steps:
extracting visual features of a Chinese single character image;
performing learnable category coding on Chinese single character categories, wherein the component structure of each Chinese single character is decomposed by a depth-first search algorithm and the learnable category code is computed from it;
mapping the category codes of the Chinese single characters into the visual space, wherein a mapping module based on a fully connected layer makes the dimension of the category codes equal to the dimension of the visual space, and a reconstruction loss function constrains the semantic consistency of the category codes before and after mapping;
matching the category codes of the Chinese single characters with the visual features of the image through a Transformer-based decoder, extracting from the visual features of the image the features related to the category codes, and finally decoding them to output the recognition result for the Chinese single character.
2. The zero-sample Chinese single character recognition method of claim 1, wherein extracting visual features of the Chinese single character image comprises:
extracting the visual features of the Chinese single character image with an image encoder based on a densely connected convolutional neural network.
3. The zero-sample Chinese single character recognition method of claim 2, wherein the image encoder uses a DenseNet121 model as the backbone network for extracting visual features of the image;
the backbone network uses 8× downsampling, and, so that the output visual features better match the category codes, the final output activation layer and the global average pooling layer are removed from the backbone network.
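The effect of the 8× downsampling in claim 3 on the size of the visual feature map can be checked with a short shape calculation. The sketch below is illustrative only: the claims do not say how the 8× rate is obtained, so it assumes (hypothetically) that the stride-2 stem convolution, the stride-2 stem max-pool and the first transition layer's 2×2 average pool are kept, while the pooling in the later transition layers is disabled. The growth rate, block configuration and 0.5 channel compression are the standard DenseNet121 values; the helper names `conv_out` and `encoder_output_shape` are hypothetical.

```python
# Hypothetical shape walk-through for a DenseNet121-style encoder with 8x downsampling.
# Assumption (not stated in the claims): only the stem conv, the stem max-pool and the
# first transition's average pool downsample; later transition pools are disabled.

GROWTH_RATE = 32                 # standard DenseNet121 growth rate
BLOCK_CONFIG = (6, 12, 24, 16)   # standard DenseNet121 layers per dense block
INIT_FEATURES = 64               # channels produced by the stem convolution


def conv_out(size: int, kernel: int, stride: int, pad: int) -> int:
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


def encoder_output_shape(height: int, width: int) -> tuple:
    """Return (channels, H', W') of the feature map handed to the decoder."""
    h, w = height, width
    h, w = conv_out(h, 7, 2, 3), conv_out(w, 7, 2, 3)  # stem: 7x7 conv, stride 2
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)  # stem: 3x3 max-pool, stride 2
    channels = INIT_FEATURES
    for idx, n_layers in enumerate(BLOCK_CONFIG):
        channels += n_layers * GROWTH_RATE       # each dense layer adds GROWTH_RATE channels
        if idx < len(BLOCK_CONFIG) - 1:          # a transition follows every block but the last
            channels //= 2                       # transition halves the channel count
            if idx == 0:                         # only the first transition keeps its 2x2 pool
                h, w = conv_out(h, 2, 2, 0), conv_out(w, 2, 2, 0)
    # No final activation and no global average pooling: the spatial map is kept.
    return channels, h, w
```

Under these assumptions a 64×64 input yields a 1024×8×8 map. Because the global average pooling is removed, each of the 64 spatial positions keeps its own feature vector for the decoder to attend over.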
4. The zero-sample Chinese single character recognition method of claim 1, wherein performing learnable category coding on the Chinese single character categories, decomposing the component structure of the Chinese single characters with a depth-first search algorithm and computing the learnable category codes comprises:
obtaining the component sequence of each decomposed Chinese single character through a depth-first search algorithm according to a Chinese ideographic sequence dictionary, wherein the component sequence is represented as a tree data structure from which the depth information and relative position information of each component are obtained; the depth information indicates the depth of the component in the tree, and the relative position information indicates the position of the component relative to its parent node;
computing the learnable category code corresponding to each Chinese single character according to formula (1):
[Formula (1): image not reproduced in this text]
wherein $i$ denotes a component in the component sequence $R$, $l_i$ denotes the depth information of the component, $\gamma_i$ denotes the relative position information of the component, $\alpha$ and $\beta$ are learnable parameters, and $y_i$ is the one-hot encoding of the component;
concatenating, along the feature dimension, the computed learnable category code with the depth information and relative position information of each component to obtain the final learnable category code, according to formula (2):
[Formula (2): image not reproduced in this text]
wherein the two normalized terms in formula (2) denote the normalized depth information and the normalized relative position information, respectively, and the remaining operator denotes the concatenation (splicing) operation.
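The depth-first decomposition in claim 4 can be sketched as a pre-order traversal of a component tree that records each component's depth and its position relative to its parent. This is a minimal sketch under stated assumptions: the toy dictionary `TOY_IDS` and the function `decompose` are hypothetical, components that are themselves decomposable are expanded recursively, and the relative position of a component is taken to be its index among its parent's children. Formula (1), which turns these triples into a learnable code, is not reproduced in this text, so it is not implemented here.

```python
from typing import Dict, List, Tuple, Union

# A tree node is either a leaf component (a string) or a tuple
# (structure_symbol, child_1, child_2, ...).
Node = Union[str, tuple]

# Hypothetical toy dictionary; a real system would load a full
# ideographic sequence dictionary instead.
TOY_IDS: Dict[str, Node] = {
    "好": ("⿰", "女", "子"),
    "林": ("⿰", "木", "木"),
    "森": ("⿱", "木", "林"),
}


def decompose(char: str, ids: Dict[str, Node]) -> List[Tuple[str, int, int]]:
    """Depth-first (pre-order) traversal of the component tree of `char`.

    Returns (component, depth, relative_position) triples: depth is the node's
    depth in the tree (root = 0) and relative_position is the node's index
    among its parent's children (root = 0).
    """
    result: List[Tuple[str, int, int]] = []

    def visit(node: Node, depth: int, position: int) -> None:
        if isinstance(node, str) and node in ids:
            node = ids[node]                    # expand a decomposable component
        if isinstance(node, tuple):
            symbol, *children = node
            result.append((symbol, depth, position))
            for idx, child in enumerate(children):
                visit(child, depth + 1, idx)    # go deep before moving to siblings
        else:
            result.append((node, depth, position))

    visit(char, 0, 0)
    return result
```

For example, `decompose("森", TOY_IDS)` returns `[("⿱", 0, 0), ("木", 1, 0), ("⿰", 1, 1), ("木", 2, 0), ("木", 2, 1)]`: the nested "林" is expanded in place, so its two "木" components sit at depth 2.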
5. The zero-sample Chinese single character recognition method of claim 1, wherein the mapping module based on a fully connected layer consists of one fully connected layer, each output element of which is obtained by a linear operation on the input elements;
the fully connected layer maps the category codes of the Chinese single characters into the visual space so that the dimension of the category codes equals the dimension of the visual space.
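The fully connected mapping of claim 5 is an affine transform from the category-code dimension to the visual-feature dimension: every output element is a weighted sum of all input elements plus a bias. The sketch below uses plain Python lists; the function name `fully_connected` and the toy sizes (a 3-dimensional code mapped to a 4-dimensional visual space) are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def fully_connected(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Compute out[j] = sum_i weight[j][i] * x[i] + bias[j].

    `weight` has shape (d_out, d_in), so a d_in-dimensional category code
    becomes a d_out-dimensional vector in the visual space.
    """
    if len(weight) != len(bias):
        raise ValueError("number of weight rows must equal bias length")
    if any(len(row) != len(x) for row in weight):
        raise ValueError("number of weight columns must equal input length")
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


# Toy example: lift a 3-d category code into a 4-d visual space.
code = [1.0, 2.0, 3.0]
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 1.0]]
b = [0.0, 0.0, 0.0, 0.5]
mapped = fully_connected(code, W, b)   # [1.0, 2.0, 3.0, 6.5]
```

In a trained model `W` and `b` would be learned; they are fixed here only to make the arithmetic easy to follow.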
6. The zero-sample Chinese single character recognition method of claim 5, wherein the reconstruction loss function computes the mean square error of the category codes before and after mapping, according to formula (3):
[Formula (3): image not reproduced in this text]
wherein $L_{re}$ is the reconstruction loss function, the symbol shown as an image in the original denotes the mapped category code, $\phi(y_i)$ denotes the category code before mapping, $b$ and $w^T$ are the bias and the transpose of the weight of the fully connected layer, respectively, and $N$ is the number of category codes.
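Because the image of formula (3) is missing, its exact form cannot be recovered here, in particular how the mapped code is brought back to the length of the original code before comparison. The sketch below shows only the mean-square-error averaging over the $N$ category codes, assuming the two codes in each pair already have equal length. It averages the squared Euclidean distance per code; dividing additionally by the code dimension is another common convention. The name `reconstruction_mse` is hypothetical.

```python
from typing import Sequence


def reconstruction_mse(before: Sequence[Sequence[float]],
                       after: Sequence[Sequence[float]]) -> float:
    """Return (1/N) * sum_i ||after_i - before_i||^2 over N category codes.

    Assumption: each code after mapping has already been brought back to the
    same length as the code before mapping (the text does not say how).
    """
    if len(before) != len(after) or not before:
        raise ValueError("need the same, non-zero number of codes")
    total = 0.0
    for b_vec, a_vec in zip(before, after):
        if len(b_vec) != len(a_vec):
            raise ValueError("paired codes must have equal length")
        total += sum((a - b) ** 2 for a, b in zip(a_vec, b_vec))
    return total / len(before)
```

Minimizing this quantity pushes the mapped representation to retain the information in the original category code, which is the semantic-consistency constraint described in claim 1.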
7. The zero-sample Chinese single character recognition method of claim 1, wherein the Transformer-based decoder operates as follows:
matching the category codes of the Chinese single characters with the visual features of the image by a multi-head attention mechanism and extracting from the visual features of the image the features related to the category codes, according to formula (4):
$$\mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)W^O$$
$$\mathrm{head}_i=\mathrm{Attention}(QW_i^Q,\,KW_i^K,\,VW_i^V)$$
$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
wherein $\mathrm{MultiHead}(Q,K,V)$ implements multi-head attention and $\mathrm{Attention}(Q,K,V)$ computes attention; $Q$ represents the category codes of the Chinese single characters, and $K$ and $V$ represent the visual features of the image; $W_i^Q$, $W_i^K$ and $W_i^V$ are learnable projection matrices, $d_k$ is the dimension of $Q$, $K$ and $V$, and $W^O$ is a parameter of the multi-head attention;
after the features related to the category codes are obtained, decoding them with a feedforward neural network consisting of three fully connected layers, which finally outputs the recognition result for the Chinese single character; during training of the feedforward neural network, the cross-entropy loss function is used as the optimization objective of the network:
$$L=-\sum_{i=1}^{k}p_i\log q_i$$
wherein $p_i$ is the label probability of class $i$, $q_i$ is the predicted probability of class $i$, and $k$ is the total number of classes.
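The attention step of formula (4) and the cross-entropy objective can be sketched in plain Python. This is a minimal illustration, not the patented implementation: the function names are hypothetical, each row of `q` is assumed to be one mapped category code, each row of `k` and `v` one visual feature vector (for example, one spatial position of the encoder output), and the three-layer feedforward decoder is omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)                          # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))         # one score per (category code, visual vector)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)                # weighted sum of visual features


def multi_head(q: Matrix, k: Matrix, v: Matrix, w_q: List[Matrix],
               w_k: List[Matrix], w_v: List[Matrix], w_o: Matrix) -> Matrix:
    """w_q, w_k, w_v hold one projection matrix per head; w_o is the output projection."""
    heads = [attention(matmul(q, wq), matmul(k, wk), matmul(v, wv))
             for wq, wk, wv in zip(w_q, w_k, w_v)]
    concat = [sum((h[r] for h in heads), []) for r in range(len(q))]  # Concat(head_1..head_h)
    return matmul(concat, w_o)


def cross_entropy(p: List[float], q: List[float]) -> float:
    """L = -sum_i p_i * log(q_i); terms with p_i == 0 contribute nothing."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)
```

With identical keys the attention weights are uniform, so `attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[2.0, 0.0], [4.0, 2.0]])` averages the two value rows to `[[3.0, 1.0]]`. For a one-hot label, `cross_entropy` reduces to the negative log of the probability predicted for the true class, e.g. `cross_entropy([0, 1, 0], [0.2, 0.5, 0.3])` equals ln 2.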
8. A zero-sample Chinese single character recognition system, characterized by comprising:
a feature extraction module for extracting visual features of a Chinese single character image;
a category coding module for performing learnable category coding on Chinese single character categories, decomposing the component structure of the Chinese single characters with a depth-first search algorithm and computing the learnable category codes;
an information mapping module for mapping the category codes of the Chinese single characters into the visual space, wherein a mapping module based on a fully connected layer makes the dimension of the category codes equal to the dimension of the visual space, and a reconstruction loss function constrains the semantic consistency of the category codes before and after mapping;
and an information matching module for matching the category codes of the Chinese single characters with the visual features of the image through a Transformer-based decoder, extracting from the visual features of the image the features related to the category codes, and finally decoding them to output the recognition result for the Chinese single character.
9. A zero-sample Chinese single character recognition device, characterized by comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, in which a program executable by a processor is stored, wherein the program executable by the processor is adapted to perform the method according to any one of claims 1 to 7 when executed by the processor.
CN202210095194.2A 2022-01-26 2022-01-26 Zero-sample Chinese single-word recognition method, system, device and storage medium Active CN114529917B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210095194.2A CN114529917B (en) 2022-01-26 2022-01-26 Zero-sample Chinese single-word recognition method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN114529917A true CN114529917A (en) 2022-05-24
CN114529917B CN114529917B (en) 2024-08-23

Family

ID=81623128

Country Status (1)

Country Link
CN (1) CN114529917B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112749556A (en) * 2020-08-04 2021-05-04 腾讯科技(深圳)有限公司 Multi-language model training method and device, storage medium and electronic equipment
CN113723421A (en) * 2021-09-06 2021-11-30 华南理工大学 Zero sample Chinese character recognition method based on matching category embedding

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114863407A (en) * 2022-07-06 2022-08-05 宏龙科技(杭州)有限公司 Multi-task cold start target detection method based on visual language depth fusion
CN117218667A (en) * 2023-11-07 2023-12-12 华侨大学 Chinese character recognition method and system based on character roots
CN117218667B (en) * 2023-11-07 2024-03-08 华侨大学 Chinese character recognition method and system based on character roots

Similar Documents

Publication Publication Date Title
CN109471895B (en) Electronic medical record phenotype extraction and phenotype name normalization method and system
RU2691214C1 (en) Text recognition using artificial intelligence
CN113656570A (en) Visual question answering method and device based on deep learning model, medium and equipment
CN110188827B (en) Scene recognition method based on convolutional neural network and recursive automatic encoder model
CN110837733A (en) Language model training method and system in self-reconstruction mode and computer readable medium
CN114529917A (en) Zero-sample Chinese single character recognition method, system, device and storage medium
CN112163429B (en) Sentence correlation obtaining method, system and medium combining cyclic network and BERT
CN112860847B (en) Video question-answer interaction method and system
CN116168401A (en) Training method of text image translation model based on multi-mode codebook
CN114372465A (en) Legal named entity identification method based on Mixup and BQRNN
CN114528835A (en) Semi-supervised specialized term extraction method, medium and equipment based on interval discrimination
CN117576534A (en) Two-stage image description generation method, system, device and storage medium
US11941360B2 (en) Acronym definition network
CN115964480A (en) Text classification method and device, electronic equipment and computer-readable storage medium
CN117851565A (en) Text visual question-answering method and system based on multi-source interaction
CN116484868A (en) Cross-domain named entity recognition method and device based on diffusion model generation
CN116028888A (en) Automatic problem solving method for plane geometry mathematics problem
CN111881257B (en) Automatic matching method, system and storage medium based on subject word and sentence subject matter
CN113449524B (en) Named entity identification method, system, equipment and medium
CN114896415A (en) Entity relation joint extraction method and device based on lightweight self-attention mechanism
CN115270792A (en) Medical entity identification method and device
CN114881038A (en) Chinese entity and relation extraction method and device based on span and attention mechanism
CN114611510A (en) Method and device for assisting machine reading understanding based on generative model
CN114372467A (en) Named entity extraction method and device, electronic equipment and storage medium
CN115617959A (en) Question answering method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant