CN114626551A - Training method of text recognition model, text recognition method and related device

Info

Publication number
CN114626551A
Authority
CN
China
Prior art keywords
text
vector
training
topic
mask
Prior art date
Legal status
Pending
Application number
CN202210283937.9A
Other languages
Chinese (zh)
Inventor
陈维识
Current Assignee
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN202210283937.9A
Publication of CN114626551A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Abstract

The disclosure relates to a training method of a text recognition model, a text recognition method and a related device, which are used for solving the problem of insufficient training samples in a topic business scene and accelerating the training speed through a pre-training model. The training method comprises the following steps: acquiring a target text, wherein the target text comprises a first text, a first mask text and a second text; inputting the first text, the first mask text and the second text into a text recognition model to obtain a topic prediction result, a first text vector corresponding to the first text, a second text vector corresponding to the second text and a first mask vector corresponding to the first mask text, wherein the topic prediction result is output by the text recognition model and is used for representing whether the first text and the second text belong to the same topic type; and determining a target loss function value according to the first text vector, the first mask vector, the second text vector, the topic tag and the topic prediction result, and adjusting parameters of the text recognition model based on the target loss function value.

Description

Training method of text recognition model, text recognition method and related device
Technical Field
The present disclosure relates to the field of natural language processing technologies, and in particular, to a training method for a text recognition model, a text recognition method, and a related apparatus.
Background
With the rapid development of the internet, forums, communities, websites and the like have emerged to provide communication and interaction for users with shared interests. Such websites are usually divided into a plurality of different communication groups, and each communication group provides the same type of topics for interested users to communicate and interact on; for example, pet-raising related topics belong to the pet communication group, fishing related topics belong to the fishing communication group, and so on. Users interested in pet raising can then communicate and interact in the pet communication group around a particular pet-raising topic.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a training method for a text recognition model, the training method comprising:
acquiring a target text, wherein the target text comprises a first text, a first mask text and a second text, the first mask text is obtained by mask processing of the first text, the first text and the second text are labeled with topic tags, and the topic tags are used for representing whether the first text and the second text belong to the same topic type;
inputting the first text, the first mask text and the second text into the text recognition model to obtain a topic prediction result output by the text recognition model and used for representing whether the first text and the second text belong to the same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text and a first mask vector corresponding to the first mask text, wherein initialization parameters of the text recognition model are determined based on parameters of a pre-trained model, and the pre-trained model is used for recognizing whether the two texts are similar;
determining a target loss function value from the first text vector, the first mask vector, the second text vector, the topic tag, and the topic prediction result, and adjusting a parameter of the text recognition model based on the target loss function value.
In a second aspect, the present disclosure provides a text recognition method, the method comprising:
acquiring a text to be recognized and a comparison text;
obtaining, through a text recognition model, a topic recognition result of whether the text to be recognized and the comparison text belong to the same topic type, wherein the text recognition model is obtained through the training method of the text recognition model according to the first aspect of the disclosure.
In a third aspect, the present disclosure provides a training apparatus for a text recognition model, the training apparatus comprising:
a first obtaining module, configured to obtain a target text, wherein the target text comprises a first text, a first mask text and a second text, the first mask text is obtained by performing mask processing on the first text, the first text and the second text are labeled with topic tags, and the topic tags are used for representing whether the first text and the second text belong to the same topic type;
an input module, configured to input the first text, the first mask text, and the second text into the text recognition model, so as to obtain a topic prediction result output by the text recognition model and used for representing whether the first text and the second text belong to the same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text, and a first mask vector corresponding to the first mask text, where an initialization parameter of the text recognition model is determined based on a parameter of a pre-training model, and the pre-training model is used for identifying whether the two texts are similar;
an adjustment module to determine an objective loss function value from the first text vector, the first mask vector, the second text vector, the topic tag, and the topic prediction result, and adjust a parameter of the text recognition model based on the objective loss function value.
In a fourth aspect, the present disclosure provides a text recognition apparatus, the apparatus comprising:
the second acquisition module is used for acquiring the text to be recognized and the comparison text;
and the recognition result module is used for obtaining a topic recognition result of whether the text to be recognized and the comparison text belong to the same topic type through a text recognition model, wherein the text recognition model is obtained through the training method of the text recognition model in the first aspect of the disclosure.
In a fifth aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which, when executed by a processing apparatus, performs the steps of the method of the first aspect of the present disclosure.
In a sixth aspect, the present disclosure provides an electronic device comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to implement the steps of the method of the first aspect of the present disclosure.
Through the above technical solution, the pre-training model is used for recognizing whether two texts are similar, so initializing the parameters of the text recognition model with the pre-training model is equivalent to coarse adjustment of the parameters of the text recognition model. Then, the text recognition model is trained with the target text labeled with topic tags, and the parameters of the text recognition model are finely adjusted on the basis of the coarse adjustment. Therefore, the training speed of the text recognition model can be increased, a small amount of target text labeled with topic tags is sufficient for training the text recognition model, and the labeling cost is reduced. In addition, semi-supervised learning is performed according to the first text vector, the first mask vector, the second text vector, the topic tag and the topic prediction result, so that the robustness of the text recognition model can be improved.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flow diagram of a method for training a text recognition model provided in accordance with an exemplary embodiment;
FIG. 2 is a block diagram of a text recognition model provided in accordance with an exemplary embodiment;
FIG. 3 is a schematic diagram of a first encoding network provided in accordance with an exemplary embodiment;
FIG. 4 is a block diagram of a pre-trained model provided in accordance with an exemplary embodiment;
FIG. 5 is a block diagram of a predictive network provided in accordance with an exemplary embodiment;
FIG. 6 is a block diagram of another text recognition model provided in accordance with an exemplary embodiment;
FIG. 7 is a flow diagram of a method of text recognition provided in accordance with an exemplary embodiment;
FIG. 8 is a block diagram of an apparatus for training a text recognition model provided in accordance with an exemplary embodiment;
FIG. 9 is a block diagram of a text recognition apparatus provided in accordance with an exemplary embodiment;
FIG. 10 is a block diagram of an electronic device provided in accordance with an example embodiment.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more complete and thorough understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will recognize that they should be understood as "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
All actions of acquiring signals, information or data in the present disclosure are performed under the premise of complying with the corresponding data protection regulation policy of the country of the location and obtaining the authorization given by the owner of the corresponding device.
As background, websites such as forums and communities usually divide a plurality of different communication groups, each communication group provides the same type of topics for interested users to communicate and interact, and a topic is a subject provided for users to communicate and interact on. For example, for the topic "Is hair loss in pet cats a manifestation of vitamin deficiency?", a user can post his or her own opinion on the topic and can also discuss opinions posted by other users. Because the topics of the same communication group usually belong to the same type, when a new topic needs to be published in a certain communication group, whether the new topic conforms to the topic type of the communication group needs to be judged first; if each new topic is judged manually, the timeliness and efficiency of publishing new topics are affected.
In the related art, text recognition technology has achieved good results for recognizing text similarity, and public recognition models and labeled training samples can be obtained from the internet. However, good recognition results cannot be obtained for the business scenario of judging whether a new topic conforms to the topic type of a certain communication group. For example, consider text 1 with the content "Is hair loss in pet cats a manifestation of vitamin deficiency?" and text 2 with the content "Will a domestic pet turtle not interact with its owner?". From a text-similarity perspective, text 1 and text 2 are not similar. If a model for recognizing text similarity is used to identify text 1 and text 2 as topics, the recognition result would be that the two do not belong to the same topic type. However, in an actual topic business scenario, text 1 and text 2 may both belong to the same topic type of the pet communication group. Moreover, for such a specific business scenario, especially when a new communication group is created, a large number of training samples cannot be provided for model training, so a text recognition model with a good topic recognition effect cannot be obtained.
In order to solve the above technical problems, the present disclosure provides a training method of a text recognition model, a text recognition method and a related apparatus. The parameters of the text recognition model are initialized with the parameters of a pre-training model that can recognize whether two texts are similar, and the text recognition model is then trained with a small amount of target text labeled with topic tags, thereby obtaining a text recognition model that can recognize whether a first text and a second text belong to the same topic type. This not only speeds up the training of the text recognition model, but also allows the text recognition model to be trained with a small amount of target text labeled with topic tags, thereby reducing the labeling cost.
The present disclosure is described below with reference to specific examples.
Fig. 1 is a flowchart of a training method of a text recognition model according to an exemplary embodiment, the training method including:
and S101, acquiring a target text.
The target text comprises a first text, a first mask text and a second text, the first mask text is obtained by performing mask processing on the first text, the first text and the second text are marked with topic labels, and the topic labels are used for representing whether the first text and the second text belong to the same topic type.
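The masking scheme used to obtain the first mask text is not fixed by this disclosure. The following Python sketch shows one common possibility, assumed here only for illustration: BERT-style random token masking that replaces a fraction of the tokens of the first text with a hypothetical [MASK] symbol.

    import random

    MASK_TOKEN = "[MASK]"  # hypothetical mask symbol

    def make_mask_text(tokens, mask_ratio=0.15):
        """Return a copy of the token list with a random subset replaced by MASK_TOKEN."""
        masked = list(tokens)
        n = max(1, int(len(tokens) * mask_ratio))
        for i in random.sample(range(len(tokens)), n):
            masked[i] = MASK_TOKEN
        return masked

    # Example: the first mask text is derived from the first text.
    first_text = ["pet", "cat", "hair", "loss", "vitamin", "deficiency"]
    first_mask_text = make_mask_text(first_text)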
S102, inputting the first text, the first mask text and the second text into a text recognition model to obtain a topic prediction result, a first text vector corresponding to the first text, a second text vector corresponding to the second text and a first mask vector corresponding to the first mask text, wherein the topic prediction result is output by the text recognition model and is used for representing whether the first text and the second text belong to the same topic type.
The initialization parameters of the text recognition model are determined based on the parameters of the pre-training model, and the pre-training model is used for recognizing whether two texts are similar or not.
S103, determining a target loss function value according to the first text vector, the first mask vector, the second text vector, the topic tag and the topic prediction result, and adjusting parameters of the text recognition model based on the target loss function value.
By adopting the training method, the parameters of the text recognition model are first initialized based on the parameters of the pre-training model, the text recognition model is then trained with the target text labeled with topic tags, and the parameters of the text recognition model are finally further adjusted according to the target loss function value. Because the pre-training model is used for recognizing whether two texts are similar, initializing the parameters of the text recognition model with the pre-training model is equivalent to coarse adjustment of the parameters of the text recognition model. Then, the text recognition model is trained with the target text labeled with topic tags, and the parameters of the text recognition model are finely adjusted on the basis of the coarse adjustment. Therefore, the training speed of the text recognition model can be increased, a small amount of target text labeled with topic tags is sufficient for training the text recognition model, and the labeling cost is reduced. In addition, semi-supervised learning is performed according to the first text vector, the first mask vector, the second text vector, the topic tag and the topic prediction result, so that the robustness of the text recognition model can be improved.
The following first explains the structure of the text recognition model provided in the embodiment of the present disclosure.
In a possible manner, referring to fig. 2, the text recognition model includes a first coding network, a second coding network, and a prediction network, and the first text, the first mask text, and the second text are input into the text recognition model, so as to obtain a topic prediction result output by the text recognition model and used for characterizing whether the first text and the second text belong to the same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text, and a first mask vector corresponding to the first mask text, where: firstly, a first text is input into a first coding network to obtain a first text vector, a first mask text is input into the first coding network to obtain a first mask vector, and a second text is input into a second coding network to obtain a second text vector. And then inputting the first text vector and the second text vector into a prediction network to obtain a topic prediction result for representing whether the first text and the second text belong to the same topic type.
For example, referring to fig. 3, the first coding network may be the encoder part (Encoder) of a Transformer model, and includes a first coding unit and a second coding unit. The first coding unit has the same structure as the second coding unit, each including a multi-head attention layer (Multi-Head Attention), a residual connection & normalization layer (Add & Norm), and two feed-forward network layers (Feed Forward). Taking the first text as input, for example, the first coding unit performs feature processing to obtain a first feature vector, and the first feature vector is then input into the second coding unit for feature processing to output the first text vector. For the processing flow of the first coding network, reference may be made to the processing flow of the encoder in the Transformer model of the related art, which is not described herein again.
Accordingly, the first mask text is input into the first coding network to output the first mask vector, and the second text is input into the second coding network to output the second text vector. The structure of the second coding network may be the same as or different from that of the first coding network; for example, the number of coding units of the second coding network may differ from that of the first coding network and may be set according to requirements, which is not limited in the present disclosure.
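As a minimal sketch of the coding networks described above, the following Python (PyTorch) code stacks two encoder units, each containing multi-head attention, Add & Norm, and a feed-forward sub-layer. The dimensions and the use of torch.nn.TransformerEncoderLayer are assumptions for illustration and are not specified by this disclosure.

    import torch.nn as nn

    class FirstCodingNetwork(nn.Module):
        def __init__(self, d_model=256, n_heads=4, d_ff=512, num_units=2):
            super().__init__()
            # Each coding unit: multi-head attention + Add & Norm + feed-forward + Add & Norm.
            unit = nn.TransformerEncoderLayer(
                d_model=d_model, nhead=n_heads, dim_feedforward=d_ff,
                batch_first=True)
            self.encoder = nn.TransformerEncoder(unit, num_layers=num_units)

        def forward(self, token_embeddings):
            # token_embeddings: (batch, seq_len, d_model)
            # The first unit yields the first feature vector; the second unit
            # refines it into the text vector that is output.
            return self.encoder(token_embeddings)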
In a possible manner, the parameters of the first coding network and the parameters of the second coding network are initialized as follows: firstly, a pre-training sample is obtained, wherein the pre-training sample comprises a first pre-training text and a second pre-training text. And then inputting the pre-training sample into a pre-training model for training, wherein the pre-training model is used for judging whether the first pre-training text is similar to the second pre-training text or not, and the pre-training model comprises a pre-training coding network, wherein the structure of the pre-training coding network is the same as that of the first coding network and that of the second coding network. And initializing the parameters of the first coding network and the parameters of the second coding network according to the parameters of the pre-training coding network of the trained pre-training model.
Illustratively, referring to fig. 4, the pre-training model includes a first pre-training coding network, a second pre-training coding network, and a pre-training prediction network, and weight sharing (shared weights) is performed between the first pre-training coding network and the second pre-training coding network. The first pre-training text is input into the first pre-training coding network to obtain a first pre-training vector, and the second pre-training text is input into the second pre-training coding network to obtain a second pre-training vector. The first pre-training vector and the second pre-training vector are then input into the pre-training prediction network to obtain a pre-training prediction result. The pre-training model may be a text similarity recognition model in the related art, and the pre-training samples may be obtained from publicly available training data of text similarity recognition models on the internet, where each pre-training sample is labeled with a label indicating whether the first pre-training text and the second pre-training text are similar.
It should be noted that, when the structure of the first coding network is different from the structure of the second coding network, the structure of the first pre-training coding network is the same as the structure of the first coding network, and the structure of the second pre-training coding network is the same as the structure of the second coding network. The parameters of the first coding network can be initialized based on the parameters of the first pre-training coding network of the trained pre-training model, and the parameters of the second coding network can be initialized based on the parameters of the second pre-training coding network of the trained pre-training model. Under the condition that the structure of the first coding network is the same as that of the second coding network, the structure of the first pre-training coding network is the same as that of the second pre-training coding network, and the first coding network and the second coding network can be initialized by using any pre-training coding network, namely, the parameters of the text recognition model are roughly adjusted.
Preferably, the first coding network and the second coding network are chosen to have the same structure, so that the structure of the first pre-training coding network is the same as that of the second pre-training coding network; the pre-training model can then converge faster during training, which speeds up the training of the pre-training model. Initializing the parameters of the first coding network and the parameters of the second coding network according to the parameters of the pre-training coding network of the trained pre-training model, i.e., coarsely adjusting the parameters of the text recognition model, yields a text recognition model capable of predicting whether two texts are similar. The text recognition model is then further trained with a small amount of target text labeled with topic tags, and its parameters are finely adjusted on the basis of the coarse adjustment to obtain a final text recognition model conforming to the topic business scenario. Therefore, using the pre-training model to initialize the parameters of the text recognition model can overcome the problem that the number of target texts labeled with topic tags is insufficient, and speeds up the training of the text recognition model.
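A minimal sketch of this coarse adjustment, assuming the pre-training coding network, the first coding network and the second coding network all share the structure sketched above: the trained pre-training encoder is simply copied into both coding networks before fine-tuning. The checkpoint handling and layer sizes are assumptions.

    import copy
    import torch.nn as nn

    # Hypothetical pre-training coding network (same structure as the coding networks),
    # assumed to have already been trained on public text-similarity data.
    pretrained_encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=256, nhead=4, dim_feedforward=512,
                                   batch_first=True),
        num_layers=2)

    # Coarse adjustment: initialize both coding networks from the pre-trained parameters.
    first_coding_network = copy.deepcopy(pretrained_encoder)
    second_coding_network = copy.deepcopy(pretrained_encoder)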
In a possible manner, the prediction network includes a first decoding unit, a second decoding unit, and a discrimination unit, and inputting the first text vector and the second text vector into the prediction network to obtain the topic prediction result used for representing whether the first text and the second text belong to the same topic type may be as follows. First, the first text vector is used as a key vector and a value vector of the first decoding unit, the second text vector is used as a query vector of the first decoding unit, and both are input into the first decoding unit for feature processing to obtain a second feature vector. Then, the first text vector is used as a key vector and a value vector of the second decoding unit, the second feature vector is used as a query vector of the second decoding unit, and both are input into the second decoding unit for feature processing to obtain a third feature vector. Finally, the third feature vector is input into the discrimination unit to obtain a probability value that the first text and the second text belong to the same topic type, and the topic prediction result is obtained based on this probability value.
For example, referring to fig. 5, the prediction network may be the decoder part (Decoder) of a Transformer model, and includes a first decoding unit, a second decoding unit, and a discrimination unit. The first decoding unit has the same structure as the second decoding unit, each including a multi-head attention layer (Multi-Head Attention), a residual connection & normalization layer (Add & Norm), and two feed-forward network layers (Feed Forward).
First, the first text vector is used as the key vector (K: a vector representing the correlation between the queried information and other information) and the value vector (V: a vector representing the queried information) of the first decoding unit, and the second text vector is used as the query vector (Q) of the first decoding unit, so as to obtain the second feature vector output by the first decoding unit. Then, the first text vector is used as the key vector and value vector of the second decoding unit, the second feature vector is used as the query vector of the second decoding unit, and both are input into the second decoding unit for feature processing to obtain a third feature vector. Finally, the third feature vector is input into the discrimination unit, which includes a global average pooling layer, a fully connected layer, and a classifier; the classifier can be implemented by a sigmoid activation to obtain the probability value that the first text and the second text belong to the same topic type, and the topic prediction result is obtained based on this probability value. For example, a preset probability value is set, and if the probability value output by the classifier is greater than the preset probability value, it indicates that the first text and the second text belong to the same topic type, where the preset probability value may be 100%, 80%, and the like, which is not limited by this disclosure. In addition, for the processing flow of the prediction network, reference may be made to the processing flow of the decoder in the Transformer model of the related art, which is not described herein again.
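The following Python (PyTorch) sketch mirrors the prediction network described above: two cross-attention decoding units whose key and value come from the first text vector, followed by a discrimination unit with global average pooling, a fully connected layer and a sigmoid. The Add & Norm and feed-forward sub-layers of the decoding units are omitted for brevity, and all sizes are assumptions.

    import torch
    import torch.nn as nn

    class PredictionNetwork(nn.Module):
        def __init__(self, d_model=256, n_heads=4):
            super().__init__()
            self.decode1 = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.decode2 = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.fc = nn.Linear(d_model, 1)

        def forward(self, first_text_vec, second_text_vec):
            # First decoding unit: Q = second text vector, K = V = first text vector.
            second_feature, _ = self.decode1(second_text_vec, first_text_vec, first_text_vec)
            # Second decoding unit: Q = second feature vector, K = V = first text vector.
            third_feature, _ = self.decode2(second_feature, first_text_vec, first_text_vec)
            pooled = third_feature.mean(dim=1)       # global average pooling
            prob = torch.sigmoid(self.fc(pooled))    # probability of same topic type
            return prob.squeeze(-1)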
In a possible approach, determining the objective loss function value from the first text vector, the first mask vector, the second text vector, the topic tag, and the topic prediction result may be: first, a first loss function value is determined according to a first text vector, a first mask vector and a second text vector, then a second loss function value is determined according to a topic tag and a topic prediction result, and finally a target loss function value is determined based on the first loss function value and the second loss function value.
For example, determining the first loss function value from the first text vector, the first mask vector, and the second text vector may be as follows: first, the first loss vector is obtained by subtracting the first mask vector from the first text vector, and the second loss vector is obtained by subtracting the second text vector from the first text vector; then, the Euclidean distance of the second loss vector is subtracted from the Euclidean distance of the first loss vector to obtain the first loss function value.
For example, the first loss function value may be calculated independently of the topic tag labeled on the target text, and may be determined by the following formula:
L1 = ||H1 - H1'|| - ||H1 - H2||
wherein L1 represents the first loss function value, H1 represents the first text vector, H1' represents the first mask vector, H2 represents the second text vector, ||H1 - H1'|| represents the Euclidean distance of the first loss vector, and ||H1 - H2|| represents the Euclidean distance of the second loss vector, where the Euclidean distance is obtained by an L2-norm operation.
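A minimal sketch of this first loss function in Python (PyTorch), assuming H1, H1' and H2 are pooled vectors of shape (batch, d_model):

    import torch

    def first_loss(h1, h1_masked, h2):
        # L1 = ||H1 - H1'|| - ||H1 - H2||, averaged over the batch.
        d_mask = torch.norm(h1 - h1_masked, p=2, dim=-1)  # Euclidean distance of the first loss vector
        d_text = torch.norm(h1 - h2, p=2, dim=-1)         # Euclidean distance of the second loss vector
        return (d_mask - d_text).mean()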
Alternatively, in other possible implementations, the first loss function value may be calculated by adding a weighting factor to the euclidean distance of the first loss vector and/or adding a weighting factor to the euclidean distance of the second loss vector.
For example, the calculation of the second loss function value depends on the topic tag labeled on the target text, and the second loss function value may be determined by the following formula:
L2 = CE(Y1, Y2)
wherein L2 represents the second loss function value, Y1 represents the topic tag labeled on the target text, Y2 represents the topic prediction result of the target text, and CE represents a binary cross entropy loss function (Binary Cross Entropy).
Further, after the first loss function value and the second loss function value are obtained, they may simply be added to obtain the target loss function value, or the target loss function value may be determined by the following formula:
L = aL1 + bL2
where L represents the target loss function value, a represents the weight coefficient of the first loss function value, b represents the weight coefficient of the second loss function value, and the specific values of a and b may be determined as desired, which is not limited by this disclosure.
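Putting the two losses together, a sketch of the target loss L = aL1 + bL2, with L2 computed as binary cross entropy; the weight values and the reduction over the batch are assumptions:

    import torch
    import torch.nn.functional as F

    def target_loss(h1, h1_masked, h2, topic_prob, topic_label, a=1.0, b=1.0):
        # L1: contrastive term, independent of the topic tag.
        l1 = (torch.norm(h1 - h1_masked, p=2, dim=-1)
              - torch.norm(h1 - h2, p=2, dim=-1)).mean()
        # L2: binary cross entropy between the topic tag and the topic prediction result.
        l2 = F.binary_cross_entropy(topic_prob, topic_label.float())
        return a * l1 + b * l2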
It should be noted that, in the training process of the text recognition model, the final goal is to determine the parameters of the text recognition model so that the objective loss function value is minimum, that is, the parameter θ of the text recognition model satisfies:
θ = argmin_θ L
wherein argmin_θ L represents the value of the parameter θ of the text recognition model at which the target loss function value reaches its minimum.
It should be appreciated that contrastive learning is introduced into the calculation of the first loss function value: the smaller the difference between the first text vector and the first mask vector, and the larger the difference between the first text vector and the second text vector, the smaller the first loss function value, which in turn reduces the target loss function value as the text recognition model is trained. Therefore, introducing contrastive learning makes the optimization of the text recognition model simpler and improves its generalization capability. For details of contrastive learning, reference may be made to the related art, which is not repeated herein.
In addition, the text recognition model provided by the embodiment of the disclosure calculates the target loss function value by determining the first loss function value independent of the topic label and determining the second loss function value dependent on the topic label, and performs semi-supervised learning by using consistency regularization, thereby improving the robustness of the text recognition model.
In another possible implementation manner, referring to fig. 6, the second text may be masked to obtain a second masked text, and the second masked text is input to the second coding network to obtain a second mask vector. In calculating the first loss function value, the second text vector and the second mask vector may be subtracted to obtain a third loss vector, and the second text vector and the first text vector may be subtracted to obtain a fourth loss vector. And subtracting the Euclidean distance of the fourth loss vector from the Euclidean distance of the third loss vector to obtain a first loss function value. Or selecting the first mask text and the second mask text to participate in the calculation of the first loss function value together, for example, in the above manner, obtaining a loss function based on the first mask text, and obtaining another loss function based on the second mask text, and then taking an average value of the two calculation results as the first loss function value, which is not limited by the present disclosure.
For the target text used in training the text recognition model, a first text and a second text obtained by taking two topics from the same communication group can be used as a positive sample, and a first text and a second text obtained by randomly taking two topics from different communication groups can be used as a negative sample. Generally, the number of available negative samples is much larger than that of positive samples; in the embodiment of the present disclosure, in order to obtain a better training effect, the number of positive samples may be controlled to be about one third of the number of negative samples, as sketched below. The target text can also be selectively acquired according to topic types from different angles and the requirements of different recognition granularities, so that the recognition granularity and the recognition angle of the text recognition model can be adjusted.
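A sketch of how such target texts could be assembled, assuming the topics are held in a dict mapping communication-group names to topic lists and that positives are kept at roughly one third of the negatives; the data layout and counts are assumptions:

    import random

    def build_target_texts(groups, num_negatives=300):
        """groups: dict mapping communication-group name -> list of topic texts."""
        pairs = []
        names = list(groups)
        # Positive samples: two topics drawn from the same communication group.
        for _ in range(num_negatives // 3):
            g = random.choice(names)
            first_text, second_text = random.sample(groups[g], 2)
            pairs.append((first_text, second_text, 1))
        # Negative samples: topics drawn at random from two different groups.
        for _ in range(num_negatives):
            g1, g2 = random.sample(names, 2)
            pairs.append((random.choice(groups[g1]), random.choice(groups[g2]), 0))
        return pairs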
It should be noted that, in the text recognition model in the related art, a large number of labeled samples are usually required to train the text recognition model to obtain the trained text recognition model for topic recognition. However, when the text recognition model is trained in the embodiment of the present disclosure, the pre-training model is trained through the existing text similarity recognition model and the published labeled sample, then the parameters of the text recognition model are coarsely adjusted according to the parameters of the pre-training model, then the text recognition model is trained by using a small amount of target texts labeled with topic labels, and the parameters of the text recognition model are finely adjusted on the basis of the coarse adjustment. Therefore, the problem that the number of target texts marked with topic labels is insufficient is solved, marking cost is reduced, and the training speed of a text recognition model is increased. The number of target texts for training the text recognition model can be far smaller than the number of labeled samples for training the text recognition model in the related art.
Based on the same inventive concept, an embodiment of the present disclosure further provides a text recognition method. Referring to fig. 7, the method includes:
and S701, acquiring a text to be recognized and a comparison text.
S702, obtaining, through a text recognition model, a topic recognition result of whether the text to be recognized and the comparison text belong to the same topic type.
The text recognition model is obtained by the training method of the text recognition model.
For example, taking publishing a new topic in a certain communication group as an example, the new topic to be published may be taken as the text to be recognized, and a published topic of that communication group is selected as the comparison text. The text to be recognized is input into the first coding network of the text recognition model to obtain a text vector to be recognized, and the comparison text is input into the second coding network of the text recognition model to obtain a comparison text vector. The text vector to be recognized and the comparison text vector are input into the prediction network to obtain a probability value that the text to be recognized and the comparison text belong to the same topic type; if the probability value is greater than a preset probability value, the text to be recognized and the comparison text belong to the same topic type, and the new topic can be published in the communication group.
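A sketch of this recognition step, reusing the hypothetical networks from the earlier sketches; embed() stands for an unspecified step that turns raw text into token embeddings, and the 0.8 threshold is an assumption:

    import torch

    PRESET_PROBABILITY = 0.8  # assumed preset probability value

    @torch.no_grad()
    def same_topic_type(first_coding_network, second_coding_network, prediction_network,
                        text_to_recognize, comparison_text, embed):
        # embed(text) -> (1, seq_len, d_model) token embeddings (hypothetical helper).
        vec_to_recognize = first_coding_network(embed(text_to_recognize))
        comparison_vec = second_coding_network(embed(comparison_text))
        prob = prediction_network(vec_to_recognize, comparison_vec).item()
        return prob > PRESET_PROBABILITY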
Alternatively, there may be a new topic to be published for which the suitable communication group has not yet been determined. In this case, the new topic to be published can be used as the text to be recognized, the published topics of each communication group can be taken as a plurality of comparison texts, and the steps of the text recognition method can be repeated to determine, from the plurality of comparison texts, a target comparison text belonging to the same topic type as the new topic to be published; the communication group where the target comparison text is located is then determined as the communication group in which the new topic can be published.
By adopting the method, when a new topic is to be published in a certain communication group, whether the new topic conforms to the topic type of the communication group is judged by the text recognition model, or a communication group suitable for publishing is determined for a certain new topic; compared with manual judgment, this improves the timeliness and efficiency of publishing new topics.
In a possible implementation manner, because the text recognition model uses the first mask text and the second mask text in the training process, in order to maintain the input consistency of the text recognition model and avoid the abnormality of the final output prediction result caused by inconsistent input parameters, the first preset value may be automatically input as the first mask text and the second preset value may be automatically input as the second mask text in the actual application process of the text recognition model. Since only the text to be recognized and the comparison text participate in the recognition process of the text recognition model, the first preset value and the second preset value may be any values, and the first preset value and the second preset value do not affect the prediction result, for example, the first preset value and the second preset value may be vectors which are all 0, which is not limited by the present disclosure.
In a possible implementation manner, multiple topics may be selected as multiple comparison texts in the same communication group, multiple topics which are published most recently are usually selected, for example, 20 topics which are published most recently are selected as multiple comparison texts, then multiple probability values of the new topic to be published and the multiple comparison texts, which belong to the same topic type, are obtained respectively, an average value of the multiple probability values is taken as a target probability value, and if the target probability value is greater than a preset probability value, the new topic to be published may be published in the communication group. By means of averaging the probability values, errors of the final prediction results caused by errors of single comparison texts are avoided.
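A sketch of this averaging step, assuming the per-comparison probabilities have already been computed:

    def publishable(probabilities, preset_probability=0.8):
        # Average the probabilities over the comparison texts and compare with the threshold.
        target_probability = sum(probabilities) / len(probabilities)
        return target_probability > preset_probability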
Based on the same inventive concept, an embodiment of the present disclosure further provides a training apparatus for a text recognition model, and with reference to fig. 8, the training apparatus 800 includes:
the first obtaining module 801 is configured to obtain a target text, where the target text includes a first text, a first mask text, and a second text, where the first mask text is obtained by performing mask processing on the first text, and the first text and the second text are labeled with a topic tag, and the topic tag is used to represent whether the first text and the second text belong to the same topic type.
An input module 802, configured to input the first text, the first mask text, and the second text into the text recognition model, and obtain a topic prediction result output by the text recognition model and used for representing whether the first text and the second text belong to the same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text, and a first mask vector corresponding to the first mask text, where an initialization parameter of the text recognition model is determined based on a parameter of a pre-training model, and the pre-training model is used to identify whether the two texts are similar.
An adjusting module 803, configured to determine an objective loss function value according to the first text vector, the first mask vector, the second text vector, the topic tag, and the topic prediction result, and adjust a parameter of the text identification model based on the objective loss function value.
Optionally, the adjusting module 803 includes:
a first loss module to determine a first loss function value from the first text vector, the first mask vector, and the second text vector;
a second loss module to determine a second loss function value from the topic tag and the topic prediction result;
a target loss module to determine the target loss function value based on the first loss function value and the second loss function value.
Optionally, the first loss module is to:
subtracting the first text vector and the first mask vector to obtain a first loss vector, and subtracting the first text vector and the second text vector to obtain a second loss vector;
and subtracting the Euclidean distance of the second loss vector from the Euclidean distance of the first loss vector to obtain the first loss function value.
Optionally, the text recognition model includes a first coding network, a second coding network and a prediction network, and the input module 802 is configured to:
inputting the first text into the first coding network to obtain a first text vector, inputting the first mask text into the first coding network to obtain a first mask vector, and inputting the second text into the second coding network to obtain a second text vector;
and inputting the first text vector and the second text vector into the prediction network to obtain a topic prediction result for representing whether the first text and the second text belong to the same topic type.
Optionally, the training apparatus 800 includes an initialization module, configured to initialize the parameters of the first coding network and the parameters of the second coding network by:
acquiring a pre-training sample, wherein the pre-training sample comprises a first pre-training text and a second pre-training text;
inputting the pre-training sample into the pre-training model for training, wherein the pre-training model is used for judging whether the first pre-training text is similar to the second pre-training text or not, and the pre-training model comprises a pre-training coding network, wherein the structure of the pre-training coding network is the same as that of the first coding network and that of the second coding network;
and initializing the parameters of the first coding network and the parameters of the second coding network according to the trained parameters of the pre-training coding network of the pre-training model.
Optionally, the prediction network includes a first decoding unit, a second decoding unit and a discrimination unit, and the input module 802 is configured to:
inputting the first text vector as a key vector and a value vector of the first decoding unit and the second text vector as a query vector of the first decoding unit into the first decoding unit for feature processing to obtain a second feature vector;
inputting the first text vector serving as a key vector and value vector of the second decoding unit and the second feature vector serving as a query vector of the second decoding unit into the second decoding unit for feature processing to obtain a third feature vector;
and inputting the third feature vector into the discrimination unit to obtain a probability value that the first text and the second text belong to the same topic type, and obtaining the topic prediction result based on the probability value that the first text and the second text belong to the same topic type.
By adopting the training apparatus, the parameters of the text recognition model are first initialized based on the parameters of the pre-training model, the text recognition model is then trained with the target text labeled with topic tags, and the parameters of the text recognition model are finally further adjusted according to the target loss function value. Because the pre-training model is used for recognizing whether two texts are similar, initializing the parameters of the text recognition model with the pre-training model is equivalent to coarse adjustment of the parameters of the text recognition model. Then, the text recognition model is trained with the target text labeled with topic tags, and the parameters of the text recognition model are finely adjusted on the basis of the coarse adjustment. Therefore, the training speed of the text recognition model can be increased, a small amount of target text labeled with topic tags is sufficient for training the text recognition model, and the labeling cost is reduced. In addition, semi-supervised learning is performed according to the first text vector, the first mask vector, the second text vector, the topic tag and the topic prediction result, so that the robustness of the text recognition model can be improved.
Based on the same inventive concept, an embodiment of the present disclosure further provides a text recognition apparatus, and referring to fig. 9, the apparatus 900 includes:
a second obtaining module 901, configured to obtain a text to be recognized and a comparison text.
The recognition result module 902 is configured to obtain, through a text recognition model, a topic recognition result whether the text to be recognized and the comparison text belong to the same topic type, where the text recognition model is obtained through the above training method of the text recognition model.
By adopting the apparatus, when a new topic is to be published in a certain communication group, whether the new topic conforms to the topic type of the communication group is judged by the text recognition model, or a communication group suitable for publishing is determined for a certain new topic; compared with manual judgment, this improves the timeliness and efficiency of publishing new topics.
Based on the same concept, embodiments of the present disclosure further provide a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processing device, implements any of the above-mentioned steps of the text recognition model training method or the text recognition method.
Based on the same concept, an embodiment of the present disclosure further provides an electronic device, including:
a storage device having a computer program stored thereon;
and the processing device is used for executing the computer program in the storage device so as to realize the training method of any text recognition model or the steps of the text recognition method.
Referring now to FIG. 10, a block diagram of an electronic device 1000 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 10, the electronic device 1000 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 1001 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)1002 or a program loaded from a storage means 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the electronic apparatus 1000 are also stored. The processing device 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
Generally, the following devices may be connected to the I/O interface 1005: input devices 1006 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 1007 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 1008 including, for example, magnetic tape, hard disk, and the like; and a communication device 1009. The communication device 1009 may allow the electronic device 1000 to communicate with other devices wirelessly or by wire to exchange data. While fig. 10 illustrates an electronic device 1000 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 1009, or installed from the storage means 1008, or installed from the ROM 1002. The computer program, when executed by the processing device 1001, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the communication may be performed using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may be separate and not incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a target text, wherein the target text comprises a first text, a first mask text and a second text, the first mask text is obtained by mask processing of the first text, the first text and the second text are labeled with topic tags, and the topic tags are used for representing whether the first text and the second text belong to the same topic type; inputting the first text, the first mask text and the second text into the text recognition model to obtain a topic prediction result output by the text recognition model and used for representing whether the first text and the second text belong to the same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text and a first mask vector corresponding to the first mask text, wherein initialization parameters of the text recognition model are determined based on parameters of a pre-trained model, and the pre-trained model is used for recognizing whether the two texts are similar; determining a target loss function value from the first text vector, the first mask vector, the second text vector, the topic tag, and the topic prediction result, and adjusting a parameter of the text recognition model based on the target loss function value.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a text to be recognized and a comparison text; and obtaining, through a text recognition model, a topic recognition result of whether the text to be recognized and the comparison text belong to the same topic type, wherein the text recognition model is obtained through any one of the training methods of the text recognition model provided by the present disclosure.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of the module does not in some cases constitute a limitation of the module itself, and for example, the first acquiring module may also be described as a "module that acquires target text".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 1 provides a training method of a text recognition model, the training method including: acquiring a target text, wherein the target text comprises a first text, a first mask text and a second text, the first mask text is obtained by mask processing of the first text, the first text and the second text are labeled with topic tags, and the topic tags are used for representing whether the first text and the second text belong to the same topic type; inputting the first text, the first mask text and the second text into the text recognition model to obtain a topic prediction result output by the text recognition model and used for representing whether the first text and the second text belong to the same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text and a first mask vector corresponding to the first mask text, wherein initialization parameters of the text recognition model are determined based on parameters of a pre-trained model, and the pre-trained model is used for recognizing whether the two texts are similar; determining a target loss function value from the first text vector, the first mask vector, the second text vector, the topic tag, and the topic prediction result, and adjusting a parameter of the text recognition model based on the target loss function value.
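By way of a non-limiting illustration, a single training step of example 1 could look like the following Python/PyTorch sketch. The function name train_step, the use of a binary cross-entropy for the topic supervision, and the assumption that the model returns a topic logit together with the three text vectors are hypothetical choices rather than features of the disclosure; one concrete form of the vector-based loss term is sketched after example 3 below, and a matching model interface is sketched after example 4.

```python
import torch.nn.functional as F

def train_step(model, optimizer, first_text, first_mask_text, second_text, topic_tag):
    """One hypothetical parameter-update step for the text recognition model."""
    # Forward pass over the first text, its masked variant and the second text.
    topic_logit, first_vec, second_vec, mask_vec = model(first_text, first_mask_text, second_text)

    # First loss term, built from the three vectors (one concrete form is sketched after example 3).
    first_loss = (first_vec - second_vec).norm(p=2) - (first_vec - mask_vec).norm(p=2)

    # Second loss term, supervised by the topic tag (float tensor: 1.0 = same topic type, 0.0 = different).
    second_loss = F.binary_cross_entropy_with_logits(topic_logit, topic_tag)

    # Target loss and parameter adjustment.
    target_loss = first_loss + second_loss
    optimizer.zero_grad()
    target_loss.backward()
    optimizer.step()
    return target_loss.item()
```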
Example 2 provides the training method of example 1, the determining an objective loss function value from the first text vector, the first mask vector, the second text vector, the topic tag, and the topic prediction result, comprising: determining a first loss function value from the first text vector, the first mask vector, and the second text vector; determining a second loss function value from the topic tag and the topic prediction result; determining the target loss function value based on the first loss function value and the second loss function value.
Example 3 provides the training method of example 2, the determining a first loss function value from the first text vector, the first mask vector, and the second text vector, comprising: subtracting the first mask vector from the first text vector to obtain a first loss vector, and subtracting the second text vector from the first text vector to obtain a second loss vector; and subtracting the Euclidean norm of the first loss vector from the Euclidean norm of the second loss vector to obtain the first loss function value.
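Reading examples 2 and 3 together, the target loss can be assembled as in the minimal sketch below. The sign of the first term follows the wording of example 3 literally, and the binary cross-entropy used for the second term as well as the plain (unweighted) sum are assumptions, since the examples do not fix those details.

```python
import torch
import torch.nn.functional as F

def first_loss_fn(first_vec, mask_vec, second_vec):
    first_loss_vec = first_vec - mask_vec      # first text vector minus first mask vector
    second_loss_vec = first_vec - second_vec   # first text vector minus second text vector
    # Difference of the two Euclidean norms, with the sign following the wording of example 3.
    return second_loss_vec.norm(p=2) - first_loss_vec.norm(p=2)

def second_loss_fn(topic_logit, topic_tag):
    # Assumed form: binary cross-entropy between the topic prediction and the topic tag.
    return F.binary_cross_entropy_with_logits(topic_logit, topic_tag)

def target_loss_fn(first_vec, mask_vec, second_vec, topic_logit, topic_tag):
    # Example 2 combines both terms; a plain sum is assumed here (a weighted sum is equally plausible).
    return first_loss_fn(first_vec, mask_vec, second_vec) + second_loss_fn(topic_logit, topic_tag)

# Toy check with random 256-dimensional vectors and a "same topic" tag of 1.0.
v1, vm, v2 = torch.randn(256), torch.randn(256), torch.randn(256)
loss = target_loss_fn(v1, vm, v2, torch.tensor(0.3), torch.tensor(1.0))
```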
Example 4 provides the training method of any one of examples 1 to 3, wherein the text recognition model includes a first coding network, a second coding network, and a prediction network, and the inputting the first text, the first mask text, and the second text into the text recognition model to obtain a topic prediction result output by the text recognition model and used for characterizing whether the first text and the second text belong to the same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text, and a first mask vector corresponding to the first mask text includes: inputting the first text into the first coding network to obtain a first text vector, inputting the first mask text into the first coding network to obtain a first mask vector, and inputting the second text into the second coding network to obtain a second text vector; and inputting the first text vector and the second text vector into the prediction network to obtain a topic prediction result for representing whether the first text and the second text belong to the same topic type.
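As a rough, hypothetical illustration of the structure in example 4, the following sketch wires two coding networks and a prediction head into one model. The concrete encoder (token embedding, a small Transformer encoder, mean pooling) and all dimensions are assumptions; only the routing of the three inputs and the four outputs follows the example, and the simple linear head merely stands in for the attention-based prediction network detailed in example 6.

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """Assumed coding network: token embedding, small Transformer encoder, mean pooling."""
    def __init__(self, vocab_size=30000, dim=256, num_layers=2, num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, token_ids):                               # (batch, seq_len) int64
        return self.encoder(self.embed(token_ids)).mean(dim=1)  # (batch, dim)

class TextRecognitionModel(nn.Module):
    """Two coding networks plus a prediction head, routed as in example 4."""
    def __init__(self, dim=256):
        super().__init__()
        self.first_coding_network = SentenceEncoder(dim=dim)
        self.second_coding_network = SentenceEncoder(dim=dim)
        # Placeholder head; the attention-based prediction network of example 6 is sketched below.
        self.prediction_network = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, first_text, first_mask_text, second_text):
        first_vec = self.first_coding_network(first_text)       # first text -> first coding network
        mask_vec = self.first_coding_network(first_mask_text)   # first mask text -> first coding network
        second_vec = self.second_coding_network(second_text)    # second text -> second coding network
        topic_logit = self.prediction_network(
            torch.cat([first_vec, second_vec], dim=-1)).squeeze(-1)
        return topic_logit, first_vec, second_vec, mask_vec
```

An instance of TextRecognitionModel returns exactly the tuple consumed by the train_step sketch given after example 1.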
Example 5 provides the training method of example 4, wherein the parameters of the first coding network and the parameters of the second coding network are initialized by: acquiring a pre-training sample, wherein the pre-training sample comprises a first pre-training text and a second pre-training text; inputting the pre-training sample into the pre-training model for training, wherein the pre-training model is used for judging whether the first pre-training text is similar to the second pre-training text or not, and the pre-training model comprises a pre-training coding network, wherein the structure of the pre-training coding network is the same as that of the first coding network and that of the second coding network; and initializing the parameters of the first coding network and the parameters of the second coding network according to the trained parameters of the pre-training coding network of the pre-training model.
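Assuming the pre-training coding network shares the SentenceEncoder structure sketched above, the initialization of example 5 reduces to loading one state dict into both coding networks. state_dict and load_state_dict are the standard PyTorch mechanism; the function name is illustrative.

```python
import torch.nn as nn

def init_coding_networks(model: nn.Module, pretrained_coding_network: nn.Module) -> nn.Module:
    """Copy the trained pre-training coding network parameters into both coding networks."""
    state = pretrained_coding_network.state_dict()
    # Example 5 assumes identical structures, so the state dict loads without remapping.
    model.first_coding_network.load_state_dict(state)
    model.second_coding_network.load_state_dict(state)
    return model
```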
Example 6 provides the training method of example 4, wherein the prediction network includes a first decoding unit, a second decoding unit, and a discrimination unit, and the inputting the first text vector and the second text vector into the prediction network to obtain a topic prediction result for characterizing whether the first text and the second text belong to the same topic type includes: inputting the first text vector as a key vector and a value vector of the first decoding unit and the second text vector as a query vector of the first decoding unit into the first decoding unit for feature processing to obtain a second feature vector; inputting the first text vector serving as a key vector and a value vector of the second decoding unit and the second feature vector serving as a query vector of the second decoding unit into the second decoding unit for feature processing to obtain a third feature vector; and inputting the third feature vector into the discrimination unit to obtain a probability value that the first text and the second text belong to the same topic type, and obtaining the topic prediction result based on the probability value that the first text and the second text belong to the same topic type.
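A minimal sketch of the prediction network of example 6 follows, using nn.MultiheadAttention for the two decoding units. Treating each pooled text vector as a length-one sequence and using a sigmoid linear layer as the discrimination unit are assumptions not fixed by the example.

```python
import torch
import torch.nn as nn

class PredictionNetwork(nn.Module):
    """Two cross-attention decoding units followed by a discrimination unit, as in example 6."""
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.first_decoding_unit = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.second_decoding_unit = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.discrimination_unit = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, first_vec, second_vec):
        kv = first_vec.unsqueeze(1)      # first text vector as key and value: (batch, 1, dim)
        query = second_vec.unsqueeze(1)  # second text vector as query: (batch, 1, dim)
        # First decoding unit: query = second text vector, key/value = first text vector.
        second_feature, _ = self.first_decoding_unit(query, kv, kv)
        # Second decoding unit: query = second feature vector, key/value = first text vector.
        third_feature, _ = self.second_decoding_unit(second_feature, kv, kv)
        # Discrimination unit: probability that both texts belong to the same topic type.
        return self.discrimination_unit(third_feature.squeeze(1)).squeeze(-1)

# Toy usage with a batch of two pooled 256-dimensional text vectors.
net = PredictionNetwork()
same_topic_prob = net(torch.randn(2, 256), torch.randn(2, 256))  # values in (0, 1)
```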
Example 7 provides, in accordance with one or more embodiments of the present disclosure, a text recognition method, the method comprising: acquiring a text to be recognized and a comparison text; obtaining, through a text recognition model, a topic recognition result of whether the text to be recognized and the comparison text belong to the same topic type, wherein the text recognition model is obtained through the training method of the text recognition model in any one of examples 1 to 6.
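For the recognition method of example 7, inference can reduce to a forward pass followed by a threshold on the predicted probability. The sketch below is hypothetical: the 0.5 threshold, the single-pair batch, and the reuse of the text to be recognized in place of a masked input are illustrative choices only.

```python
import torch

@torch.no_grad()
def recognize_topic(model, text_to_recognize, comparison_text, threshold=0.5):
    """Return True when the two texts are predicted to belong to the same topic type."""
    model.eval()
    # Reusing the text to be recognized in place of a masked text is an illustrative
    # workaround: at inference time only the topic prediction output is consumed.
    topic_logit, _, _, _ = model(text_to_recognize, text_to_recognize, comparison_text)
    return torch.sigmoid(topic_logit).item() >= threshold
```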
Example 8 provides a training apparatus of a text recognition model, according to one or more embodiments of the present disclosure, the training apparatus including: a first obtaining module, configured to obtain a target text, wherein the target text comprises a first text, a first mask text and a second text, the first mask text is obtained by performing mask processing on the first text, the first text and the second text are marked with topic labels, and the topic labels are used for representing whether the first text and the second text belong to the same topic type; an input module, configured to input the first text, the first mask text, and the second text into the text recognition model, so as to obtain a topic prediction result output by the text recognition model and used for representing whether the first text and the second text belong to the same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text, and a first mask vector corresponding to the first mask text, where an initialization parameter of the text recognition model is determined based on a parameter of a pre-training model, and the pre-training model is used for identifying whether the two texts are similar; an adjustment module to determine an objective loss function value from the first text vector, the first mask vector, the second text vector, the topic tag, and the topic prediction result, and adjust a parameter of the text recognition model based on the objective loss function value.
In accordance with one or more embodiments of the present disclosure, example 9 provides a text recognition apparatus, the apparatus comprising: the second acquisition module is used for acquiring the text to be recognized and the comparison text; and the recognition result module is used for obtaining a topic recognition result of whether the text to be recognized and the comparison text belong to the same topic type through a text recognition model, wherein the text recognition model is obtained through the training method of the text recognition model in any one of examples 1 to 6.
Example 10 provides a computer-readable medium having stored thereon a computer program that, when executed by a processing device, performs the steps of the method of any of examples 1-7, in accordance with one or more embodiments of the present disclosure.
Example 11 provides, in accordance with one or more embodiments of the present disclosure, an electronic device, comprising: a storage device having a computer program stored thereon; processing means for executing the computer program in the storage means to implement the steps of the method of any of examples 1 to 7.
The foregoing description is merely illustrative of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of the features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Claims (11)

1. A training method of a text recognition model, the training method comprising:
acquiring a target text, wherein the target text comprises a first text, a first mask text and a second text, the first mask text is obtained by mask processing of the first text, the first text and the second text are labeled with topic tags, and the topic tags are used for representing whether the first text and the second text belong to the same topic type;
inputting the first text, the first mask text and the second text into the text recognition model to obtain a topic prediction result output by the text recognition model and used for representing whether the first text and the second text belong to the same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text and a first mask vector corresponding to the first mask text, wherein initialization parameters of the text recognition model are determined based on parameters of a pre-trained model, and the pre-trained model is used for recognizing whether the two texts are similar;
determining a target loss function value from the first text vector, the first mask vector, the second text vector, the topic tag, and the topic prediction result, and adjusting a parameter of the text recognition model based on the target loss function value.
2. The training method of claim 1, wherein the determining an objective loss function value from the first text vector, the first mask vector, the second text vector, the topic tag, and the topic prediction result comprises:
determining a first loss function value from the first text vector, the first mask vector, and the second text vector;
determining a second loss function value from the topic tag and the topic prediction result;
determining the target loss function value based on the first loss function value and the second loss function value.
3. The training method of claim 2, wherein determining a first loss function value from the first text vector, the first mask vector, and the second text vector comprises:
subtracting the first mask vector from the first text vector to obtain a first loss vector, and subtracting the second text vector from the first text vector to obtain a second loss vector;
and subtracting the Euclidean norm of the first loss vector from the Euclidean norm of the second loss vector to obtain the first loss function value.
4. The training method as claimed in any one of claims 1 to 3, wherein the text recognition model comprises a first coding network, a second coding network and a prediction network, and the inputting the first text, the first mask text and the second text into the text recognition model to obtain a topic prediction result output by the text recognition model for characterizing whether the first text and the second text belong to the same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text and a first mask vector corresponding to the first mask text comprises:
inputting the first text into the first coding network to obtain a first text vector, inputting the first mask text into the first coding network to obtain a first mask vector, and inputting the second text into the second coding network to obtain a second text vector;
and inputting the first text vector and the second text vector into the prediction network to obtain a topic prediction result for representing whether the first text and the second text belong to the same topic type.
5. Training method according to claim 4, characterized in that the parameters of the first coding network and the parameters of the second coding network are initialized by:
acquiring a pre-training sample, wherein the pre-training sample comprises a first pre-training text and a second pre-training text;
inputting the pre-training sample into the pre-training model for training, wherein the pre-training model is used for judging whether the first pre-training text is similar to the second pre-training text or not, and the pre-training model comprises a pre-training coding network, wherein the structure of the pre-training coding network is the same as that of the first coding network and that of the second coding network;
and initializing the parameters of the first coding network and the parameters of the second coding network according to the trained parameters of the pre-training coding network of the pre-training model.
6. The training method as claimed in claim 4, wherein the prediction network comprises a first decoding unit, a second decoding unit and a discrimination unit, and the inputting the first text vector and the second text vector into the prediction network to obtain a topic prediction result for characterizing whether the first text and the second text belong to the same topic type comprises:
inputting the first text vector as a key vector and a value vector of the first decoding unit and the second text vector as a query vector of the first decoding unit into the first decoding unit for feature processing to obtain a second feature vector;
inputting the first text vector serving as a key vector and a value vector of the second decoding unit and the second feature vector serving as a query vector of the second decoding unit into the second decoding unit for feature processing to obtain a third feature vector;
and inputting the third feature vector into the discrimination unit to obtain a probability value that the first text and the second text belong to the same topic type, and obtaining the topic prediction result based on the probability value that the first text and the second text belong to the same topic type.
7. A method of text recognition, the method comprising:
acquiring a text to be recognized and a comparison text;
obtaining, through a text recognition model, a topic recognition result of whether the text to be recognized and the comparison text belong to the same topic type, wherein the text recognition model is obtained through the training method of the text recognition model as claimed in any one of claims 1 to 6.
8. An apparatus for training a text recognition model, the apparatus comprising:
a first obtaining module, configured to obtain a target text, wherein the target text comprises a first text, a first mask text and a second text, the first mask text is obtained by performing mask processing on the first text, the first text and the second text are marked with topic labels, and the topic labels are used for representing whether the first text and the second text belong to the same topic type;
an input module, configured to input the first text, the first mask text, and the second text into the text recognition model, so as to obtain a topic prediction result output by the text recognition model and used for representing whether the first text and the second text belong to the same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text, and a first mask vector corresponding to the first mask text, where an initialization parameter of the text recognition model is determined based on a parameter of a pre-training model, and the pre-training model is used for identifying whether the two texts are similar;
an adjustment module to determine an objective loss function value from the first text vector, the first mask vector, the second text vector, the topic tag, and the topic prediction result, and adjust a parameter of the text recognition model based on the objective loss function value.
9. A text recognition apparatus, characterized in that the apparatus comprises:
the second acquisition module is used for acquiring the text to be recognized and the comparison text;
a recognition result module, configured to obtain, through a text recognition model, a topic recognition result whether the text to be recognized and the comparison text belong to the same topic type, where the text recognition model is obtained through the training method of the text recognition model according to any one of claims 1 to 6.
10. A computer-readable medium, on which a computer program is stored, characterized in that the program, when executed by processing means, carries out the steps of the method of any one of claims 1 to 7.
11. An electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 7.
CN202210283937.9A 2022-03-21 2022-03-21 Training method of text recognition model, text recognition method and related device Pending CN114626551A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210283937.9A CN114626551A (en) 2022-03-21 2022-03-21 Training method of text recognition model, text recognition method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210283937.9A CN114626551A (en) 2022-03-21 2022-03-21 Training method of text recognition model, text recognition method and related device

Publications (1)

Publication Number Publication Date
CN114626551A true CN114626551A (en) 2022-06-14

Family

ID=81904141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210283937.9A Pending CN114626551A (en) 2022-03-21 2022-03-21 Training method of text recognition model, text recognition method and related device

Country Status (1)

Country Link
CN (1) CN114626551A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628177A (en) * 2023-05-22 2023-08-22 福建省网络与信息安全测评中心 Interactive data processing method and system for network security platform
CN116628177B (en) * 2023-05-22 2023-11-14 福建省网络与信息安全测评中心 Interactive data processing method and system for network security platform
CN117668563A (en) * 2024-01-31 2024-03-08 苏州元脑智能科技有限公司 Text recognition method, text recognition device, electronic equipment and readable storage medium
CN117668563B (en) * 2024-01-31 2024-04-30 苏州元脑智能科技有限公司 Text recognition method, text recognition device, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN111476309A (en) Image processing method, model training method, device, equipment and readable medium
CN114626551A (en) Training method of text recognition model, text recognition method and related device
CN109961032B (en) Method and apparatus for generating classification model
CN113515942A (en) Text processing method and device, computer equipment and storage medium
CN112348081A (en) Transfer learning method for image classification, related device and storage medium
US11763204B2 (en) Method and apparatus for training item coding model
CN112149699A (en) Method and device for generating model and method and device for recognizing image
CN111930981A (en) Data processing method for sketch retrieval
CN113111917B (en) Zero sample image classification method and device based on dual self-encoders
CN113140012B (en) Image processing method, device, medium and electronic equipment
CN111915689B (en) Method, apparatus, electronic device, and computer-readable medium for generating an objective function
CN116541608B (en) House source recommendation method and device, electronic equipment and storage medium
CN113033707A (en) Video classification method and device, readable medium and electronic equipment
CN116258911A (en) Training method, device, equipment and storage medium for image classification model
CN115984868A (en) Text processing method, device, medium and equipment
CN116030375A (en) Video feature extraction and model training method, device, equipment and storage medium
CN115375657A (en) Method for training polyp detection model, detection method, device, medium, and apparatus
CN111914535B (en) Word recognition method and device, computer equipment and storage medium
CN112417260B (en) Localized recommendation method, device and storage medium
CN114547308A (en) Text processing method and device, electronic equipment and storage medium
CN116883708A (en) Image classification method, device, electronic equipment and storage medium
CN114187557A (en) Method, device, readable medium and electronic equipment for determining key frame
CN113222050A (en) Image classification method and device, readable medium and electronic equipment
CN117392260B (en) Image generation method and device
CN117392379B (en) Method and device for detecting target

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination