CN114626551B - Training method of text recognition model, text recognition method and related device - Google Patents

Training method of text recognition model, text recognition method and related device

Info

Publication number
CN114626551B
CN114626551B (Application CN202210283937.9A)
Authority
CN
China
Prior art keywords
text
vector
training
topic
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210283937.9A
Other languages
Chinese (zh)
Other versions
CN114626551A (en)
Inventor
陈维识
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202210283937.9A
Publication of CN114626551A
Application granted
Publication of CN114626551B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a training method of a text recognition model, a text recognition method and a related device, which are used for solving the problem of insufficient training samples in a topic business scene and accelerating training speed through a pre-training model. The training method comprises the following steps: acquiring a target text, wherein the target text comprises a first text, a first mask text and a second text; inputting the first text, the first mask text and the second text into a text recognition model to obtain a topic prediction result, a first text vector corresponding to the first text, a second text vector corresponding to the second text and a first mask vector corresponding to the first mask text, wherein the topic prediction result is used for representing whether the first text and the second text belong to the same topic type or not, and the first mask vector is output by the text recognition model; and determining a target loss function value according to the first text vector, the first mask vector, the second text vector, the topic label and the topic prediction result, and adjusting parameters of the text recognition model based on the target loss function value.

Description

Training method of text recognition model, text recognition method and related device
Technical Field
The disclosure relates to the technical field of natural language processing, in particular to a training method of a text recognition model, a text recognition method and a related device.
Background
With the rapid development of the Internet, websites such as forums and communities have emerged, providing communication and interaction for people with the same interests. In such websites, a plurality of different communication groups are typically divided, and each communication group provides topics of the same type for interested users to communicate and interact around; for example, topics related to raising pets belong to the pet communication group, topics related to fishing belong to the fishing communication group, and so on. Users interested in raising pets can interact in the pet communication group based on a topic about raising pets.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a training method of a text recognition model, the training method comprising:
Obtaining a target text, wherein the target text comprises a first text, a first mask text and a second text, the first mask text is obtained by masking the first text, the first text and the second text are marked with topic labels, and the topic labels are used for representing whether the first text and the second text belong to the same topic type or not;
Inputting the first text, the first mask text and the second text into the text recognition model to obtain a topic prediction result output by the text recognition model and used for representing whether the first text and the second text belong to the same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text, and a first mask vector corresponding to the first mask text, wherein initialization parameters of the text recognition model are determined based on parameters of a pre-training model, and the pre-training model is used for recognizing whether two texts are similar;
Determining a target loss function value according to the first text vector, the first mask vector, the second text vector, the topic label and the topic prediction result, and adjusting parameters of the text recognition model based on the target loss function value.
In a second aspect, the present disclosure provides a text recognition method, the method comprising:
Acquiring a text to be recognized and a comparison text;
and obtaining, through a text recognition model, a topic recognition result of whether the text to be recognized and the comparison text belong to the same topic type, wherein the text recognition model is obtained through the training method of the text recognition model in the first aspect of the disclosure.
In a third aspect, the present disclosure provides a training device for a text recognition model, the training device comprising:
The first acquisition module is used for acquiring target texts, wherein the target texts comprise a first text, a first mask text and a second text, the first mask text is obtained by masking the first text, the first text and the second text are marked with topic labels, and the topic labels are used for representing whether the first text and the second text belong to the same topic type or not;
The input module is used for inputting the first text, the first mask text and the second text into the text recognition model to obtain a topic prediction result, a first text vector, a second text vector and a first mask vector, wherein the topic prediction result is output by the text recognition model and is used for representing whether the first text and the second text belong to the same topic type, the first text vector corresponds to the first text, the second text vector corresponds to the second text and the first mask vector corresponds to the first mask text, the initialization parameter of the text recognition model is determined based on the parameter of a pre-training model, and the pre-training model is used for recognizing whether two texts are similar;
And the adjusting module is used for determining a target loss function value according to the first text vector, the first mask vector, the second text vector, the topic label and the topic prediction result and adjusting parameters of the text recognition model based on the target loss function value.
In a fourth aspect, the present disclosure provides a text recognition apparatus, the apparatus comprising:
the second acquisition module is used for acquiring the text to be recognized and the comparison text;
the recognition result module is used for obtaining, through a text recognition model, a topic recognition result of whether the text to be recognized and the comparison text belong to the same topic type, wherein the text recognition model is obtained through the training method of the text recognition model in the first aspect of the disclosure.
In a fifth aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which when executed by a processing device performs the steps of the method of the first aspect of the present disclosure.
In a sixth aspect, the present disclosure provides an electronic device, comprising:
a storage device having a computer program stored thereon;
Processing means for executing said computer program in said storage means to carry out the steps of the method of the first aspect of the disclosure.
Through the technical scheme, the pre-training model is used for identifying whether two texts are similar, so that the parameters of the text identification model are initialized through the pre-training model, which is equivalent to coarse adjustment of the parameters of the text identification model. And then training the text recognition model by using the target text marked with the topic label, and fine adjustment can be carried out on parameters of the text recognition model on the basis of coarse adjustment. Therefore, the training speed of the text recognition model can be increased, a small amount of target texts marked with topic labels can be used for training the text recognition model, and the marking cost is reduced. In addition, the robustness of the text recognition model can be improved by performing semi-supervised learning according to the first text vector, the first mask vector, the second text vector, the topic label and the topic prediction result.
Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flowchart of a training method for a text recognition model provided in accordance with an exemplary embodiment;
FIG. 2 is a schematic diagram of a text recognition model provided in accordance with an exemplary embodiment;
FIG. 3 is a schematic diagram of a first encoding network provided in accordance with an exemplary embodiment;
FIG. 4 is a schematic diagram of the structure of a pre-training model provided in accordance with an exemplary embodiment;
FIG. 5 is a schematic diagram of a predictive network architecture provided in accordance with an exemplary embodiment;
FIG. 6 is a schematic diagram of another text recognition model provided in accordance with an exemplary embodiment;
FIG. 7 is a flowchart of a text recognition method provided in accordance with an exemplary embodiment;
FIG. 8 is a block diagram of a training device for a text recognition model provided in accordance with an exemplary embodiment;
FIG. 9 is a block diagram of a text recognition device provided in accordance with an exemplary embodiment;
fig. 10 is a block diagram of an electronic device provided in accordance with an exemplary embodiment.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure have been shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "a", "an" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
All actions in this disclosure to obtain signals, information or data are performed in compliance with the corresponding data protection legislation policies of the country of location and to obtain authorization granted by the owner of the corresponding device.
As described in the background art, websites such as forums and communities are generally divided into a plurality of different communication groups, and each communication group provides topics of the same type for interested users to interact around, where a topic is a subject that provides communication interaction for the users. For example, there may be a topic "Is a pet cat losing hair a sign of vitamin deficiency?", under which a user can post his own opinion and also discuss the opinions posted by other users. Since the topics of the same communication group generally belong to the same type, when a new topic needs to be issued in a certain communication group, whether the new topic accords with the topic type of the communication group needs to be judged first; if each new topic is judged manually, the timeliness and promptness of issuing new topics are affected.
In the related art, text recognition technology has achieved good results in recognizing text similarity, and publicly available recognition models and labeled training samples can be obtained from the Internet. However, for the business scenario of judging whether a new topic accords with the topic type of a certain communication group, a good recognition result cannot be obtained. For example, consider text 1 with the content "Is pet cat hair loss a sign of vitamin deficiency?" and text 2 with the content "Does a domestic pet turtle not interact with its owner?". From the perspective of text similarity, text 1 and text 2 are dissimilar. If a model for recognizing text similarity is used to identify the topics of text 1 and text 2, the recognition result will be that the two are not of the same topic type. However, in an actual topic business scenario, text 1 and text 2 may be of the same topic type, as both belong to the pet communication group. For this specific business scenario, especially when a new communication group is created, a text recognition model with a good topic recognition effect cannot be obtained because a large number of training samples cannot be provided for model training.
In order to solve the technical problems, the disclosure provides a training method of a text recognition model, a text recognition method and a related device. Parameters of a text recognition model are initialized through parameters of a pre-training model capable of recognizing whether two texts are similar or not, and then the text recognition model is trained by a small number of target texts marked with topic labels, so that the text recognition model capable of recognizing whether a first text and a second text belong to the same topic type or not is obtained. The training speed of the text recognition model can be increased, the text recognition model can be trained by using a small amount of target texts marked with topic labels, and the marking cost is reduced.
The present disclosure is described below in connection with specific embodiments.
FIG. 1 is a flowchart of a training method for a text recognition model provided according to an exemplary embodiment. The training method includes:
s101, acquiring a target text.
The target text comprises a first text, a first mask text and a second text, wherein the first mask text is obtained by masking the first text, the first text and the second text are marked with topic labels, and the topic labels are used for representing whether the first text and the second text belong to the same topic type or not.
S102, inputting the first text, the first mask text and the second text into a text recognition model to obtain a topic prediction result, a first text vector corresponding to the first text, a second text vector corresponding to the second text and a first mask vector corresponding to the first mask text, all output by the text recognition model, wherein the topic prediction result is used for representing whether the first text and the second text belong to the same topic type.
The initialization parameters of the text recognition model are determined based on parameters of a pre-training model, and the pre-training model is used for recognizing whether two texts are similar.
S103, determining a target loss function value according to the first text vector, the first mask vector, the second text vector, the topic label and the topic prediction result, and adjusting parameters of the text recognition model based on the target loss function value.
By adopting the training method, firstly, the parameters of the text recognition model are initialized based on the parameters of the pre-training model, then the text recognition model is trained by utilizing the target text marked with the topic label, and finally, the parameters of the text recognition model are further adjusted according to the target loss function value. Since the pre-training model is used for identifying whether two texts are similar, initializing parameters of the text recognition model through the pre-training model is equivalent to performing coarse adjustment on the parameters of the text recognition model. And then training the text recognition model by using the target text marked with the topic label, and fine adjustment can be carried out on parameters of the text recognition model on the basis of coarse adjustment. Therefore, the training speed of the text recognition model can be increased, a small amount of target texts marked with topic labels can be used for training the text recognition model, and the marking cost is reduced. In addition, the robustness of the text recognition model can be improved by performing semi-supervised learning according to the first text vector, the first mask vector, the second text vector, the topic label and the topic prediction result.
The structure of the text recognition model provided by the embodiment of the present disclosure will be described first.
In a possible manner, referring to fig. 2, the text recognition model includes a first encoding network, a second encoding network and a prediction network. Inputting the first text, the first mask text and the second text into the text recognition model to obtain the topic prediction result output by the text recognition model and used for representing whether the first text and the second text belong to the same topic type, the first text vector corresponding to the first text, the second text vector corresponding to the second text, and the first mask vector corresponding to the first mask text may be implemented as follows: first, the first text is input into the first encoding network to obtain the first text vector, the first mask text is input into the first encoding network to obtain the first mask vector, and the second text is input into the second encoding network to obtain the second text vector. Then, the first text vector and the second text vector are input into the prediction network to obtain the topic prediction result used for representing whether the first text and the second text belong to the same topic type.
Illustratively, referring to fig. 3, the first encoding network may be the encoder part (Encoder) of a Transformer model, including a first encoding unit and a second encoding unit, where the first encoding unit is identical to the second encoding unit in structure, each including a multi-head attention layer (Multi-Head Attention), a residual connection & normalization layer (Add & Norm), and two feed-forward network layers (Feed Forward). Taking the first text as an example of the input, feature processing is performed by the first encoding unit to obtain a first feature vector, and the first feature vector is then input into the second encoding unit for feature processing to output the first text vector. The processing flow of the first encoding network may refer to the processing flow of the encoder in the related-art Transformer model, which is not described herein again.
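For illustration only, the following is a minimal sketch of such an encoding network with two encoding units, assuming a PyTorch implementation; the class name EncodingNetwork and the hyper-parameters (vocab_size, d_model, n_heads, d_ff) are illustrative assumptions rather than values specified by this disclosure.

```python
import torch.nn as nn

class EncodingNetwork(nn.Module):
    """Sketch of the first encoding network: two stacked encoding units,
    each with multi-head self-attention, Add & Norm, and a feed-forward
    sub-network (all sizes below are assumed, not taken from the patent)."""
    def __init__(self, vocab_size=30000, d_model=256, n_heads=8, d_ff=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        unit = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                          dim_feedforward=d_ff, batch_first=True)
        self.units = nn.TransformerEncoder(unit, num_layers=2)  # first + second encoding unit

    def forward(self, token_ids):        # token_ids: (batch, seq_len)
        x = self.embed(token_ids)        # (batch, seq_len, d_model)
        return self.units(x)             # text vector, same shape
```

In this sketch the two encoding units are realized as two stacked Transformer encoder layers; the second encoding network could be built the same way, with the same or a different number of encoding units, as described next.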
Accordingly, inputting the first mask text into the first encoding network outputs a first mask vector, and inputting the second text into the second encoding network outputs a second text vector. The structure of the second coding network may be the same as the structure of the first coding network, or may be different from the structure of the first coding network, for example, the number of coding units of the second coding network may not be identical to the number of coding units of the first coding network, and may be specifically set according to the requirement, which is not limited in this disclosure.
In a possible manner, the parameters of the first encoding network and the parameters of the second encoding network are initialized as follows: first, a pre-training sample is acquired, wherein the pre-training sample comprises a first pre-training text and a second pre-training text. And inputting the pre-training sample into a pre-training model for training, wherein the pre-training model is used for judging whether the first pre-training text is similar to the second pre-training text, and the pre-training model comprises a pre-training coding network, wherein the structure of the pre-training coding network is the same as that of the first coding network and the second coding network. And initializing the parameters of the first coding network and the parameters of the second coding network according to the parameters of the pre-training coding network of the trained pre-training model.
For example, referring to fig. 4, the pre-training model includes a first pre-training encoding network, a second pre-training encoding network, and a pre-training prediction network, with weight sharing (Share Weights) between the first pre-training encoding network and the second pre-training encoding network. The first pre-training text is input into the first pre-training encoding network to obtain a first pre-training vector, and the second pre-training text is input into the second pre-training encoding network to obtain a second pre-training vector. The first pre-training vector and the second pre-training vector are then input into the pre-training prediction network to obtain a pre-training prediction result. The pre-training model may refer to a text similarity recognition model in the related art, and the pre-training samples may be obtained from publicly available training data of text similarity recognition models on the Internet, labeled with a label indicating whether the first pre-training text and the second pre-training text are similar.
It should be noted that, when the structure of the first coding network and the structure of the second coding network are different, the structure of the first pre-training coding network is the same as the structure of the first coding network, and the structure of the second pre-training coding network is the same as the structure of the second coding network. The parameters of the first encoding network may thus be initialized based on the parameters of the first pre-trained encoding network of the trained pre-trained model, and the parameters of the second encoding network may be initialized based on the parameters of the second pre-trained encoding network of the trained pre-trained model. Under the condition that the structure of the first coding network is the same as that of the second coding network, the structure of the first pre-training coding network is the same as that of the second pre-training coding network, and any pre-training coding network can be utilized to initialize parameters of the first coding network and the second coding network, namely coarse adjustment is carried out on parameters of a text recognition model.
Preferably, the first encoding network and the second encoding network are chosen to have the same structure, so that the first pre-training encoding network and the second pre-training encoding network also have the same structure; this allows the pre-training model to converge faster during its training and increases the training speed of the pre-training model. Initializing the parameters of the first encoding network and the parameters of the second encoding network according to the parameters of the pre-training encoding network of the trained pre-training model is equivalent to coarse tuning of the parameters of the text recognition model, yielding a text recognition model capable of predicting whether two texts are similar. The text recognition model is then further trained with a small amount of target text labeled with topic labels, and its parameters are fine-tuned on the basis of the coarse tuning to obtain the final text recognition model suited to the topic business scenario. Therefore, initializing the text recognition model with the pre-training model can solve the problem of an insufficient amount of target text labeled with topic labels and increase the training speed of the text recognition model.
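A minimal sketch of this coarse-tuning step is given below, reusing the illustrative EncodingNetwork from the previous sketch and assuming the case where the first and second encoding networks share one structure; the variable names and the use of PyTorch's load_state_dict are illustrative, not a required implementation.

```python
# Sketch of coarse tuning: copy the trained pre-training encoder parameters
# into the first and second encoding networks of the text recognition model.
pretrain_encoder = EncodingNetwork()
# ... the pre-training model containing this encoder is trained on public
#     text-similarity data (text pairs labeled similar / not similar) ...

first_encoder = EncodingNetwork()
second_encoder = EncodingNetwork()
# Both encoding networks have the same structure here, so either
# pre-training encoding network can initialize both of them.
first_encoder.load_state_dict(pretrain_encoder.state_dict())
second_encoder.load_state_dict(pretrain_encoder.state_dict())
```

When the two encoding networks have different structures, each would instead be initialized from the pre-training encoding network that shares its structure, as noted above.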
In a possible manner, the prediction network includes a first decoding unit, a second decoding unit, and a discriminating unit, and inputting the first text vector and the second text vector into the prediction network to obtain the topic prediction result used for characterizing whether the first text and the second text belong to the same topic type may be implemented as follows: the first text vector is used as the key vector and value vector of the first decoding unit, the second text vector is used as the query vector of the first decoding unit, and both are input into the first decoding unit for feature processing to obtain a second feature vector. Then, the first text vector is used as the key vector and value vector of the second decoding unit, the second feature vector is used as the query vector of the second decoding unit, and both are input into the second decoding unit for feature processing to obtain a third feature vector. Finally, the third feature vector is input into the discriminating unit to obtain a probability value that the first text and the second text belong to the same topic type, and the topic prediction result is obtained based on this probability value.
For example, referring to fig. 5, the prediction network may be the decoder part (Decoder) of a Transformer model, including a first decoding unit, a second decoding unit, and a discriminating unit, where the first decoding unit is consistent with the second decoding unit in structure, each including a multi-head attention layer (Multi-Head Attention), a residual connection & normalization layer (Add & Norm), and two feed-forward network layers (Feed Forward).
First, the first text vector is used as the key vector (K: a vector representing the correlation between the information to be queried and other information) and the value vector (V: a vector representing the information to be queried) of the first decoding unit, and the second text vector is used as the query vector (Q) of the first decoding unit, so as to obtain the second feature vector output by the first decoding unit. Then, the first text vector is used as the key vector and value vector of the second decoding unit, the second feature vector is used as the query vector of the second decoding unit and is input into the second decoding unit for feature processing, so as to obtain the third feature vector. Finally, the third feature vector is input into the discriminating unit, which includes a global average pooling layer, a fully connected layer and a classifier (the classifier can be realized by Sigmoid activation), to obtain the probability value that the first text and the second text belong to the same topic type, and the topic prediction result is obtained based on this probability value. For example, a preset probability value is set, and if the probability value output by the classifier is greater than the preset probability value, the first text and the second text belong to the same topic type; the preset probability value may be 100%, 80%, etc., which is not limited in the disclosure. In addition, the processing flow of the prediction network may refer to the processing flow of the decoder in the related-art Transformer model, which is not described herein again.
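The following is a simplified sketch of such a prediction network under the same PyTorch assumption; DecodingUnit and PredictionNetwork are illustrative names, and each decoding unit is reduced here to cross-attention followed by Add & Norm and a feed-forward sub-network.

```python
import torch
import torch.nn as nn

class DecodingUnit(nn.Module):
    """Simplified decoding unit: cross-attention (Q from one input,
    K and V from the other), Add & Norm, and a feed-forward sub-network."""
    def __init__(self, d_model=256, n_heads=8, d_ff=1024):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, q, kv):
        attn_out, _ = self.cross_attn(query=q, key=kv, value=kv)
        x = self.norm1(q + attn_out)
        return self.norm2(x + self.ffn(x))

class PredictionNetwork(nn.Module):
    """First and second decoding units followed by the discriminating unit
    (global average pooling, fully connected layer, Sigmoid classifier)."""
    def __init__(self, d_model=256):
        super().__init__()
        self.unit1 = DecodingUnit(d_model)
        self.unit2 = DecodingUnit(d_model)
        self.fc = nn.Linear(d_model, 1)

    def forward(self, h1, h2):            # h1: first text vector, h2: second text vector
        f2 = self.unit1(q=h2, kv=h1)      # second feature vector
        f3 = self.unit2(q=f2, kv=h1)      # third feature vector
        pooled = f3.mean(dim=1)           # global average pooling over the sequence
        return torch.sigmoid(self.fc(pooled)).squeeze(-1)  # same-topic probability
```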
In a possible manner, determining the objective loss function value from the first text vector, the first mask vector, the second text vector, the topic tag, and the topic prediction result may be: first, a first loss function value is determined according to the first text vector, the first mask vector and the second text vector, then a second loss function value is determined according to the topic label and the topic prediction result, and finally a target loss function value is determined based on the first loss function value and the second loss function value.
For example, determining the first loss function value from the first text vector, the first mask vector, and the second text vector may be implemented as follows: first, the first text vector and the first mask vector are subtracted to obtain a first loss vector, and the first text vector and the second text vector are subtracted to obtain a second loss vector. The Euclidean distance of the second loss vector is then subtracted from the Euclidean distance of the first loss vector to obtain the first loss function value.
For example, the calculation of the first loss function value does not depend on the topic label annotated on the target text, and the first loss function value may be determined by the following calculation formula:
L1 = ||H1 - H1'|| - ||H1 - H2||
where L1 represents the first loss function value, H1 represents the first text vector, H1' represents the first mask vector, H2 represents the second text vector, ||H1 - H1'|| represents the Euclidean distance of the first loss vector, and ||H1 - H2|| represents the Euclidean distance of the second loss vector, the Euclidean distance being obtained by an L2-Norm operation.
Alternatively, in other possible implementations, the first loss function value may be calculated by applying a weight coefficient to the Euclidean distance of the first loss vector and/or applying a weight coefficient to the Euclidean distance of the second loss vector.
Illustratively, the calculation of the second loss function value depends on the topic label annotated on the target text, and the second loss function value may be determined by the following calculation formula:
L2 = CE(Y1, Y2)
where L2 denotes the second loss function value, Y1 denotes the topic label annotated on the target text, Y2 denotes the topic prediction result of the target text, and CE denotes the binary cross entropy loss function (Binary Cross Entropy).
Further, after the first loss function value and the second loss function value are obtained, the first loss function value and the second loss function value may be added to obtain the target loss function value, or the target loss function value may be determined by the following calculation formula:
L = a·L1 + b·L2
where L represents the target loss function value, a represents the weight coefficient of the first loss function value, and b represents the weight coefficient of the second loss function value; the specific values of a and b may be determined as needed, which is not limited by the present disclosure.
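A minimal sketch of the target loss under these formulas is shown below, assuming PyTorch and that H1, H1' and H2 have already been pooled into fixed-size sentence vectors; the function name, the pooling assumption and the default weights a = b = 1 are illustrative.

```python
import torch
import torch.nn.functional as F

def target_loss(h1, h1_mask, h2, y_true, y_pred, a=1.0, b=1.0):
    """L = a*L1 + b*L2, with L1 = ||H1 - H1'|| - ||H1 - H2|| and L2 = CE(Y1, Y2).
    h1, h1_mask, h2: (batch, d) pooled vectors; y_true: 0/1 topic labels as
    floats; y_pred: predicted same-topic probabilities."""
    l1 = (torch.norm(h1 - h1_mask, p=2, dim=-1)      # Euclidean distance of the first loss vector
          - torch.norm(h1 - h2, p=2, dim=-1))        # minus that of the second loss vector
    l2 = F.binary_cross_entropy(y_pred, y_true)      # binary cross entropy
    return a * l1.mean() + b * l2
```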
It should be noted that, in the training process of the text recognition model, the final objective is to determine the parameters of the text recognition model so as to minimize the objective loss function value, that is, the parameters θ of the text recognition model satisfy:
θ = argmin_θ L
where argmin_θ denotes the value of the parameter θ of the text recognition model at which the target loss function reaches its minimum value.
It should be appreciated that, for the calculation of the first loss function value, contrastive learning is introduced: the smaller the difference between the first text vector and the first mask vector and the larger the difference between the first text vector and the second text vector, the smaller the first loss function value, which in turn reduces the target loss function value so that the text recognition model completes training. Introducing contrastive learning therefore makes the optimization of the text recognition model simpler and improves the generalization capability of the text recognition model; for details of contrastive learning, reference may be made to the related art, which is not repeated here.
In addition, the text recognition model provided by the embodiment of the disclosure calculates the target loss function value through the first loss function value which is determined independent of the topic label and the second loss function value which is determined dependent on the topic label, semi-supervised learning is performed by using consistency regularization, and robustness of the text recognition model is improved.
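For illustration, one training iteration driving the parameters toward θ = argmin_θ L might look as follows, reusing the illustrative networks and loss function sketched above; data_loader, the pooling choice and the Adam settings are assumptions rather than requirements of this disclosure.

```python
import torch

prediction_net = PredictionNetwork()
params = (list(first_encoder.parameters()) + list(second_encoder.parameters())
          + list(prediction_net.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-4)   # illustrative optimizer and learning rate

for first_ids, mask_ids, second_ids, topic_label in data_loader:  # assumed data loader
    h1_seq = first_encoder(first_ids)        # first text vector (sequence form)
    h1m_seq = first_encoder(mask_ids)        # first mask vector
    h2_seq = second_encoder(second_ids)      # second text vector
    y_pred = prediction_net(h1_seq, h2_seq)  # topic prediction result
    loss = target_loss(h1_seq.mean(dim=1), h1m_seq.mean(dim=1),
                       h2_seq.mean(dim=1), topic_label.float(), y_pred)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                         # adjust parameters to reduce the target loss
```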
In other possible implementations, referring to fig. 6, the second text may be masked to obtain a second mask text, and the second mask text may be input into the second encoding network to obtain a second mask vector. In calculating the first loss function value, the second text vector and the second mask vector may be subtracted to obtain a third loss vector, and the second text vector and the first text vector may be subtracted to obtain a fourth loss vector; the Euclidean distance of the fourth loss vector is then subtracted from the Euclidean distance of the third loss vector to obtain the first loss function value. Alternatively, both the first mask text and the second mask text may participate in the calculation of the first loss function value: following the above manner, one loss value is obtained based on the first mask text and another loss value is obtained based on the second mask text, and the average of the two is taken as the first loss function value, which is not limited in this disclosure.
For the target text used in training the text recognition model, a first text and a second text obtained from two topics in the same communication group may be taken as a positive sample, and a first text and a second text obtained from two topics in different communication groups may be taken as a negative sample. In general, the number of negative samples is much larger than the number of positive samples; in the embodiments of the disclosure, in order to obtain a better training effect of the text recognition model, the number of positive samples may be controlled to be one third of the number of negative samples. Moreover, the target text can be selectively acquired according to topic types from different angles and different recognition-granularity requirements, so that the recognition granularity and recognition angle of the text recognition model can be adjusted.
It should be noted that, for text recognition models in the related art, a large number of labeled samples are generally required to obtain a trained text recognition model for topic recognition. In contrast, when the text recognition model of the present disclosure is trained, the pre-training model is first trained with an existing text similarity recognition model and publicly available labeled samples, the parameters of the text recognition model are then coarsely tuned according to the parameters of the pre-training model, and the text recognition model is then trained with a small amount of target text labeled with topic labels, fine-tuning its parameters on the basis of the coarse tuning. This solves the problem of an insufficient amount of target text labeled with topic labels, reduces the labeling cost, and increases the training speed of the text recognition model. The amount of target text used to train the text recognition model can thus be much smaller than the number of labeled samples required to train a text recognition model in the related art.
Based on the same inventive concept, the embodiments of the present disclosure further provide a text recognition method, which includes:
s701, acquiring a text to be identified and a comparison text.
S702, obtaining a topic identification result of whether the text to be identified and the comparison text belong to the same topic type or not through a text identification model.
The text recognition model is obtained through the training method of the text recognition model.
By way of example, consider issuing a new topic within a communication group: the new topic to be issued may be taken as the text to be recognized, and an already issued topic within the communication group is selected as the comparison text. The text to be recognized is input into the first encoding network of the text recognition model to obtain a vector of the text to be recognized, and the comparison text is input into the second encoding network of the text recognition model to obtain a comparison text vector. The vector of the text to be recognized and the comparison text vector are input into the prediction network to obtain a probability value that the text to be recognized and the comparison text belong to the same topic type; if the probability value is greater than a preset probability value, the text to be recognized and the comparison text belong to the same topic type, and the new topic to be issued may be issued in the communication group.
Alternatively, when there is a new topic to be issued but the communication group in which it is suitable to issue the new topic has not been determined, the new topic to be issued may be taken as the text to be recognized, and the issued topics of each communication group may be taken as a plurality of comparison texts. The steps of the above text recognition method are repeated to determine, from the plurality of comparison texts, a target comparison text that belongs to the same topic type as the new topic to be issued, and the communication group to which the target comparison text belongs is determined as the communication group in which the new topic is to be issued.
By adopting the method, when a new topic is to be issued in a certain communication group, whether the new topic accords with the topic type of the communication group can be judged through the text recognition model, or a communication group suitable for issuing the new topic can be determined; compared with manual judgment, this improves the timeliness and promptness of issuing new topics.
In a possible implementation manner, since the text recognition model uses the first mask text and the second mask text in the training process, in order to keep the input consistency of the text recognition model and avoid the exception of the finally output prediction result caused by inconsistent input parameters, a first preset value can be automatically input as the first mask text and a second preset value can be automatically input as the second mask text in the actual application process of the text recognition model. Since only the text to be recognized and the comparison text participate in the recognition process of the text recognition model, the first preset value and the second preset value can be any values, and the first preset value and the second preset value do not affect the prediction result, for example, the first preset value and the second preset value can be vectors with all 0, which is not limited in the disclosure.
In a possible implementation manner, multiple topics may be selected as multiple comparison texts in the same communication group, multiple topics which are recently published are generally selected, for example, 20 topics which are recently published are selected as multiple comparison texts, then multiple probability values of a new topic to be published and multiple comparison texts belonging to the same topic type are obtained respectively, an average value of the multiple probability values is taken as a target probability value, and if the target probability value is greater than a preset probability value, the new topic to be published may be published in the communication group. By taking the average probability value, errors of the final prediction result caused by errors of single comparison text are avoided.
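A minimal sketch of this averaged inference, reusing the illustrative networks from the training sketches above, is shown below; the helper name same_topic_score and the threshold of 0.8 are assumptions for illustration only.

```python
import torch

@torch.no_grad()
def same_topic_score(new_topic_ids, comparison_batches, threshold=0.8):
    """Average the same-topic probability of the text to be recognized against
    several recently published topics of a communication group."""
    h1 = first_encoder(new_topic_ids)            # text to be recognized
    probs = []
    for comp_ids in comparison_batches:          # e.g. the 20 most recently published topics
        h2 = second_encoder(comp_ids)            # comparison text
        probs.append(prediction_net(h1, h2).item())
    target_prob = sum(probs) / len(probs)        # target probability value
    return target_prob, target_prob > threshold  # whether the new topic may be published here
```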
Based on the same inventive concept, the embodiment of the present disclosure further provides a training device for a text recognition model, and referring to fig. 8, the training device 800 includes:
A first obtaining module 801, configured to obtain a target text, where the target text includes a first text, a first mask text, and a second text, where the first mask text is obtained by masking the first text, and the first text and the second text are marked with a topic label, and the topic label is used to characterize whether the first text and the second text belong to a same topic type.
The input module 802 is configured to input the first text, the first mask text, and the second text into the text recognition model, obtain a topic prediction result that is output by the text recognition model and is used for representing whether the first text and the second text belong to a same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text, and a first mask vector corresponding to the first mask text, where an initialization parameter of the text recognition model is determined based on a parameter of a pre-training model, and the pre-training model is used for recognizing whether two texts are similar.
An adjustment module 803 is configured to determine a target loss function value according to the first text vector, the first mask vector, the second text vector, the topic label, and the topic prediction result, and adjust a parameter of the text recognition model based on the target loss function value.
Optionally, the adjusting module 803 includes:
a first penalty module configured to determine a first penalty function value based on the first text vector, the first mask vector, and the second text vector;
the second loss module is used for determining a second loss function value according to the topic label and the topic prediction result;
a target loss module for determining the target loss function value based on the first loss function value and the second loss function value.
Optionally, the first loss module is configured to:
subtracting the first text vector and the first mask vector to obtain a first loss vector, and subtracting the first text vector and the second text vector to obtain a second loss vector;
And subtracting the Euclidean distance of the second loss vector from the Euclidean distance of the first loss vector to obtain the first loss function value.
Optionally, the text recognition model includes a first encoding network, a second encoding network, and a prediction network, and the input module 802 is configured to:
Inputting the first text into the first coding network to obtain a first text vector, inputting the first mask text into the first coding network to obtain a first mask vector, and inputting the second text into the second coding network to obtain a second text vector;
Inputting the first text vector and the second text vector into the prediction network to obtain a topic prediction result used for representing whether the first text and the second text belong to the same topic type.
Optionally, the training device 800 includes an initialization module, where the initialization module is configured to initialize parameters of the first encoding network and parameters of the second encoding network by:
obtaining a pre-training sample, wherein the pre-training sample comprises a first pre-training text and a second pre-training text;
Inputting the pre-training sample into the pre-training model for training, wherein the pre-training model is used for judging whether the first pre-training text is similar to the second pre-training text, and the pre-training model comprises a pre-training coding network, wherein the structure of the pre-training coding network is the same as that of the first coding network and the second coding network;
And initializing the parameters of the first coding network and the parameters of the second coding network according to the trained parameters of the pre-training coding network of the pre-training model.
Optionally, the prediction network includes a first decoding unit, a second decoding unit, and a discriminating unit, and the input module 802 is configured to:
Inputting the first text vector serving as a key vector and a value vector of the first decoding unit and the second text vector serving as a query vector of the first decoding unit into the first decoding unit for feature processing to obtain a second feature vector;
Inputting the first text vector serving as a key vector and a value vector of the second decoding unit and the second feature vector serving as a query vector of the second decoding unit into the second decoding unit for feature processing to obtain a third feature vector;
And inputting the third feature vector into the judging unit to obtain a probability value that the first text and the second text belong to the same topic type, and obtaining the topic prediction result based on the probability value that the first text and the second text belong to the same topic type.
By adopting the training device, firstly, the parameters of the text recognition model are initialized based on the parameters of the pre-training model, then the text recognition model is trained by utilizing the target text marked with the topic label, and finally, the parameters of the text recognition model are further adjusted according to the target loss function value. Since the pre-training model is used for identifying whether two texts are similar, initializing parameters of the text recognition model through the pre-training model is equivalent to performing coarse adjustment on the parameters of the text recognition model. And then training the text recognition model by using the target text marked with the topic label, and fine adjustment can be carried out on parameters of the text recognition model on the basis of coarse adjustment. Therefore, the training speed of the text recognition model can be increased, a small amount of target texts marked with topic labels can be used for training the text recognition model, and the marking cost is reduced. In addition, the robustness of the text recognition model can be improved by performing semi-supervised learning according to the first text vector, the first mask vector, the second text vector, the topic label and the topic prediction result.
Based on the same inventive concept, the embodiments of the present disclosure further provide a text recognition apparatus, referring to fig. 9, the apparatus 900 includes:
the second obtaining module 901 is configured to obtain a text to be identified and a comparison text.
The recognition result module 902 is configured to obtain, through a text recognition model, a topic recognition result of whether the text to be recognized and the comparison text belong to the same topic type, where the text recognition model is obtained through the training method of the text recognition model.
By adopting the device, when a new topic is to be released in a certain communication group, whether the new topic accords with the topic type of the communication group can be judged through the text recognition model, or a communication group suitable for release can be determined for the new topic; compared with a manual judgment mode, the timeliness and promptness of releasing new topics are improved.
Based on the same conception, the embodiments of the present disclosure also provide a computer readable medium having stored thereon a computer program which, when executed by a processing device, implements the steps of the training method or the text recognition method of any one of the text recognition models described above.
Based on the same concept, the embodiments of the present disclosure also provide an electronic device including:
a storage device having a computer program stored thereon;
and the processing device is used for executing the computer program in the storage device to realize the training method of any text recognition model or the steps of the text recognition method.
Referring now to fig. 10, a schematic diagram of an electronic device 1000 suitable for use in implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 10 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 10, the electronic device 1000 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 1001 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage means 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the electronic apparatus 1000 are also stored. The processing device 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
In general, the following devices may be connected to the I/O interface 1005: input devices 1006 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 1007 including, for example, a Liquid Crystal Display (LCD), speaker, vibrator, etc.; storage 1008 including, for example, magnetic tape, hard disk, etc.; and communication means 1009. The communication means 1009 may allow the electronic device 1000 to communicate wirelessly or by wire with other devices to exchange data. While fig. 10 shows an electronic device 1000 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 1009, or installed from the storage device 1008, or installed from the ROM 1002. The above-described functions defined in the method of the embodiment of the present disclosure are performed when the computer program is executed by the processing device 1001.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, communications may be made using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol ), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), the internet (e.g., the internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain a target text, wherein the target text comprises a first text, a first mask text and a second text, the first mask text is obtained by masking the first text, the first text and the second text are marked with topic labels, and the topic labels are used for representing whether the first text and the second text belong to the same topic type; input the first text, the first mask text and the second text into the text recognition model to obtain a topic prediction result output by the text recognition model and used for representing whether the first text and the second text belong to the same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text, and a first mask vector corresponding to the first mask text, wherein initialization parameters of the text recognition model are determined based on parameters of a pre-training model, and the pre-training model is used for recognizing whether two texts are similar; and determine a target loss function value according to the first text vector, the first mask vector, the second text vector, the topic label and the topic prediction result, and adjust parameters of the text recognition model based on the target loss function value.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a text to be recognized and a comparison text; and obtain, through a text recognition model, a topic recognition result indicating whether the text to be recognized and the comparison text belong to the same topic type, wherein the text recognition model is obtained through the training method of the text recognition model provided by any of the foregoing embodiments of the present disclosure.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or hardware. In some cases, the name of a module does not constitute a limitation on the module itself; for example, the first acquisition module may also be described as "a module for acquiring the target text".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In accordance with one or more embodiments of the present disclosure, example 1 provides a training method of a text recognition model, the training method comprising: obtaining a target text, wherein the target text comprises a first text, a first mask text and a second text, the first mask text is obtained by masking the first text, the first text and the second text are marked with topic labels, and the topic labels are used for representing whether the first text and the second text belong to the same topic type; inputting the first text, the first mask text and the second text into the text recognition model to obtain a topic prediction result output by the text recognition model and used for representing whether the first text and the second text belong to the same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text, and a first mask vector corresponding to the first mask text, wherein initialization parameters of the text recognition model are determined based on parameters of a pre-training model, and the pre-training model is used for recognizing whether two texts are similar; and determining a target loss function value according to the first text vector, the first mask vector, the second text vector, the topic label and the topic prediction result, and adjusting parameters of the text recognition model based on the target loss function value.
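To make the flow of example 1 concrete, the following sketch shows how one training step could be wired. It assumes a PyTorch-style setup; the function and class names are illustrative rather than part of the disclosed method, and the helpers `target_loss` and the model's forward pass are sketched under examples 2 to 4 below.

```python
import torch

def training_step(model, optimizer, first_text, first_mask_text, second_text, topic_label):
    # Assumed model interface: returns the topic prediction, the first text vector,
    # the second text vector and the first mask vector described in example 1.
    topic_prob, first_vec, second_vec, mask_vec = model(first_text, first_mask_text, second_text)

    # Target loss determined from the three vectors, the topic label and the
    # topic prediction result (sketched under examples 2 and 3 below).
    loss = target_loss(first_vec, mask_vec, second_vec, topic_label, topic_prob)

    # Adjust the parameters of the text recognition model based on the target loss value.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```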
According to one or more embodiments of the present disclosure, example 2 provides the training method of example 1, wherein the determining of the target loss function value from the first text vector, the first mask vector, the second text vector, the topic label, and the topic prediction result comprises: determining a first loss function value from the first text vector, the first mask vector, and the second text vector; determining a second loss function value according to the topic label and the topic prediction result; and determining the target loss function value based on the first loss function value and the second loss function value.
According to one or more embodiments of the present disclosure, example 3 provides the training method of example 2, wherein the determining of the first loss function value from the first text vector, the first mask vector, and the second text vector comprises: computing the difference between the first text vector and the first mask vector to obtain a first loss vector, and computing the difference between the first text vector and the second text vector to obtain a second loss vector; and subtracting the Euclidean distance of the first loss vector from the Euclidean distance of the second loss vector to obtain the first loss function value.
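A minimal sketch of the loss described in examples 2 and 3, assuming PyTorch tensors of shape (batch, dim). The text does not fix the form of the second loss or how the two values are combined, so the binary cross-entropy and the plain sum below are assumptions; the subtraction order in `first_loss` follows the translated wording of example 3.

```python
import torch
import torch.nn.functional as F

def first_loss(first_vec, mask_vec, second_vec):
    # First loss vector: difference between the first text vector and the first mask vector.
    first_loss_vec = first_vec - mask_vec
    # Second loss vector: difference between the first text vector and the second text vector.
    second_loss_vec = first_vec - second_vec
    # Euclidean distance (L2 norm) of the first loss vector subtracted from that of the
    # second loss vector, per the wording of example 3.
    return torch.norm(second_loss_vec, p=2) - torch.norm(first_loss_vec, p=2)

def target_loss(first_vec, mask_vec, second_vec, topic_label, topic_prob):
    l1 = first_loss(first_vec, mask_vec, second_vec)
    # Assumed second loss: binary cross-entropy between the predicted probability of
    # "same topic type" and the topic label.
    l2 = F.binary_cross_entropy(topic_prob.view(-1), topic_label.float().view(-1))
    # Example 2 only states that the target loss is determined from both values;
    # a plain sum is assumed here.
    return l1 + l2
```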
According to one or more embodiments of the present disclosure, example 4 provides the training method of any one of examples 1 to 3, wherein the text recognition model comprises a first coding network, a second coding network, and a prediction network, and the inputting of the first text, the first mask text, and the second text into the text recognition model to obtain a topic prediction result output by the text recognition model and used for characterizing whether the first text and the second text belong to the same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text, and a first mask vector corresponding to the first mask text comprises: inputting the first text into the first coding network to obtain the first text vector, inputting the first mask text into the first coding network to obtain the first mask vector, and inputting the second text into the second coding network to obtain the second text vector; and inputting the first text vector and the second text vector into the prediction network to obtain the topic prediction result used for representing whether the first text and the second text belong to the same topic type.
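Example 4 splits the model into two coding networks and a prediction network. A minimal sketch of that forward pass, assuming each coding network maps a tokenized text to a fixed-size vector of shape (batch, dim); all class and attribute names are illustrative:

```python
import torch.nn as nn

class TextRecognitionModel(nn.Module):
    def __init__(self, first_coding_net: nn.Module, second_coding_net: nn.Module,
                 prediction_net: nn.Module):
        super().__init__()
        self.first_coding_net = first_coding_net    # encodes the first text and the first mask text
        self.second_coding_net = second_coding_net  # encodes the second text
        self.prediction_net = prediction_net

    def forward(self, first_text, first_mask_text, second_text):
        first_vec = self.first_coding_net(first_text)
        mask_vec = self.first_coding_net(first_mask_text)   # same coding network as the first text
        second_vec = self.second_coding_net(second_text)
        # Topic prediction is computed from the first and second text vectors only.
        topic_prob = self.prediction_net(first_vec, second_vec)
        return topic_prob, first_vec, second_vec, mask_vec
```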
According to one or more embodiments of the present disclosure, example 5 provides the training method of example 4, wherein the parameters of the first coding network and the parameters of the second coding network are initialized by: obtaining a pre-training sample, wherein the pre-training sample comprises a first pre-training text and a second pre-training text; inputting the pre-training sample into the pre-training model for training, wherein the pre-training model is used for judging whether the first pre-training text is similar to the second pre-training text, and the pre-training model comprises a pre-training coding network whose structure is the same as that of the first coding network and the second coding network; and initializing the parameters of the first coding network and the parameters of the second coding network according to the trained parameters of the pre-training coding network of the pre-training model.
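Example 5 amounts to copying the weights of the trained pre-training coding network into both coding networks of the text recognition model, which is possible because the three networks share the same structure. A minimal sketch under that assumption, with illustrative function and argument names:

```python
import copy
import torch.nn as nn

def init_coding_networks(pretrain_coding_net: nn.Module,
                         first_coding_net: nn.Module,
                         second_coding_net: nn.Module) -> None:
    # The pre-training coding network has the same structure as the first and second
    # coding networks, so its trained state dict can be loaded into both directly.
    state = copy.deepcopy(pretrain_coding_net.state_dict())
    first_coding_net.load_state_dict(state)
    second_coding_net.load_state_dict(state)
```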
According to one or more embodiments of the present disclosure, example 6 provides the training method of example 4, wherein the prediction network comprises a first decoding unit, a second decoding unit, and a discriminating unit, and the inputting of the first text vector and the second text vector into the prediction network to obtain a topic prediction result used for characterizing whether the first text and the second text belong to the same topic type comprises: inputting the first text vector, serving as a key vector and a value vector of the first decoding unit, and the second text vector, serving as a query vector of the first decoding unit, into the first decoding unit for feature processing to obtain a second feature vector; inputting the first text vector, serving as a key vector and a value vector of the second decoding unit, and the second feature vector, serving as a query vector of the second decoding unit, into the second decoding unit for feature processing to obtain a third feature vector; and inputting the third feature vector into the discriminating unit to obtain a probability value that the first text and the second text belong to the same topic type, and obtaining the topic prediction result based on the probability value that the first text and the second text belong to the same topic type.
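The two decoding units in example 6 behave like cross-attention blocks: the first text vector supplies the key and value, while the query comes from the second text vector in the first unit and from the second feature vector in the second unit. A minimal sketch using PyTorch's `nn.MultiheadAttention`, assuming the text vectors are fixed-size representations of shape (batch, dim) treated as length-1 sequences; the hidden size, head count, and sigmoid discriminating unit are assumptions, not details given in the text.

```python
import torch
import torch.nn as nn

class PredictionNetwork(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.first_decoding_unit = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.second_decoding_unit = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.discriminating_unit = nn.Linear(dim, 1)

    def forward(self, first_text_vec: torch.Tensor, second_text_vec: torch.Tensor) -> torch.Tensor:
        # Treat the (batch, dim) vectors as length-1 sequences for the attention modules.
        kv = first_text_vec.unsqueeze(1)
        q1 = second_text_vec.unsqueeze(1)
        # First decoding unit: key = value = first text vector, query = second text vector.
        second_feat, _ = self.first_decoding_unit(query=q1, key=kv, value=kv)
        # Second decoding unit: key = value = first text vector, query = second feature vector.
        third_feat, _ = self.second_decoding_unit(query=second_feat, key=kv, value=kv)
        # Discriminating unit: probability that the two texts belong to the same topic type.
        return torch.sigmoid(self.discriminating_unit(third_feat.squeeze(1)))
```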
In accordance with one or more embodiments of the present disclosure, example 7 provides a text recognition method, the method comprising: acquiring a text to be recognized and a comparison text; and obtaining, through a text recognition model, a topic recognition result indicating whether the text to be recognized and the comparison text belong to the same topic type, wherein the text recognition model is obtained through the training method of the text recognition model of any one of examples 1 to 6.
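For the recognition method of example 7, inference reduces to a forward pass of the trained model on the text to be recognized and the comparison text. A minimal sketch; the tokenizer, the `predict_same_topic` entry point and the 0.5 threshold are assumptions:

```python
import torch

def recognize_topic(model, tokenizer, text_to_recognize: str, comparison_text: str,
                    threshold: float = 0.5) -> bool:
    model.eval()
    with torch.no_grad():
        first = tokenizer(text_to_recognize, return_tensors="pt")
        second = tokenizer(comparison_text, return_tensors="pt")
        # Hypothetical inference entry point wrapping the trained text recognition model.
        prob = model.predict_same_topic(first, second)
    # Topic recognition result: whether the two texts belong to the same topic type.
    return bool(prob.item() >= threshold)
```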
Example 8 provides a training apparatus of a text recognition model, according to one or more embodiments of the present disclosure, the training apparatus comprising: the first acquisition module is used for acquiring a target text, wherein the target text comprises a first text, a first mask text and a second text, the first mask text is obtained by masking the first text, the first text and the second text are marked with topic labels, and the topic labels are used for representing whether the first text and the second text belong to the same topic type or not; the input module is used for inputting the first text, the first mask text and the second text into the text recognition model to obtain a topic prediction result, a first text vector, a second text vector and a first mask vector, wherein the topic prediction result is output by the text recognition model and is used for representing whether the first text and the second text belong to the same topic type, the first text vector corresponds to the first text, the second text vector corresponds to the second text, the first mask vector corresponds to the first mask text, the initialization parameters of the text recognition model are determined based on parameters of a pre-training model, and the pre-training model is used for recognizing whether two texts are similar; and the adjusting module is used for determining a target loss function value according to the first text vector, the first mask vector, the second text vector, the topic label and the topic prediction result, and adjusting parameters of the text recognition model based on the target loss function value.
Example 9 provides a text recognition apparatus according to one or more embodiments of the present disclosure, the apparatus comprising: the second acquisition module is used for acquiring the text to be recognized and the comparison text; the recognition result module is configured to obtain, through a text recognition model, a topic recognition result indicating whether the text to be recognized and the comparison text belong to the same topic type, wherein the text recognition model is obtained through the training method of the text recognition model according to any one of examples 1 to 6.
According to one or more embodiments of the present disclosure, example 10 provides a computer-readable medium having stored thereon a computer program which, when executed by a processing device, implements the steps of the method of any one of examples 1 to 7.
Example 11 provides an electronic device according to one or more embodiments of the present disclosure, comprising: a storage device having a computer program stored thereon; and a processing device for executing the computer program in the storage device to implement the steps of the method of any one of examples 1 to 7.
The foregoing description is only of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to the specific combinations of the features described above, but also covers other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, embodiments formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims. The specific manner in which the various modules of the apparatus in the above embodiments perform operations has been described in detail in connection with the embodiments of the method and will not be elaborated here.

Claims (11)

1. A method for training a text recognition model, the method comprising:
Obtaining a target text, wherein the target text comprises a first text, a first mask text and a second text, the first mask text is obtained by masking the first text, the first text and the second text are marked with topic labels, and the topic labels are used for representing whether the first text and the second text belong to the same topic type or not;
Inputting the first text, the first mask text and the second text into the text recognition model to obtain a topic prediction result output by the text recognition model and used for representing whether the first text and the second text belong to the same topic type, a first text vector corresponding to the first text, a second text vector corresponding to the second text, and a first mask vector corresponding to the first mask text, wherein initialization parameters of the text recognition model are determined based on parameters of a pre-training model, and the pre-training model is used for recognizing whether two texts are similar;
Determining a target loss function value according to the first text vector, the first mask vector, the second text vector, the topic label and the topic prediction result, and adjusting parameters of the text recognition model based on the target loss function value.
2. The training method of claim 1, wherein the determining of the target loss function value from the first text vector, the first mask vector, the second text vector, the topic label, and the topic prediction result comprises:
determining a first loss function value from the first text vector, the first mask vector, and the second text vector;
determining a second loss function value according to the topic label and the topic prediction result;
And determining the target loss function value based on the first loss function value and the second loss function value.
3. The training method of claim 2, wherein the determining of the first loss function value from the first text vector, the first mask vector, and the second text vector comprises:
computing the difference between the first text vector and the first mask vector to obtain a first loss vector, and computing the difference between the first text vector and the second text vector to obtain a second loss vector;
And subtracting the Euclidean distance of the first loss vector from the Euclidean distance of the second loss vector to obtain the first loss function value.
4. A training method as claimed in any one of claims 1 to 3, wherein said text recognition model comprises a first coding network, a second coding network and a prediction network, and said inputting of said first text, said first mask text and said second text into said text recognition model to obtain a topic prediction result outputted by said text recognition model for characterizing whether said first text and said second text belong to the same topic type, a first text vector corresponding to said first text, a second text vector corresponding to said second text and a first mask vector corresponding to said first mask text comprises:
Inputting the first text into the first coding network to obtain a first text vector, inputting the first mask text into the first coding network to obtain a first mask vector, and inputting the second text into the second coding network to obtain a second text vector;
Inputting the first text vector and the second text vector into the prediction network to obtain a topic prediction result used for representing whether the first text and the second text belong to the same topic type.
5. The training method of claim 4, wherein the parameters of the first coding network and the parameters of the second coding network are initialized by:
obtaining a pre-training sample, wherein the pre-training sample comprises a first pre-training text and a second pre-training text;
Inputting the pre-training sample into the pre-training model for training, wherein the pre-training model is used for judging whether the first pre-training text is similar to the second pre-training text, and the pre-training model comprises a pre-training coding network, wherein the structure of the pre-training coding network is the same as that of the first coding network and the second coding network;
And initializing the parameters of the first coding network and the parameters of the second coding network according to the trained parameters of the pre-training coding network of the pre-training model.
6. The training method of claim 4, wherein the prediction network comprises a first decoding unit, a second decoding unit, and a discrimination unit, and wherein the inputting of the first text vector and the second text vector into the prediction network to obtain a topic prediction result for characterizing whether the first text and the second text belong to the same topic type comprises:
Inputting the first text vector serving as a key vector and a value vector of the first decoding unit and the second text vector serving as a query vector of the first decoding unit into the first decoding unit for feature processing to obtain a second feature vector;
Inputting the first text vector serving as a key vector and a value vector of the second decoding unit and the second feature vector serving as a query vector of the second decoding unit into the second decoding unit for feature processing to obtain a third feature vector;
And inputting the third feature vector into the discrimination unit to obtain a probability value that the first text and the second text belong to the same topic type, and obtaining the topic prediction result based on the probability value that the first text and the second text belong to the same topic type.
7. A method of text recognition, the method comprising:
Acquiring a text to be recognized and a comparison text;
obtaining, through a text recognition model, a topic recognition result indicating whether the text to be recognized and the comparison text belong to the same topic type, wherein the text recognition model is obtained through the training method of the text recognition model according to any one of claims 1-6.
8. A training device for a text recognition model, the training device comprising:
The first acquisition module is used for acquiring a target text, wherein the target text comprises a first text, a first mask text and a second text, the first mask text is obtained by masking the first text, the first text and the second text are marked with topic labels, and the topic labels are used for representing whether the first text and the second text belong to the same topic type or not;
The input module is used for inputting the first text, the first mask text and the second text into the text recognition model to obtain a topic prediction result, a first text vector, a second text vector and a first mask vector, wherein the topic prediction result is output by the text recognition model and is used for representing whether the first text and the second text belong to the same topic type, the first text vector corresponds to the first text, the second text vector corresponds to the second text and the first mask vector corresponds to the first mask text, the initialization parameter of the text recognition model is determined based on the parameter of a pre-training model, and the pre-training model is used for recognizing whether two texts are similar;
And the adjusting module is used for determining a target loss function value according to the first text vector, the first mask vector, the second text vector, the topic label and the topic prediction result and adjusting parameters of the text recognition model based on the target loss function value.
9. A text recognition device, the device comprising:
the second acquisition module is used for acquiring the text to be recognized and the comparison text;
The recognition result module is used for obtaining, through a text recognition model, a topic recognition result indicating whether the text to be recognized and the comparison text belong to the same topic type, wherein the text recognition model is obtained through the training method of the text recognition model according to any one of claims 1-6.
10. A computer readable medium on which a computer program is stored, characterized in that the program, when executed by a processing device, carries out the steps of the method according to any one of claims 1-7.
11. An electronic device, comprising:
a storage device having a computer program stored thereon;
a processing device for executing the computer program in the storage device to carry out the steps of the method according to any one of claims 1-7.
CN202210283937.9A 2022-03-21 2022-03-21 Training method of text recognition model, text recognition method and related device Active CN114626551B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210283937.9A CN114626551B (en) 2022-03-21 2022-03-21 Training method of text recognition model, text recognition method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210283937.9A CN114626551B (en) 2022-03-21 2022-03-21 Training method of text recognition model, text recognition method and related device

Publications (2)

Publication Number Publication Date
CN114626551A (en) 2022-06-14
CN114626551B (en) 2024-08-02

Family

ID=81904141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210283937.9A Active CN114626551B (en) 2022-03-21 2022-03-21 Training method of text recognition model, text recognition method and related device

Country Status (1)

Country Link
CN (1) CN114626551B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628177B (en) * 2023-05-22 2023-11-14 福建省网络与信息安全测评中心 Interactive data processing method and system for network security platform
CN117668563B (en) * 2024-01-31 2024-04-30 苏州元脑智能科技有限公司 Text recognition method, text recognition device, electronic equipment and readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112036146A (en) * 2020-08-25 2020-12-04 广州视源电子科技股份有限公司 Comment generation method and device, terminal device and storage medium
CN113553858A (en) * 2021-07-29 2021-10-26 北京达佳互联信息技术有限公司 Training and text clustering of text vector characterization models

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120323968A1 (en) * 2011-06-14 2012-12-20 Microsoft Corporation Learning Discriminative Projections for Text Similarity Measures
US11468239B2 (en) * 2020-05-22 2022-10-11 Capital One Services, Llc Joint intent and entity recognition using transformer models
CN113592593B (en) * 2021-07-29 2023-05-30 平安科技(深圳)有限公司 Training and application method, device, equipment and storage medium of sequence recommendation model

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112036146A (en) * 2020-08-25 2020-12-04 广州视源电子科技股份有限公司 Comment generation method and device, terminal device and storage medium
CN113553858A (en) * 2021-07-29 2021-10-26 北京达佳互联信息技术有限公司 Training and text clustering of text vector characterization models

Also Published As

Publication number Publication date
CN114626551A (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN110288049B (en) Method and apparatus for generating image recognition model
CN114626551B (en) Training method of text recognition model, text recognition method and related device
CN113515942A (en) Text processing method and device, computer equipment and storage medium
CN113140012B (en) Image processing method, device, medium and electronic equipment
CN113033682B (en) Video classification method, device, readable medium and electronic equipment
CN110674349A (en) Video POI (Point of interest) identification method and device and electronic equipment
CN113449070A (en) Multimodal data retrieval method, device, medium and electronic equipment
CN111090993A (en) Attribute alignment model training method and device
CN116894188A (en) Service tag set updating method and device, medium and electronic equipment
CN113033707B (en) Video classification method and device, readable medium and electronic equipment
CN111915689B (en) Method, apparatus, electronic device, and computer-readable medium for generating an objective function
CN116258911A (en) Training method, device, equipment and storage medium for image classification model
CN117150122A (en) Federal training method, device and storage medium of terminal recommendation model
US11763204B2 (en) Method and apparatus for training item coding model
CN115984868A (en) Text processing method, device, medium and equipment
CN113986958B (en) Text information conversion method and device, readable medium and electronic equipment
CN116129003A (en) Digital face generation method and device, storage medium and electronic equipment
CN113222050B (en) Image classification method and device, readable medium and electronic equipment
CN111914535B (en) Word recognition method and device, computer equipment and storage medium
CN112417260B (en) Localized recommendation method, device and storage medium
CN117409194A (en) Image semantic segmentation model optimization method and device, electronic equipment and storage medium
CN114510911A (en) Text processing method and device, computer equipment and storage medium
CN114187557A (en) Method, device, readable medium and electronic equipment for determining key frame
CN114140723A (en) Multimedia data identification method and device, readable medium and electronic equipment
CN112287159A (en) Retrieval method, electronic device and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant