CN115759292A - Model training method and device, semantic recognition method and device, and electronic device

Info

Publication number
CN115759292A
Authority
CN
China
Prior art keywords
text
target
semantic
training
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211497733.1A
Other languages
Chinese (zh)
Inventor
陈超凡
方俊
刘超
刘涵宇
汪力
高天昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202211497733.1A
Publication of CN115759292A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a training method of a text feature extraction model, a training method of a semantic recognition model, a semantic recognition method, a semantic recognition device, an electronic device, a storage medium and a program product, and relates to the technical field of artificial intelligence. The training method of the text feature extraction model mainly comprises the following steps: respectively extracting text features of a plurality of text sentence samples in the training sample set to obtain a text vector set; semantic feature extraction is respectively carried out on a plurality of text statement samples in the training sample set to obtain a semantic vector set; determining text similarity between at least two text vectors in the text vector set and semantic similarity between at least two semantic vectors in the semantic vector set; determining a target training sample set from the training sample set; and training the text feature extraction model by using the target training sample set to obtain a trained text feature extraction model.

Description

Model training method and device, semantic recognition method and device, and electronic device
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a training method for a text feature extraction model, a training method for a semantic recognition model, a semantic recognition method, a semantic recognition apparatus, an electronic device, a storage medium, and a program product.
Background
With the continuous development of artificial intelligence technology, the techniques for making a machine understand a text sentence input by a user, grasp the inherent meaning of the text sentence, and give corresponding feedback have become increasingly mature. In these tasks, the accuracy of semantic understanding and the speed of feedback have become key measures of machine intelligence.
For semantic recognition and understanding, natural language processing tools are typically used: for example, a named entity recognition model performs automatic sequence labeling, attribute labeling, or syntactic parse tree construction on an input character sequence, and the labeled text is then fed into a classification or sequence model to learn the latent semantic information.
In the course of implementing the disclosed concept, the inventors found at least the following problems in the related art: the above models require a large number of training samples, and generating the training samples generally relies on manual labeling, which results in high labor costs and low processing efficiency.
Disclosure of Invention
In view of the above, the present disclosure provides a training method for a text feature extraction model, a training method for a semantic recognition model, a semantic recognition method, a semantic recognition device, an electronic device, a storage medium, and a program product.
One aspect of the present disclosure provides a training method for a text feature extraction model, including:
respectively extracting text features of a plurality of text sentence samples in a training sample set to obtain a text vector set, wherein text vectors in the text vector set correspond to the text sentence samples in the training sample set one by one;
performing semantic feature extraction on a plurality of text statement samples in the training sample set respectively to obtain a semantic vector set, wherein semantic vectors in the semantic vector set correspond to the text statement samples in the training sample set one by one;
determining text similarity between at least two text vectors in the text vector set and semantic similarity between at least two semantic vectors in the semantic vector set;
determining a target training sample set from the training sample set, wherein the target training sample set comprises a target training sample pair of which the text similarity and the semantic similarity meet a predetermined condition; and
training a text feature extraction model by using the target training sample set to obtain a trained text feature extraction model.
According to an embodiment of the present disclosure, the training of the text feature extraction model by using the target training sample set to obtain a trained text feature extraction model includes:
inputting the target training sample set into the text feature extraction model to obtain a target sample feature vector set, wherein the target sample feature vectors in the target sample feature vector set comprise feature vectors fused with semantic features and text features;
inputting the target sample feature vector set into a loss function to obtain a loss value, wherein the loss function is constructed based on the feature vector similarity between the target sample feature vector pairs; and
adjusting parameters of the text feature extraction model based on the loss value to obtain the trained text feature extraction model.
According to an embodiment of the present disclosure, the training method of the text feature extraction model further includes, before the step of inputting the target sample feature vector pair into a loss function to obtain a loss value:
determining a target similarity parameter from a plurality of preset initial similarity parameters;
determining a first loss function based on the target similarity parameter, wherein the first loss function is used for determining the association relationship between the positive target sample feature vector pairs, and the feature vector similarity between the positive target sample feature vector pairs is greater than or equal to a first predetermined vector similarity threshold; and
determining the loss function based on the first loss function and a second loss function, wherein the second loss function is used for representing the association relationship between the negative target sample feature vector pairs, and the feature vector similarity between the negative target sample feature vector pairs is smaller than the first predetermined vector similarity threshold.
According to an embodiment of the present disclosure, the training method of the text feature extraction model further includes, before the text feature extraction is performed on each of the plurality of text statement samples in the training sample set to obtain the text vector set:
determining respective text lengths of a plurality of initial text sentence samples to obtain a plurality of text lengths; and
determining the training sample set from the plurality of initial text sentence samples based on the plurality of text lengths and a predetermined text length threshold.
According to an embodiment of the present disclosure, the training method of the text feature extraction model further includes:
in a case where it is determined that the text similarity is greater than a predetermined text similarity threshold and the semantic similarity is less than a predetermined semantic similarity threshold, determining that the text similarity and the semantic similarity satisfy the predetermined condition.
Another aspect of the present disclosure provides a training method for a semantic recognition model, including:
inputting a plurality of texts to be labeled into a text feature extraction model to obtain a plurality of sample feature vectors corresponding to the plurality of texts to be labeled one by one, wherein the text feature extraction model is trained by using the training method of the text feature extraction model described above;
clustering the plurality of sample feature vectors to obtain a cluster set, wherein the plurality of target sample feature vectors of the cluster set are all feature vectors of the same category;
determining a plurality of target texts to be labeled from the plurality of texts to be labeled based on a plurality of target sample feature vectors of the cluster set, wherein the plurality of target texts to be labeled correspond to the plurality of target sample feature vectors one to one;
labeling the plurality of target texts to be labeled to obtain a labeling result that is shared by the plurality of target texts to be labeled;
generating a plurality of target samples based on the plurality of target texts to be labeled and the labeling result, wherein the plurality of target samples correspond to the plurality of target texts to be labeled one by one; and
training a semantic recognition model by using the target samples to obtain the trained semantic recognition model.
According to an embodiment of the present disclosure, the clustering the plurality of sample feature vectors to obtain a cluster set includes:
calculating the similarity between any two sample feature vectors in the plurality of sample feature vectors to obtain a plurality of clustering feature vector similarities; and
clustering the plurality of sample feature vectors based on the plurality of clustering feature vector similarities and a second vector similarity threshold to obtain the cluster set.
According to an embodiment of the present disclosure, the training method of the semantic recognition model further includes, before the step of inputting the plurality of texts to be labeled into the text feature extraction model and obtaining the plurality of sample feature vectors corresponding to the plurality of texts to be labeled one by one:
determining respective text types of a plurality of initial texts to be labeled; and
determining the plurality of texts to be labeled from the plurality of initial texts to be labeled based on the text type.
Another aspect of the present disclosure provides a semantic recognition method, including:
inputting the text to be recognized into the semantic recognition model to obtain a semantic recognition result,
wherein the semantic recognition model is obtained by training using the training method of the semantic recognition model described above.
Another aspect of the present disclosure provides a training apparatus for a text feature extraction model, including:
a text vector set obtaining module, configured to perform text feature extraction on each of a plurality of text sentence samples in a training sample set to obtain a text vector set, where text vectors in the text vector set correspond to the text sentence samples in the training sample set one to one;
a semantic vector set obtaining module, configured to perform semantic feature extraction on each of a plurality of text statement samples in the training sample set to obtain a semantic vector set, where semantic vectors in the semantic vector set correspond to the text statement samples in the training sample set one to one;
a similarity determining module, configured to determine a text similarity between at least two text vectors in the text vector set and a semantic similarity between at least two semantic vectors in the semantic vector set;
a target training sample set determining module, configured to determine a target training sample set from the training sample set, where the target training sample set includes a target training sample pair whose text similarity and semantic similarity satisfy a predetermined condition; and
a text feature extraction model obtaining module, configured to train a text feature extraction model by using the target training sample set to obtain a trained text feature extraction model.
Another aspect of the present disclosure provides a training apparatus for a semantic recognition model, including:
the system comprises a sample feature vector obtaining module, a text feature extraction module and a text feature extraction module, wherein the sample feature vector obtaining module is used for inputting a plurality of texts to be labeled into a text feature extraction model to obtain a plurality of sample feature vectors corresponding to the plurality of texts to be labeled one by one, and the text feature extraction model is trained by utilizing a training device of the text feature extraction model;
a cluster set obtaining module, configured to cluster the multiple sample feature vectors to obtain a cluster set, where the multiple target sample feature vectors of the cluster set are feature vectors of the same category;
a target text to be labeled determining module, configured to determine, based on a plurality of target sample feature vectors of the cluster set, a plurality of target texts to be labeled from the plurality of texts to be labeled, where the plurality of target texts to be labeled correspond to the plurality of target sample feature vectors one to one;
a labeling result obtaining module, configured to label the multiple target texts to be labeled to obtain labeling results that are the same among the multiple target texts to be labeled; and
a target sample generation module, configured to generate a plurality of target samples based on the plurality of target texts to be labeled and the labeling result, where the plurality of target samples correspond to the plurality of target texts to be labeled one by one;
a semantic recognition model obtaining module, configured to train a semantic recognition model by using the target samples to obtain a trained semantic recognition model.
Another aspect of the present disclosure provides a semantic recognition apparatus, including:
a semantic recognition result obtaining module for inputting the text to be recognized into the semantic recognition model to obtain a semantic recognition result,
wherein the semantic recognition model is obtained by training using the training apparatus of the semantic recognition model described above.
Yet another aspect of the present disclosure provides an electronic device including:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.
Yet another aspect of the disclosure provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to implement the method as described above.
Yet another aspect of the disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
According to the embodiments of the present disclosure, text feature extraction is performed on each of a plurality of text sentence samples in a training sample set to obtain a text vector set, where the text vectors in the text vector set correspond one to one to the text sentence samples in the training sample set; semantic feature extraction is performed on each of the plurality of text sentence samples in the training sample set to obtain a semantic vector set, where the semantic vectors in the semantic vector set correspond one to one to the text sentence samples in the training sample set; a text similarity between at least two text vectors in the text vector set and a semantic similarity between at least two semantic vectors in the semantic vector set are determined; a target training sample set is determined from the training sample set, where the target training sample set includes target training sample pairs whose text similarity and semantic similarity satisfy a predetermined condition; and the text feature extraction model is trained by using the target training sample set to obtain a trained text feature extraction model. By determining the target training sample set using both the text similarity and the semantic similarity, the text feature extraction model can learn, during training, the features of the target training sample pairs that satisfy the predetermined condition, so that the trained text feature extraction model can accurately extract feature vectors from text sentences, and the extracted feature vectors fuse text features and semantic features.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of the embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an exemplary system architecture to which the disclosed training methods and apparatus of text feature extraction models may be applied;
FIG. 2 schematically illustrates a flow diagram of a method of training a text feature extraction model according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a flow chart of a method of training a semantic recognition model according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow diagram of a semantic recognition method according to an embodiment of the present disclosure;
FIG. 5 schematically shows a block diagram of a training apparatus for a text feature extraction model according to an embodiment of the present disclosure;
FIG. 6 schematically shows a block diagram of a training apparatus of a semantic recognition model according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a semantic recognition apparatus according to an embodiment of the present disclosure; and
fig. 8 schematically shows a block diagram of an electronic device adapted to implement the above described method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "A, B and at least one of C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include, but not be limited to, systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).
An embodiment of the present disclosure provides a training method for a text feature extraction model. The method includes: performing text feature extraction on each of a plurality of text sentence samples in a training sample set to obtain a text vector set, where the text vectors in the text vector set correspond one to one to the text sentence samples in the training sample set; performing semantic feature extraction on each of the plurality of text sentence samples in the training sample set to obtain a semantic vector set, where the semantic vectors in the semantic vector set correspond one to one to the text sentence samples in the training sample set; determining a text similarity between at least two text vectors in the text vector set and a semantic similarity between at least two semantic vectors in the semantic vector set; determining a target training sample set from the training sample set, where the target training sample set includes target training sample pairs whose text similarity and semantic similarity satisfy a predetermined condition; and training the text feature extraction model by using the target training sample set to obtain a trained text feature extraction model.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, application, and other processing of the personal information of the users involved all comply with the relevant laws and regulations, necessary confidentiality measures are taken, and public order and good customs are not violated.
In the technical scheme of the disclosure, before the personal information of the user is acquired or collected, the authorization or the consent of the user is acquired.
Fig. 1 schematically illustrates an exemplary system architecture 100 to which a training method of a text feature extraction model may be applied, according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a shopping-like application, a web browser application, a search-like application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the training method of the text feature extraction model provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the training device of the text feature extraction model provided by the embodiment of the present disclosure may be generally disposed in the server 105. The training method of the text feature extraction model provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the training device for the text feature extraction model provided in the embodiment of the present disclosure may also be disposed in a server or a server cluster that is different from the server 105 and can communicate with the terminal devices 101, 102, 103 and/or the server 105.
For example, the training sample set may be originally stored in any of the terminal devices 101, 102, or 103 (e.g., the terminal device 101, but not limited thereto), or stored on an external storage device and imported into the terminal device 101. The terminal device 101 may then transmit the training sample set to the server 105, and the server 105, after receiving the training sample set, performs the training method of the text feature extraction model provided by the embodiment of the present disclosure.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
It should be noted that the sequence numbers of the respective operations in the following methods are merely used as representations of the operations for description, and should not be construed as representing the execution order of the respective operations. The method need not be performed in the exact order shown, unless explicitly stated.
Fig. 2 schematically shows a flow chart of a training method of a text feature extraction model according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S201 to S205.
In operation S201, text feature extraction is performed on each of a plurality of text sentence samples in the training sample set, so as to obtain a text vector set.
According to the embodiment of the disclosure, the text vectors in the text vector set correspond to the text sentence samples in the training sample set one by one.
According to the embodiment of the disclosure, text feature extraction may be performed on each of the plurality of text sentence samples by using a text conversion method to obtain the text vector set. The text conversion method may include at least one of: a bag-of-words method, an N-gram method, a TF-IDF method, and a Word2Vec method. Any text conversion method may be used, as long as it can convert text into a vector, and details are not repeated here.
According to an embodiment of the present disclosure, the text vectors in the text vector set represent the character-level features of the text sentence samples.
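As a concrete illustration only (the disclosure does not fix a particular text conversion method), the following minimal Python sketch computes character-level text vectors with TF-IDF over character n-grams and measures the text similarity between them with cosine similarity; the function names, the n-gram range, and the example sentences are assumptions made for this sketch.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_text_vectors(sentences):
    """Map each text sentence sample to a character-level TF-IDF vector."""
    vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(1, 2))
    return vectorizer.fit_transform(sentences)  # one row per text sentence sample

def text_similarity_matrix(text_vectors):
    """Pairwise cosine similarity between text vectors (the text similarity)."""
    return cosine_similarity(text_vectors)

sentences = ["this skirt is really beautiful", "this skirt is really good"]
print(text_similarity_matrix(build_text_vectors(sentences)))
```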
In operation S202, semantic feature extraction is performed on each of the text sentence samples in the training sample set to obtain a semantic vector set.
According to the embodiment of the disclosure, semantic vectors in the semantic vector set correspond to text statement samples in the training sample set one by one.
According to the embodiment of the disclosure, semantic features of the plurality of text sentence samples may be extracted using a semantic extraction method to obtain the semantic vector set. The semantic extraction method may include at least one of: SimCSE (Simple Contrastive Learning of Sentence Embeddings), BERT (Bidirectional Encoder Representations from Transformers), and ERNIE (Enhanced Language Representation with Informative Entities). Any semantic extraction method capable of extracting the semantic information in a text may be used, and details are not repeated here.
According to the embodiment of the disclosure, the semantic vectors in the semantic vector set represent the semantic features of the text sentence samples.
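As an illustration, a pretrained sentence encoder in the SimCSE/BERT family could be used to produce such semantic vectors; the sketch below assumes the sentence-transformers package is available, and the model name is only a placeholder.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

def build_semantic_vectors(sentences, model_name="paraphrase-multilingual-MiniLM-L12-v2"):
    """Encode each text sentence sample into a semantic vector with a pretrained encoder."""
    encoder = SentenceTransformer(model_name)
    return encoder.encode(sentences, normalize_embeddings=True)  # unit-length vectors

def semantic_similarity_matrix(semantic_vectors):
    """Pairwise cosine similarity between semantic vectors (the semantic similarity)."""
    vectors = np.asarray(semantic_vectors)
    return vectors @ vectors.T  # cosine similarity, since the vectors are normalized
```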
In operation S203, a text similarity between at least two text vectors in the text vector set and a semantic similarity between at least two semantic vectors in the semantic vector set are determined.
In operation S204, a target training sample set is determined from the training sample set.
According to an embodiment of the present disclosure, the target training sample set includes a target training sample pair whose text similarity and semantic similarity satisfy a predetermined condition.
In operation S205, the text feature extraction model is trained by using the target training sample set, so as to obtain a trained text feature extraction model.
According to an embodiment of the present disclosure, the text feature extraction model may be an encoding model and may include at least one of: a convolutional neural network (CNN), a recurrent neural network (RNN), and an attention mechanism. Any model may be used as long as it can encode the text sentence samples and produce feature vectors in which text features and semantic features are fused.
According to an embodiment of the present disclosure, determining a target training sample set including a target training sample pair from a training sample set may refer to: the target training sample set comprises a target training sample pair, and the text similarity and the semantic similarity of the target training sample pair meet preset conditions.
According to an embodiment of the present disclosure, the target training sample pair satisfying the predetermined condition may include: and the text similarity is greater than a preset text similarity threshold, and the semantic similarity is greater than a preset semantic similarity threshold.
For example, the target training sample pair includes a first text sentence sample and a second text sentence sample; the first text sentence sample may be "this skirt is really beautiful", and the second text sentence sample may be "this skirt is really good". The text similarity between the first text sentence sample and the second text sentence sample is greater than a predetermined similarity threshold, and the semantic similarity is also greater than the predetermined similarity threshold. However, the disclosure is not limited thereto. The target training sample pair satisfying the predetermined condition may also be one in which the text similarity is greater than a predetermined text similarity threshold and the semantic similarity is less than a predetermined semantic similarity threshold. For example, the target training sample pair includes a third text sentence sample and a fourth text sentence sample; the third text sentence sample may be "the quality of the skirt is not poor at all", and the fourth text sentence sample may be "the quality of the skirt is not good at all". The text similarity between the third text sentence sample and the fourth text sentence sample is greater than the predetermined similarity threshold, while the semantic similarity is less than the predetermined similarity threshold.
According to the embodiment of the disclosure, the target training sample set is determined by using the dual standards of text similarity and semantic similarity, so that the text feature extraction model can learn the features of the target training sample pair meeting the preset conditions in the training process, and further, the trained text feature extraction model can accurately extract the feature vectors in the text sentences, so that the feature vectors can be fused with the text features and the semantic features.
According to an optional embodiment of the present disclosure, in a case where it is determined that the text similarity is greater than a predetermined text similarity threshold and the semantic similarity is less than a predetermined semantic similarity threshold, it is determined that the text similarity and the semantic similarity satisfy a predetermined condition.
According to the embodiment of the disclosure, a training sample pair whose text similarity is greater than the predetermined text similarity threshold and whose semantic similarity is less than the predetermined semantic similarity threshold is taken as a target training sample pair satisfying the predetermined condition, so that the two samples in the pair have a high character-level overlap but opposite semantics. Training the text feature extraction model with a target training sample set including such target training sample pairs allows the model to fully learn the features that distinguish the samples in each pair. As a result, the feature vectors extracted from text sentences by the trained text feature extraction model can effectively identify and encode text sentence pairs that differ by only one word, fuse the semantic vector and the text vector, and improve both the degree of matching between the feature vector and the text sentence and the accuracy with which the text sentence is represented.
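As an illustrative sketch of the predetermined condition just described (high text similarity, low semantic similarity), the following Python function mines such target training sample pairs from the two similarity matrices; the thresholds 0.8 and 0.4 and the function name are placeholders chosen for this example.

```python
def mine_target_pairs(text_sim, sem_sim, text_threshold=0.8, sem_threshold=0.4):
    """Select target training sample pairs: high character overlap, dissimilar semantics."""
    n = text_sim.shape[0]
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if text_sim[i, j] > text_threshold and sem_sim[i, j] < sem_threshold:
                pairs.append((i, j))
    return pairs
```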
The method shown in fig. 2 is further described below with reference to specific embodiments.
According to an embodiment of the present disclosure, for operation S205 shown in fig. 2, training a text feature extraction model using a target training sample set to obtain a trained text feature extraction model, the following operations may be included.
For example, the target training sample set is input into the text feature extraction model, and a target sample feature vector set is obtained. And inputting the target sample feature vector set into a loss function to obtain a loss value. And adjusting parameters of the text feature extraction model based on the loss value to obtain a trained text feature extraction model.
According to an embodiment of the present disclosure, a target sample feature vector in a target sample feature vector set includes a feature vector in which semantic features and text features are fused. The loss function is constructed based on the feature vector similarity between the target sample feature vector pair.
According to an embodiment of the present disclosure, adjusting parameters of a text feature extraction model based on a loss value to obtain a trained text feature extraction model may include: and adjusting parameters of the text feature extraction model based on the loss value until a preset training condition is met, and taking the model meeting the preset training condition as a trained text feature extraction model.
According to an embodiment of the present disclosure, the predetermined training condition may include at least one of: a predetermined number of training rounds is reached, a predetermined training duration is reached, and the loss value converges.
According to an embodiment of the present disclosure, the loss function is constructed based on the feature vector similarity between target sample feature vector pairs. Using the loss function, the feature vector similarity between positive training sample pairs can be pulled closer, and the feature vector similarity between negative training sample pairs can be pushed farther apart. In addition, training the text feature extraction model with this loss function and the target sample feature vector set can improve the accuracy with which the trained text feature extraction model represents text sentences.
According to an embodiment of the present disclosure, before inputting the target sample feature vector pair into the loss function to obtain a loss value, the training method of the text feature extraction model may further include the following operations.
For example, the target similarity parameter is determined from a plurality of initial similarity parameters set in advance. A first loss function is determined based on the target similarity parameter. The first loss function is used for determining the incidence relation between the positive target sample feature vector pairs, and the feature vector similarity between the positive target sample feature vector pairs is larger than or equal to a first preset vector similarity threshold value. Based on the first loss function and the second loss function, a loss function is determined. The second loss function is used for representing the incidence relation between the negative target sample feature vector pairs, and the feature vector similarity between the negative target sample feature vector pairs is smaller than the first preset vector similarity threshold value.
According to an embodiment of the present disclosure, the loss function may be as shown in equation (1):

$$L=-\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\left(\cos\theta_{y_i,i}-m\right)}}{e^{s\left(\cos\theta_{y_i,i}-m\right)}+\sum_{j\neq y_i}e^{s\cos\theta_{j,i}}}\qquad(1)$$

where s denotes a temperature hyperparameter, which in the disclosed embodiment may be 64; m denotes the target similarity parameter; N denotes the number of text sentence samples in the target training sample set; i indexes the sample feature vector of the i-th text sentence sample in the target training sample set, which may be, for example, an anchor sample feature vector; cos θ_{y_i,i} denotes the cosine between the sample feature vector of the i-th text sentence sample and the sample feature vector of the y_i-th text sentence sample, these two sample feature vectors forming a positive target sample feature vector pair; and cos θ_{j,i} denotes the cosine between the sample feature vector of the j-th text sentence sample and the sample feature vector of the i-th text sentence sample, these two sample feature vectors forming a negative target sample feature vector pair. The positive-pair term e^{s(cos θ_{y_i,i} − m)} may be taken as the first loss function, and the negative-pair term e^{s cos θ_{j,i}} (or its sum over j) as the second loss function; the loss function is determined based on the first loss function and the second loss function.
According to an embodiment of the present disclosure, m may be the target similarity parameter and may be determined from a plurality of preset initial similarity parameters. Using the target similarity parameter increases the training difficulty, enlarging the gap in feature vector similarity between positive and negative sample feature vector pairs and avoiding collapse of the sample feature vectors when encoding the text sentence samples. The loss function forces the feature vector similarity between positive target sample feature vector pairs to be pulled closer, while pulling apart negative target sample feature vector pairs that have a high character-level overlap but opposite semantics, such as "the quality of the item is not good at all" versus "the quality of the item is not bad at all".
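A minimal PyTorch sketch of how a per-sample loss of this form might be computed for one anchor is shown below; it assumes the large-margin cosine formulation of equation (1) above, and the margin value m = 0.2, the function name, and the tensor shapes are placeholders chosen for illustration.

```python
import torch
import torch.nn.functional as F

def margin_contrastive_loss(anchor, positive, negatives, s=64.0, m=0.2):
    """Large-margin cosine loss for one anchor, its positive pair, and k negatives.

    anchor:    (d,)   feature vector of the i-th text sentence sample
    positive:  (d,)   feature vector of the positive pair (the y_i-th sample)
    negatives: (k, d) feature vectors forming negative pairs with the anchor
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    cos_pos = torch.dot(anchor, positive)   # cos(theta_{y_i, i})
    cos_neg = negatives @ anchor            # cos(theta_{j, i}) for each negative j

    # The margin m is applied only to the positive pair, and s scales all logits.
    logits = torch.cat([s * (cos_pos - m).unsqueeze(0), s * cos_neg])
    return -F.log_softmax(logits, dim=0)[0]  # -log of the positive-pair probability
```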
According to the embodiment of the disclosure, before the text feature extraction is performed on each of the plurality of text sentence samples in the training sample set to obtain the text vector set, the training method of the text feature extraction model may further include the following operations.
For example, the text length of each of the plurality of initial text sentence samples is determined, resulting in a plurality of text lengths. A set of training samples is determined from a plurality of initial text sentence samples based on a plurality of text lengths and a predetermined text length threshold.
According to an embodiment of the present disclosure, determining the training sample set from the plurality of initial text sentence samples based on the plurality of text lengths and the predetermined text length threshold may include: screening the plurality of initial text sentence samples using the predetermined text length threshold and filtering out the initial text sentence samples that do not conform to it. This avoids training sentence samples whose text length is excessive because they contain spaces, invalid characters, and the like, and thus prevents the training sample set from containing abnormal data.
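A short sketch of such length-based filtering follows; the threshold of 64 characters is a placeholder chosen only for illustration.

```python
def filter_by_length(initial_samples, max_length=64):
    """Keep initial text sentence samples whose cleaned length is within the threshold."""
    training_samples = []
    for sample in initial_samples:
        cleaned = "".join(sample.split())  # drop spaces and other whitespace before measuring
        if 0 < len(cleaned) <= max_length:
            training_samples.append(sample)
    return training_samples
```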
FIG. 3 schematically shows a flow chart of a training method of a semantic recognition model according to an embodiment of the present disclosure.
As shown in fig. 3, the method includes operations S301 to S306.
In operation S301, a plurality of texts to be labeled are input into the text feature extraction model, and a plurality of sample feature vectors corresponding to the plurality of texts to be labeled one by one are obtained.
According to the embodiment of the present disclosure, the text feature extraction model is trained by using the above-mentioned training method of the text feature extraction model shown in fig. 2.
In operation S302, a plurality of sample feature vectors are clustered to obtain a cluster set.
According to an embodiment of the present disclosure, the plurality of target sample feature vectors of the cluster set are all feature vectors of the same category.
In operation S303, a plurality of target texts to be labeled are determined from the plurality of texts to be labeled based on the plurality of target sample feature vectors of the cluster set.
According to the embodiment of the disclosure, a plurality of target texts to be labeled correspond to a plurality of target sample feature vectors one to one.
In operation S304, the plurality of target texts to be labeled are labeled, so as to obtain a labeling result that is shared by the plurality of target texts to be labeled.
In operation S305, a plurality of target samples are generated based on a plurality of target texts to be labeled and a labeling result.
According to the embodiment of the disclosure, the target samples correspond to the target texts to be labeled one by one.
In operation S306, the semantic recognition model is trained using the target sample, resulting in a trained semantic recognition model.
According to embodiments of the present disclosure, the semantic recognition model may include at least one of GPT, T5, and mT5, but is not limited thereto; it may also be a pre-trained model based on the above models. The initial model or the pre-trained model may be trained using supervised learning. With the training method provided by the embodiment of the disclosure, feature vectors of the same category can be grouped into a cluster set by clustering, and the plurality of target texts to be labeled in the cluster set can then be labeled uniformly, which reduces the amount of data to be labeled, improves processing efficiency, and avoids problems such as under-fitting or poor generalization caused by having too little data. Meanwhile, the categories of the target texts to be labeled are determined from the similarity between the sample feature vectors, which are extracted by the text feature extraction model; since the text feature extraction model improves the representation precision of the sample feature vectors, the clustering precision is also improved.
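As an illustration of how a single manual label per cluster can be propagated to generate the target samples, the sketch below is a minimal example; the data layout (one label per cluster and lists of indices) is an assumption made for this sketch.

```python
def propagate_cluster_labels(texts, clusters, cluster_labels):
    """Build target samples by giving every text in a cluster the cluster's single label.

    texts:          list of texts to be labeled
    clusters:       list of clusters, each a list of indices into `texts`
    cluster_labels: one manually assigned label per cluster
    """
    target_samples = []
    for cluster, label in zip(clusters, cluster_labels):
        for index in cluster:
            target_samples.append({"text": texts[index], "label": label})
    return target_samples
```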
According to an embodiment of the present disclosure, for operation S302 shown in fig. 3, clustering the plurality of sample feature vectors to obtain a cluster set may include the following operations.
For example, the similarity between any two sample feature vectors in the plurality of sample feature vectors is calculated to obtain a plurality of cluster feature vector similarities. And clustering the plurality of sample feature vectors based on the similarity of the plurality of clustering feature vectors and the second vector similarity threshold to obtain a clustering set.
According to the embodiment of the present disclosure, the clustering feature vector similarity may be, for example, a Euclidean distance, but is not limited thereto; any measure that can represent the association between feature vectors may be used.
According to an embodiment of the present disclosure, a plurality of sample feature vectors in the cluster set are filtered using a second vector similarity threshold. The higher the second vector similarity threshold is, the higher the vector similarity between the plurality of sample feature vectors is, and the lower the second vector similarity threshold is, the lower the vector similarity between the plurality of sample feature vectors is. The number of samples in the cluster set and the quality of the samples may be adjusted by presetting a second vector similarity threshold.
According to the embodiment of the disclosure, by using the clustering method provided by the embodiment of the disclosure, a plurality of target texts to be labeled can be clustered. The plurality of target texts to be annotated can comprise: the computer display screen is high in definition, the computer display is clear, the computer display screen is high in definition, and the like.
According to the embodiment of the disclosure, by the clustering method, the data volume of manual labeling can be reduced, and the manual labeling cost is saved while the generation quality of the target sample is ensured.
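The disclosure does not prescribe a specific clustering algorithm; as one possible sketch, the Python function below performs a simple greedy clustering driven by the second vector similarity threshold, using cosine similarity on normalized vectors (the threshold value 0.85 and the function name are placeholders).

```python
import numpy as np

def threshold_cluster(feature_vectors, similarity_threshold=0.85):
    """Assign each vector to the most similar existing cluster if its centroid similarity
    reaches the threshold, otherwise start a new cluster; return clusters of indices."""
    vectors = np.asarray(feature_vectors, dtype=float)
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # cosine similarity via dot product

    centroids, clusters = [], []
    for index, vector in enumerate(vectors):
        best, best_sim = None, similarity_threshold
        for c, centroid in enumerate(centroids):
            sim = float(vector @ centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            centroids.append(vector.copy())
            clusters.append([index])
        else:
            clusters[best].append(index)
            centroid = vectors[clusters[best]].mean(axis=0)
            centroids[best] = centroid / np.linalg.norm(centroid)
    return clusters  # each cluster holds indices of the texts to be labeled
```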
According to the embodiment of the present disclosure, before the text to be labeled is input into the text feature extraction model as shown in fig. 3 to obtain the sample feature vectors corresponding to the text to be labeled one by one, the training method of the semantic recognition model may further include the following operations.
For example, the text type of each of a plurality of initial texts to be annotated is determined. And determining a plurality of texts to be labeled from the plurality of initial texts to be labeled based on the text type.
According to an embodiment of the present disclosure, the text type may include at least one of: text length, character type.
According to the embodiment of the disclosure, the plurality of texts to be labeled may be determined from the plurality of initial texts to be labeled based on the respective text types of the plurality of initial texts to be labeled and a predetermined text type. This may include: screening the plurality of initial texts to be labeled using the predetermined text type and filtering out the initial texts to be labeled that do not conform to it, so that the plurality of texts to be labeled do not include texts that are excessively long because they contain spaces, invalid characters, and the like, which prevents the plurality of texts to be labeled from containing abnormal data.
Fig. 4 schematically shows a flow chart of a semantic recognition method according to an embodiment of the present disclosure.
As shown in fig. 4, the method includes operations S401 to S402.
In operation S401, a text to be recognized is acquired.
In operation S402, the text to be recognized is input into the semantic recognition model, and a semantic recognition result is obtained.
According to an embodiment of the present disclosure, the semantic recognition method may include the above-described operations S401 and S402, but is not limited thereto, and may further include only the operation S402.
According to the embodiment of the disclosure, the semantic recognition model is obtained by training by using the training method of the semantic recognition model.
According to an embodiment of the present disclosure, the text to be recognized may be, for example, an evaluation of, an experience with, or a satisfaction rating of an item. The text to be recognized may be input into the semantic recognition model, and the semantic recognition model may extract from it a semantic recognition result reflecting the opinion, comment, and satisfaction regarding the item of interest.
According to the embodiment of the disclosure, because the semantic recognition model is obtained by training with the training method shown in fig. 3, the semantic recognition model has high semantic recognition accuracy, and can effectively and accurately recognize the text to be recognized which has a word difference at a character level and has opposite semantics at a semantic level, so that the application range of the semantic recognition model is enlarged, and the processing efficiency is improved.
Fig. 5 schematically shows a block diagram of a training apparatus of a text feature extraction model according to an embodiment of the present disclosure.
As shown in fig. 5, the training apparatus 500 for text feature extraction model includes a text vector set obtaining module 510, a semantic vector set obtaining module 520, a similarity determining module 530, a target training sample set determining module 540, and a text feature extraction model obtaining module 550.
A text vector set obtaining module 510, configured to perform text feature extraction on each of a plurality of text statement samples in a training sample set to obtain a text vector set, where text vectors in the text vector set correspond to text statement samples in the training sample set one to one.
A semantic vector set obtaining module 520, configured to perform semantic feature extraction on each of the text statement samples in the training sample set to obtain a semantic vector set, where semantic vectors in the semantic vector set correspond to text statement samples in the training sample set one to one.
A similarity determining module 530, configured to determine a text similarity between at least two text vectors in the text vector set and a semantic similarity between at least two semantic vectors in the semantic vector set.
A target training sample set determining module 540, configured to determine a target training sample set from the training sample set, where the target training sample set includes a target training sample pair whose text similarity and semantic similarity satisfy a predetermined condition.
And a text feature extraction model obtaining module 550, configured to train a text feature extraction model using the target training sample set to obtain a trained text feature extraction model.
According to the embodiment of the disclosure, the text feature extraction model obtaining module comprises:
and the target sample feature vector set obtaining submodule is used for inputting the target training sample set into the text feature extraction model to obtain a target sample feature vector set, wherein the target sample feature vectors in the target sample feature vector set comprise feature vectors fused with semantic features and text features.
And the loss value obtaining submodule is used for inputting the target sample feature vector set into a loss function to obtain a loss value, wherein the loss function is constructed on the basis of the feature vector similarity between the target sample feature vector pairs.
And the text feature extraction model obtaining submodule is used for adjusting parameters of the text feature extraction model based on the loss value to obtain a trained text feature extraction model.
According to the embodiment of the present disclosure, the training device of the text feature extraction model further includes:
and the target similarity parameter determining module is used for determining a target similarity parameter from a plurality of preset initial similarity parameters before inputting the target sample feature vector pair into the loss function and obtaining a loss value.
And the first loss function determining module is used for determining a first loss function based on the target similarity parameter, wherein the first loss function is used for determining the association relation between the positive target sample feature vector pairs, and the feature vector similarity between the positive target sample feature vector pairs is greater than or equal to a first preset vector similarity threshold value.
And the loss function determining module is used for determining a loss function based on the first loss function and a second loss function, wherein the second loss function is used for representing the association relationship between the negative target sample feature vector pairs, and the feature vector similarity between the negative target sample feature vector pairs is smaller than a first preset vector similarity threshold value.
According to the embodiment of the disclosure, the training device of the text feature extraction model further comprises:
and the text length determining module is used for respectively extracting text features of the plurality of text sentence samples in the training sample set to obtain the text lengths of the plurality of initial text sentence samples before the text vector set is obtained, so as to obtain the plurality of text lengths.
And the training sample set determining module is used for determining a training sample set from a plurality of initial text sentence samples based on a plurality of text lengths and a predetermined text length threshold value.
According to the embodiment of the present disclosure, the training device of the text feature extraction model further includes:
and the similarity meeting predetermined condition determining module is used for determining that the text similarity and the semantic similarity meet predetermined conditions under the condition that the text similarity is determined to be greater than a predetermined text similarity threshold and the semantic similarity is determined to be less than a predetermined semantic similarity threshold.
Fig. 6 schematically shows a block diagram of a training apparatus of a semantic recognition model according to an embodiment of the present disclosure.
As shown in fig. 6, the training apparatus 600 for semantic recognition model includes a sample feature vector obtaining module 610, a cluster set obtaining module 620, a target text to be labeled determining module 630, a labeling result obtaining module 640, a target sample generating module 650, and a semantic recognition model obtaining module 660.
The sample feature vector obtaining module 610 is configured to input a plurality of texts to be labeled into a text feature extraction model, and obtain a plurality of sample feature vectors corresponding to the plurality of texts to be labeled one by one, where the text feature extraction model is trained by using a training apparatus of the text feature extraction model.
A cluster set obtaining module 620, configured to cluster the multiple sample feature vectors to obtain a cluster set, where the multiple target sample feature vectors in the cluster set are feature vectors of the same category.
The target to-be-labeled text determining module 630 is configured to determine multiple target to-be-labeled texts from the multiple to-be-labeled texts based on multiple target sample feature vectors of the cluster set, where the multiple target to-be-labeled texts correspond to the multiple target sample feature vectors one to one.
A labeling result obtaining module 640 is configured to label the multiple target texts to be labeled to obtain a labeling result, where the labeling results of the multiple target texts to be labeled are the same.
The target sample generating module 650 is configured to generate a plurality of target samples based on a plurality of target texts to be labeled and labeling results, where the plurality of target samples correspond to the plurality of target texts to be labeled one by one.
The semantic recognition model obtaining module 660 is used for training the semantic recognition model by using the plurality of target samples to obtain a trained semantic recognition model.
According to an embodiment of the present disclosure, the cluster set obtaining module includes:
The clustering feature vector similarity obtaining submodule is used for calculating the similarity between any two sample feature vectors in the plurality of sample feature vectors to obtain a plurality of clustering feature vector similarities.
The clustering submodule is used for clustering the plurality of sample feature vectors based on the plurality of clustering feature vector similarities and a second vector similarity threshold to obtain the cluster set.
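The disclosure does not name a particular clustering algorithm; the following Python sketch uses a simple greedy pass in which a sample feature vector joins the first cluster whose representative it matches above the second vector similarity threshold. The greedy strategy, the choice of the first cluster member as representative, and the threshold value are assumptions.

import numpy as np

def cosine(a, b):
    # Plain cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def threshold_cluster(sample_feature_vectors, similarity_threshold=0.9):
    clusters = []  # each cluster is a list of indices into sample_feature_vectors
    for idx, vector in enumerate(sample_feature_vectors):
        for cluster in clusters:
            # Compare against the first member of the cluster as its representative.
            if cosine(vector, sample_feature_vectors[cluster[0]]) >= similarity_threshold:
                cluster.append(idx)
                break
        else:
            # No existing cluster is similar enough; start a new one.
            clusters.append([idx])
    return clusters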
According to the embodiment of the present disclosure, the training apparatus of the semantic recognition model further includes:
the text type determining module is used for determining the text types of the initial texts to be labeled before the texts to be labeled are input into the text feature extraction model and a plurality of sample feature vectors corresponding to the texts to be labeled one by one are obtained.
And the text to be labeled determining module is used for determining a plurality of texts to be labeled from a plurality of initial texts to be labeled based on the text type.
Fig. 7 schematically shows a block diagram of a semantic recognition apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the semantic recognition apparatus 700 includes a to-be-recognized text obtaining module 710 and a semantic recognition result obtaining module 720.
The text to be recognized obtaining module 710 is configured to obtain a text to be recognized.
The semantic recognition result obtaining module 720 is configured to input the text to be recognized into a semantic recognition model to obtain a semantic recognition result, where the semantic recognition model is obtained by training with the training apparatus for the semantic recognition model described above.
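As an illustrative sketch only, the recognition step reduces to a single call into the trained semantic recognition model; here the model is assumed to map a raw text string to a dictionary of semantic labels and scores, which is an assumption about the interface rather than something specified by the present disclosure.

def recognize(text_to_recognize, semantic_recognition_model):
    # The trained model is assumed to return {semantic label: score}.
    scores = semantic_recognition_model(text_to_recognize)
    # Return the highest-scoring semantic label as the recognition result.
    return max(scores, key=scores.get)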
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any plurality of the text vector set obtaining module 510, the semantic vector set obtaining module 520, the similarity determining module 530, the target training sample set determining module 540, and the text feature extraction model obtaining module 550 may be combined and implemented in one module/unit/subunit, or any one module/unit/subunit thereof may be split into a plurality of modules/units/subunits. Alternatively, at least part of the functionality of one or more of these modules/units/sub-units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit/sub-unit. According to an embodiment of the present disclosure, at least one of the text vector set obtaining module 510, the semantic vector set obtaining module 520, the similarity determining module 530, the target training sample set determining module 540, and the text feature extraction model obtaining module 550 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementation manners of software, hardware, and firmware, or any suitable combination of any of them. Alternatively, at least one of the text vector set obtaining module 510, the semantic vector set obtaining module 520, the similarity determining module 530, the target training sample set determining module 540, and the text feature extraction model obtaining module 550 may be at least partially implemented as a computer program module, which when executed, may perform corresponding functions.
It should be noted that the training device of the text feature extraction model in the embodiment of the present disclosure corresponds to the training method portion of the text feature extraction model in the embodiment of the present disclosure, and the description of the training device portion of the text feature extraction model specifically refers to the training method portion of the text feature extraction model, which is not described herein again.
The training device of the semantic recognition model in the embodiment of the disclosure corresponds to the training method of the semantic recognition model in the embodiment of the disclosure, and the description of the training device of the semantic recognition model refers to the training method of the semantic recognition model, which is not described herein again.
The semantic recognition device in the embodiment of the disclosure corresponds to the semantic recognition method in the embodiment of the disclosure, and the description of the semantic recognition device specifically refers to the semantic recognition method, which is not described herein again.
Fig. 8 schematically shows a block diagram of an electronic device adapted for the above described method according to an embodiment of the present disclosure. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 8, an electronic device 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 801 may also include onboard memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing different actions of the method flows according to embodiments of the present disclosure.
In the RAM803, various programs and data necessary for the operation of the electronic apparatus 800 are stored. The processor 801, the ROM802, and the RAM803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM802 and/or RAM 803. Note that the programs may also be stored in one or more memories other than the ROM802 and RAM 803. The processor 801 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 800 may also include an input/output (I/O) interface 805, which is also connected to the bus 804. The electronic device 800 may also include one or more of the following components connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output section 807 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read therefrom is installed into the storage section 808 as necessary.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program, when executed by the processor 801, performs the above-described functions defined in the system of the embodiments of the present disclosure. The above described systems, devices, apparatuses, modules, units, etc. may be implemented by computer program modules according to embodiments of the present disclosure.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement a method according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM802 and/or RAM803 described above and/or one or more memories other than the ROM802 and RAM 803.
Embodiments of the present disclosure also include a computer program product, which comprises a computer program containing program code for performing the method provided by the embodiments of the present disclosure. When the computer program product runs on an electronic device, the program code is configured to cause the electronic device to implement the training method of the text feature extraction model provided by the embodiments of the present disclosure.
The computer program, when executed by the processor 801, performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure. The above described systems, devices, modules, units, etc. may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal on a network medium, and downloaded and installed via the communication section 809 and/or installed from the removable medium 811. The computer program containing the program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, program code for executing the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.

Those skilled in the art will appreciate that various combinations and/or sub-combinations of the features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or sub-combinations are not expressly recited in the present disclosure. In particular, various combinations and/or sub-combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or sub-combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (15)

1. A training method of a text feature extraction model comprises the following steps:
respectively extracting text features of a plurality of text sentence samples in a training sample set to obtain a text vector set, wherein text vectors in the text vector set correspond to the text sentence samples in the training sample set one by one;
performing semantic feature extraction on a plurality of text statement samples in the training sample set respectively to obtain a semantic vector set, wherein semantic vectors in the semantic vector set correspond to the text statement samples in the training sample set one by one;
determining text similarity between at least two text vectors in the text vector set and semantic similarity between at least two semantic vectors in the semantic vector set;
determining a target training sample set from the training sample set, wherein the target training sample set comprises target training sample pairs of which the text similarity and the semantic similarity meet a predetermined condition; and
training a text feature extraction model by using the target training sample set to obtain a trained text feature extraction model.
2. The method of claim 1, wherein the training a text feature extraction model using the target training sample set to obtain a trained text feature extraction model comprises:
inputting the target training sample set into the text feature extraction model to obtain a target sample feature vector set, wherein the target sample feature vectors in the target sample feature vector set comprise feature vectors fused with semantic features and text features;
inputting the target sample feature vector set into a loss function to obtain a loss value, wherein the loss function is constructed based on feature vector similarity between the target sample feature vector pairs; and
adjusting parameters of the text feature extraction model based on the loss value to obtain the trained text feature extraction model.
3. The method of claim 2, further comprising, prior to the inputting the target sample feature vector set into the loss function to obtain the loss value:
determining a target similarity parameter from a plurality of preset initial similarity parameters;
determining a first loss function based on the target similarity parameter, wherein the first loss function is used for determining the association relationship between the positive target sample feature vector pairs, and the feature vector similarity between the positive target sample feature vector pairs is greater than or equal to a first preset vector similarity threshold; and
determining the loss function based on the first loss function and a second loss function, wherein the second loss function is used for characterizing the association relationship between the negative target sample feature vector pairs, and the feature vector similarity between the negative target sample feature vector pairs is smaller than the first predetermined vector similarity threshold.
4. The method of claim 1, further comprising, before the performing text feature extraction on each of the plurality of text sentence samples in the training sample set to obtain the text vector set:
determining respective text lengths of a plurality of initial text sentence samples to obtain a plurality of text lengths; and
determining the set of training samples from the plurality of initial text sentence samples based on the plurality of text lengths and a predetermined text length threshold.
5. The method of claim 1, further comprising:
under the condition that the text similarity is determined to be greater than a predetermined text similarity threshold and the semantic similarity is determined to be less than a predetermined semantic similarity threshold, determining that the text similarity and the semantic similarity satisfy the predetermined condition.
6. A training method of a semantic recognition model comprises the following steps:
inputting a plurality of texts to be labeled into a text feature extraction model to obtain a plurality of sample feature vectors corresponding to the plurality of texts to be labeled one by one, wherein the text feature extraction model is trained by using the training method according to any one of claims 1 to 5;
clustering the plurality of sample feature vectors to obtain a cluster set, wherein a plurality of target sample feature vectors of the cluster set are all feature vectors of the same category;
determining a plurality of target texts to be labeled from the plurality of texts to be labeled based on a plurality of target sample feature vectors of the cluster set, wherein the plurality of target texts to be labeled correspond to the plurality of target sample feature vectors one to one;
labeling the plurality of target texts to be labeled to obtain a labeling result, wherein the labeling results of the plurality of target texts to be labeled are the same;
generating a plurality of target samples based on the plurality of target texts to be labeled and the labeling result, wherein the plurality of target samples correspond to the plurality of target texts to be labeled one by one; and
training a semantic recognition model by using the plurality of target samples to obtain a trained semantic recognition model.
7. The method of claim 6, wherein the clustering the plurality of sample feature vectors to obtain a cluster set comprises:
calculating the similarity between any two sample feature vectors in the plurality of sample feature vectors to obtain a plurality of clustering feature vector similarities; and
clustering the plurality of sample feature vectors based on the plurality of clustering feature vector similarities and a second vector similarity threshold to obtain the cluster set.
8. The method of claim 6, further comprising, before the inputting the plurality of texts to be labeled into the text feature extraction model to obtain the plurality of sample feature vectors corresponding to the plurality of texts to be labeled one by one:
determining respective text types of a plurality of initial texts to be labeled; and
determining the plurality of texts to be labeled from the plurality of initial texts to be labeled based on the text types.
9. A method of semantic recognition, comprising:
inputting the text to be recognized into a semantic recognition model to obtain a semantic recognition result,
wherein the semantic recognition model is trained by using the training method according to any one of claims 6 to 8.
10. A training apparatus for a text feature extraction model, comprising:
the text vector set obtaining module is used for respectively extracting text features of a plurality of text statement samples in a training sample set to obtain a text vector set, wherein text vectors in the text vector set correspond to the text statement samples in the training sample set one by one;
a semantic vector set obtaining module, configured to perform semantic feature extraction on each of a plurality of text statement samples in the training sample set to obtain a semantic vector set, where semantic vectors in the semantic vector set correspond to the text statement samples in the training sample set one to one;
the similarity determining module is used for determining the text similarity between at least two text vectors in the text vector set and the semantic similarity between at least two semantic vectors in the semantic vector set;
a target training sample set determining module, configured to determine a target training sample set from the training sample set, where the target training sample set includes a target training sample pair whose text similarity and semantic similarity satisfy a predetermined condition; and
the text feature extraction model obtaining module is used for training a text feature extraction model by using the target training sample set to obtain a trained text feature extraction model.
11. A training apparatus for a semantic recognition model, comprising:
a sample feature vector obtaining module, configured to input a plurality of texts to be labeled into a text feature extraction model and obtain a plurality of sample feature vectors corresponding to the plurality of texts to be labeled one by one, where the text feature extraction model is trained by using the training apparatus according to claim 10;
a cluster set obtaining module, configured to cluster the multiple sample feature vectors to obtain a cluster set, where the multiple target sample feature vectors of the cluster set are feature vectors of the same category;
a target text to be labeled determining module, configured to determine, based on a plurality of target sample feature vectors of the cluster set, a plurality of target texts to be labeled from the plurality of texts to be labeled, where the plurality of target texts to be labeled correspond to the plurality of target sample feature vectors one to one;
a labeling result obtaining module, configured to label the plurality of target texts to be labeled to obtain a labeling result, where the labeling results of the plurality of target texts to be labeled are the same;
a target sample generation module, configured to generate a plurality of target samples based on the plurality of target texts to be labeled and the labeling result, where the plurality of target samples correspond to the plurality of target texts to be labeled one by one; and
a semantic recognition model obtaining module, configured to train a semantic recognition model by using the plurality of target samples to obtain a trained semantic recognition model.
12. A semantic recognition apparatus comprising:
a semantic recognition result obtaining module for inputting the text to be recognized into the semantic recognition model to obtain a semantic recognition result,
wherein the semantic recognition model is trained by the training apparatus according to claim 11.
13. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-9.
14. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 9.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 9.
CN202211497733.1A 2022-11-24 2022-11-24 Model training method and device, semantic recognition method and device, and electronic device Pending CN115759292A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211497733.1A CN115759292A (en) 2022-11-24 2022-11-24 Model training method and device, semantic recognition method and device, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211497733.1A CN115759292A (en) 2022-11-24 2022-11-24 Model training method and device, semantic recognition method and device, and electronic device

Publications (1)

Publication Number Publication Date
CN115759292A true CN115759292A (en) 2023-03-07

Family

ID=85338983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211497733.1A Pending CN115759292A (en) 2022-11-24 2022-11-24 Model training method and device, semantic recognition method and device, and electronic device

Country Status (1)

Country Link
CN (1) CN115759292A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116842960A (en) * 2023-05-31 2023-10-03 海信集团控股股份有限公司 Feature extraction model training and extracting method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination