CN113806645A - Label classification system and training system of label classification model - Google Patents


Info

Publication number
CN113806645A
Authority
CN
China
Prior art keywords
processed
data
label
module
training
Prior art date
Legal status
Pending
Application number
CN202010537636.5A
Other languages
Chinese (zh)
Inventor
沈大框
张莹
陈成才
Current Assignee
Shanghai Xiaoi Robot Technology Co Ltd
Original Assignee
Shanghai Xiaoi Robot Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Xiaoi Robot Technology Co Ltd
Priority claimed from application CN202010537636.5A
Publication of CN113806645A
Legal status: Pending

Classifications

    • G06F 16/9562 Bookmark management (G Physics → G06 Computing; Calculating or Counting → G06F Electric digital data processing → G06F 16/00 Information retrieval; database structures and file system structures therefor → G06F 16/95 Retrieval from the web → G06F 16/955 Retrieval using information identifiers, e.g. uniform resource locators [URL])
    • G06F 16/35 Clustering; Classification (→ G06F 16/30 Information retrieval of unstructured textual data)
    • G06F 18/251 Fusion techniques of input or preprocessed data (→ G06F 18/00 Pattern recognition → G06F 18/20 Analysing → G06F 18/25 Fusion techniques)
    • G06F 40/30 Semantic analysis (→ G06F 40/00 Handling natural language data)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A label classification system and a training system for a label classification model. The label classification system includes: a to-be-processed data acquisition module, adapted to acquire to-be-processed data, the to-be-processed data including a corpus to be processed; a semantic extraction module, adapted to extract semantic features of the to-be-processed data; a logical operation module, adapted to perform logical operation processing on the extracted semantic features and the to-be-processed data to obtain fusion features of the to-be-processed data; a value calculation module, adapted to calculate a value for each candidate category label from the fusion features, the value representing the degree of association between that candidate category label and the corpus to be processed; and a label obtaining module, adapted to select, according to the values of the candidate category labels, the candidate category labels whose values meet a preset first selection condition, obtaining a category label prediction set. With this scheme, the accuracy of label classification prediction results can be improved.

Description

Label classification system and training system of label classification model
Technical Field
Embodiments of this specification relate to the field of information processing, and in particular to a label classification system and a training system for a label classification model.
Background
In the era of exploding internet information, internet information is classified and tagged with a label (Tag) of the corresponding classification so that required information can be quickly found in the mass of information. A label is usually a key feature that is strongly relevant to the information and easy to recognize, making it convenient for users to search and filter.
At present, label annotation of internet information is usually performed in one of two ways: manual classification or automatic classification. Manual classification is costly and inefficient, and cannot keep pace with the growth of internet information. Automatic classification requires a large amount of training data to train a label classification model in advance, and existing label classification models generalize poorly across structures and lack universality, so the accuracy of their label classification predictions is low.
Disclosure of Invention
In view of this, embodiments of the present specification provide a label classification system and a training system for a label classification model that can improve the accuracy of label classification prediction results.
An embodiment of the present specification provides a label classification system, including:
a to-be-processed data acquisition module, adapted to acquire to-be-processed data, the to-be-processed data including a corpus to be processed;
a semantic extraction module, adapted to extract semantic features of the to-be-processed data;
a logical operation module, adapted to perform logical operation processing on the extracted semantic features and the to-be-processed data to obtain fusion features of the to-be-processed data;
a value calculation module, adapted to calculate a value for each candidate category label from the fusion features of the to-be-processed data, the value representing the degree of association between that candidate category label and the corpus to be processed;
and a label obtaining module, adapted to select, according to the values of the candidate category labels, the candidate category labels whose values meet a preset first selection condition, obtaining a category label prediction set.
An embodiment of the invention further provides a label classification system, including:
a to-be-processed data acquisition module, adapted to acquire to-be-processed data, the to-be-processed data including a corpus to be processed;
and a label classification prediction module, adapted to use a preset label classification model to extract semantic features of the to-be-processed data, perform logical operation processing on the extracted semantic features and the to-be-processed data to obtain fusion features of the to-be-processed data, calculate from the fusion features a value for each candidate category label for annotating the corpus to be processed, and select the candidate category labels whose values meet a preset first selection condition, obtaining a category label prediction set.
An embodiment of the invention further provides a training system for a label classification model, including:
a training data acquisition module, adapted to acquire training data and a real category label set of the training data, the training data including a training corpus;
a model training module, adapted to input the training data and the real category label set into an initial label classification model, which extracts semantic features of the training data, performs a logical operation on the extracted semantic features and the training data to obtain fusion features of the training data, calculates from the fusion features a value for each candidate category label representing its degree of association with the training corpus, and selects the candidate category labels whose values meet a preset first selection condition, obtaining a category label prediction set for the training data;
an error calculation module, adapted to perform an error calculation on the real category label set and the category label prediction set to obtain a result error value;
a matching module, adapted to determine from the result error value whether the label classification model meets a training completion condition, and to determine that training is complete when it does;
and a model parameter adjusting module, adapted to adjust the parameters of the label classification model when it does not meet the training completion condition.
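The interaction of these training modules can be illustrated with a minimal sketch. Everything concrete here is an assumption not fixed by the specification: the model is a toy linear scorer, the error calculation is binary cross-entropy, and the training completion condition is an error threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the training modules: a linear label classification
# model whose parameters W are adjusted until the result error value
# meets the training completion condition.
W = rng.normal(size=(4, 3)) * 0.1                # model parameters: 4 features, 3 candidate labels

x = rng.normal(size=(8, 4))                      # training data (8 samples)
W_true = rng.normal(size=(4, 3))
y = (x @ W_true > 0).astype(float)               # real category label set (learnable toy labels)

def predict(x, W):
    """Value calculation: one sigmoid score per candidate category label."""
    return 1.0 / (1.0 + np.exp(-(x @ W)))

def error(pred, true):
    """Error calculation module: binary cross-entropy between the
    category label prediction set and the real set."""
    eps = 1e-9
    return -np.mean(true * np.log(pred + eps) + (1 - true) * np.log(1 - pred + eps))

errors = []
for step in range(500):                          # model training module
    p = predict(x, W)
    e = error(p, y)
    errors.append(e)
    if e < 0.05:                                 # matching module: training completion condition
        break
    W -= 0.5 * (x.T @ (p - y) / len(x))          # model parameter adjusting module (gradient step)
```

The matching module's check and the parameter adjustment alternate each round, exactly as the module descriptions above require; only the loss function and optimizer are invented for the sketch.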
With the label classification scheme of the embodiments of this specification, after the to-be-processed data is acquired, a logical operation performed on the extracted semantic features and the to-be-processed data fuses the original data with the semantic information carried by those features. This limits the impact of semantic feature extraction errors or loss of key semantic information on the prediction result: the fusion features carry rich semantic information and can represent to-be-processed data with complicated content or varied sources, which facilitates flexible handling of both single-label and multi-label classification tasks. The value of each candidate category label can therefore be calculated more accurately, the correct candidate category labels are obtained to represent the classification information present in the corpus to be processed, and the accuracy of the label classification result is improved.
Drawings
FIG. 1 is a schematic structural diagram of a tag classification system in an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of another tag classification system in an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a tag classification model in an embodiment of the present specification;
FIG. 4 is a schematic structural diagram of an iteration layer in an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of another tag classification model in an embodiment of the present specification;
FIG. 6 is a schematic structural diagram of another tag classification model in an embodiment of the present specification;
FIG. 7 is a schematic structural diagram of another tag classification model in an embodiment of the present specification;
FIG. 8 is a schematic structural diagram of a training system of a label classification model in an embodiment of the present specification;
FIG. 9 is a schematic structural diagram of another tag classification model in an embodiment of this specification.
Detailed Description
As described above, in the era of exploding internet information, internet information is classified and tagged with labels (Tags) of the corresponding classifications so that required information can be quickly obtained from the mass of information. At present, label annotation of internet information is usually performed either manually or by machine learning.
Manual annotation is costly and inefficient and cannot keep pace with the growth of internet information. Machine learning requires a large amount of training data to train a label classification model in advance.
However, conventional label classification models have weak generalization capability and poor universality: they can only perform single-label classification of network information and cannot efficiently handle the more complex multi-label classification task.
This is because, in the multi-label classification task, the content of a picture or document must be characterized by several category labels at once. The labels in the preset category label set are therefore not completely independent; they carry dependency or mutual-exclusion relationships. Moreover, the number of labels involved in a multi-label task is often large and the associations among them complex, making the task harder to analyze than single-label classification. This raises the difficulty of building and training a label classification model, and the accuracy of the label classification predictions suffers.
In view of these problems, embodiments of the present specification provide a label classification scheme: after the to-be-processed data is acquired, its semantic features are extracted and a logical operation is performed on the extracted semantic features and the to-be-processed data to obtain fusion features; from these fusion features, the value of each candidate category label is calculated, and a category label prediction set representing the classification information in the corpus to be processed is obtained.
For the purpose of enabling those skilled in the art to more clearly understand and practice the concepts, implementations and advantages of the embodiments of the present disclosure, detailed descriptions are provided below through specific application scenarios with reference to the accompanying drawings.
Referring to a schematic structural diagram of a tag classification system shown in fig. 1, in an embodiment of the present specification, the tag classification system 100 may include:
a to-be-processed data acquisition module 101, adapted to acquire to-be-processed data, the to-be-processed data including a corpus to be processed;
a semantic extraction module 102, adapted to extract semantic features of the data to be processed;
the logical operation module 103 is adapted to perform logical operation processing on the extracted semantic features and the data to be processed to obtain fusion features of the data to be processed;
a numerical value calculation module 104, adapted to calculate, according to the fusion feature of the to-be-processed data, a numerical value of each candidate category label for labeling the to-be-processed corpus, so as to represent a degree of association between each candidate category label and the to-be-processed corpus;
the tag obtaining module 105 is adapted to obtain, according to the numerical value of each candidate category tag, a candidate category tag whose numerical value meets a preset first selection condition, so as to obtain a category tag prediction set.
In a specific implementation, the to-be-processed data may include corpora to be processed in different languages, as required by the actual situation; for example, a Chinese corpus to be processed, an English corpus to be processed, and so on.
The corpus to be processed may be manually input text data, or text data acquired from a public network, or text data acquired from a picture by an Optical Character Recognition (OCR) technique.
In practical application, the to-be-processed data belongs to a space of real-world meaning that humans can understand, whereas to a computer the acquired data is merely a string of characters; the computer cannot directly understand the language information the data conveys. The to-be-processed data is therefore converted into digital data the computer can understand and process, mapping data that originally belongs to the space of real-world meaning into the digital space of the computer.
In a specific implementation, the semantic extraction module may combine, sort, filter and otherwise operate on part or all of the to-be-processed data according to preset feature extraction parameters, obtaining features that represent the semantic information in the data, i.e. semantic features, so that the computer can understand the language information the data conveys. The logical operation module may combine the extracted semantic features with the to-be-processed data through a logical operation according to preset logical operation parameters, obtaining features that fuse the semantic information, i.e. fusion features.
Depending on the feature extraction parameters and logical operation parameters set for the actual situation, there may be one or more semantic features and one or more fusion features.
It can be understood that, depending on the logical operation actually configured, the number of semantic features need not equal the number of fusion features. For example, when there are multiple semantic features, the logical operation module may combine each semantic feature with the to-be-processed data separately, obtaining multiple fusion features, or combine all semantic features with the to-be-processed data jointly, obtaining a single fusion feature. Conversely, when there is a single semantic feature, the logical operation module may combine it with different parts of the to-be-processed data separately, obtaining multiple fusion features, or combine it with the whole of the to-be-processed data, obtaining a single fusion feature.
In a specific implementation, a candidate category label set containing candidate category labels that each represent a piece of classification information may be preset, and the degree of association between each candidate category label and the corpus to be processed can then be calculated from the fusion features of the to-be-processed data. Since each candidate category label represents classification information, the higher its value, the stronger the correlation between the classification information it represents and the classification information present in the corpus to be processed, and the more suitable the label is for annotating that corpus.
In a specific implementation, the first selection condition may be set according to an actual situation.
For example, the first selection condition may be: the value is greater than a preset threshold. The label obtaining module then selects the candidate category labels whose values exceed the threshold to form the category label prediction set, which is used to annotate the corpus to be processed and represent its classification information.
As another example, the first selection condition may be: the value is the maximum. The label obtaining module then selects the candidate category label with the largest value to form the category label prediction set, which is used to annotate the corpus to be processed and represent its classification information.
By selecting the candidate category labels whose values exceed a preset threshold, the label classification system performs a multi-label classification task; by selecting the candidate category label with the maximum value, it performs a single-label classification task. The corresponding classification task is carried out according to the preset selection condition.
In practical applications, the first selection condition may include both a selection condition preset for the single-label classification task and one preset for the multi-label classification task, and the label obtaining module may obtain the selection condition of the corresponding task from a received classification instruction: on receiving a single-label classification instruction it obtains the single-label selection condition and performs single-label classification; on receiving a multi-label classification instruction it obtains the multi-label selection condition and performs multi-label classification.
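The two selection conditions can be sketched directly. The scores, label names and the 0.5 threshold below are illustrative assumptions; only the two rules, "greater than a preset threshold" for multi-label and "maximum value" for single-label, come from the text above.

```python
import numpy as np

scores = np.array([0.82, 0.15, 0.67, 0.40])           # values of four candidate category labels
labels = ["sports", "finance", "health", "travel"]    # illustrative label names

def multi_label(scores, labels, threshold=0.5):
    """First selection condition for the multi-label task:
    keep every candidate label whose value exceeds the preset threshold."""
    return [l for l, s in zip(labels, scores) if s > threshold]

def single_label(scores, labels):
    """First selection condition for the single-label task:
    keep only the candidate label with the maximum value."""
    return [labels[int(np.argmax(scores))]]
```

A classification instruction would simply dispatch to one of these two functions, which is how the same system serves both task types.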
With this scheme, performing a logical operation on the extracted semantic features and the to-be-processed data fuses the data with the semantic information carried by those features, limiting the impact of semantic feature extraction errors or loss of key semantic information on the prediction result. The fusion features carry rich semantic information and can represent to-be-processed data with complicated content or varied sources, which facilitates flexible handling of single-label and multi-label classification tasks; the value of each candidate category label can be calculated more accurately, the correct candidate category labels are obtained to represent the classification information in the corpus to be processed, and the accuracy of the label classification result is improved.
In a specific implementation, the semantic extraction module extracts semantic features of the to-be-processed data according to preset feature extraction parameters. A single set of feature extraction parameters may not capture all semantic features, and because its extraction range is limited, the semantic features it extracts may not reflect all of the semantic information contained in the to-be-processed data.
In an embodiment of the present specification, three sets of feature extraction parameters may be preset. From each set, the semantic extraction module obtains a feature extraction function mapping the to-be-processed data into semantic features: F1, F2 and F3. Based on F1, F2 and F3, the semantic features of the to-be-processed data are obtained respectively as A1 = F1(x), A2 = F2(x) and A3 = F3(x), where x denotes the to-be-processed data. Based on preset logical operation parameters, the logical operation module performs a logical operation on each group of semantic features A1, A2 and A3 together with the to-be-processed data x to obtain the fusion features.
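A toy realization of the three extraction functions and the fusion step: here each F_i is a moving average over a window of a different size, so the three "sets of feature extraction parameters" become three granularities. The functional forms of F_i and of the fusion (mean of padded features and raw data) are assumptions; the specification only requires three functions A_i = F_i(x) combined with x.

```python
import numpy as np

x = np.array([0.2, -0.5, 1.0, 0.3, -0.1, 0.8])    # data to be processed (toy vector)

def make_extractor(window):
    """One set of feature extraction parameters -> one extraction function.
    Each F_i here is a moving average at a different granularity (assumption)."""
    def F(v):
        return np.array([v[i:i + window].mean() for i in range(len(v) - window + 1)])
    return F

F1, F2, F3 = make_extractor(1), make_extractor(2), make_extractor(3)
A1, A2, A3 = F1(x), F2(x), F3(x)                  # semantic features A_i = F_i(x)

def pad(v, n):
    """Zero-pad shorter features to a common length for fusion."""
    return np.pad(v, (0, n - len(v)))

# Logical operation: combine every semantic feature group with the raw data
# into one fusion feature (element-wise mean is the toy choice here).
fused = np.stack([pad(A1, len(x)), pad(A2, len(x)), pad(A3, len(x)), x]).mean(axis=0)
```

With window 1, A1 reproduces x exactly, while A2 and A3 smooth it at coarser granularities; the fusion feature keeps information from all three plus the raw data, matching the multi-granularity motivation discussed next.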
With this scheme, setting different feature extraction parameters extracts semantic features of different granularities from the to-be-processed data, giving the extracted features diversity and generality. More of the semantic information contained in the data is conveyed through features of different granularities, strengthening the ability of the fusion features to represent to-be-processed data with complicated content or varied sources, and improving the generalization and universality with which different to-be-processed data can be accurately predicted.
In a specific implementation, the closer the semantic information conveyed by the fusion features in digital space is to the semantic information contained in the to-be-processed data, the stronger the fusion features' ability to represent that data, and the higher the accuracy. When performing the logical operation on each group of semantic features and the to-be-processed data, the logical operation module assigns them different weight coefficients and offset coefficients and performs a weighted logical operation; the weight coefficients may be set according to the actual situation, and different fusion features can thereby be obtained.
In an alternative example, as shown in fig. 1, the logical operation module 103 may include:
a weight distribution submodule 1031, adapted to set different weight coefficients and offset coefficients for each group of semantic features and the to-be-processed data;
and a weighting calculation submodule 1032, adapted to perform a weighted logical operation on each group of semantic features and the to-be-processed data according to the assigned weight coefficients.
Through the weighted logical operation, the relative importance of the various semantic features and the to-be-processed data in the operation can be controlled, improving the accuracy of the operation result, strengthening the fidelity with which the fusion features represent the to-be-processed data, and improving the reliability of the label classification prediction.
In a specific implementation, to obtain the weight coefficients quickly and reliably, the weight distribution submodule may input at least one group of semantic features into a preset nonlinear function for nonlinear mapping and assign weight coefficients to the other groups of semantic features and the to-be-processed data according to the processing result; the weighting calculation submodule then performs the weighted logical operation on those other groups and the to-be-processed data using the assigned coefficients.
For example, the semantic extraction module obtains three groups of semantic features A1, A2 and A3. The weight distribution submodule inputs A1 into a nonlinear function F4, obtaining the result F4(A1), and inputs F4(A1) into the weight coefficient calculation functions F5, F6 and F7, obtaining the weight coefficients a1 = F5[F4(A1)], a2 = F6[F4(A1)] and a3 = F7[F4(A1)]. The weighting calculation submodule then, based on the assigned coefficients a1, a2 and a3, obtains a fusion feature calculation function F8 that maps semantic features to fusion features, assigns a1, a2 and a3 to the other groups of semantic features A2 and A3 and to the to-be-processed data x, and inputs them into F8, performing the weighted logical operation F8(a1·x, a2·A2, a3·A3) to obtain the fusion features.
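A numeric sketch of this example, with concrete (assumed) choices for the unspecified functions: F4 is a tanh of a random projection, F5-F7 are softmax components so the three coefficients are normalized and compete, and F8 is a weighted sum. Only the structure, coefficients a1, a2, a3 computed from F4(A1) and applied as F8(a1·x, a2·A2, a3·A3), is from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=4)                                   # data to be processed
A1, A2, A3 = np.tanh(x), np.tanh(2 * x), np.tanh(0.5 * x)  # three semantic feature groups

# F4: nonlinear mapping of A1 (assumption: tanh of a random projection)
P = rng.normal(size=(4, 3))
h = np.tanh(A1 @ P)                                      # F4(A1)

# F5..F7: one weight coefficient each, softmax-normalized (assumption)
V = rng.normal(size=(3, 3))
a1, a2, a3 = np.exp(h @ V) / np.exp(h @ V).sum()         # a_i = F_{4+i}[F4(A1)]

# F8: the weighted logical operation F8(a1*x, a2*A2, a3*A3), here a weighted sum
fused = a1 * x + a2 * A2 + a3 * A3
```

Note that A1 participates only through the weight computation, while x, A2 and A3 enter the fusion itself, mirroring the division of roles in the example above.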
In this way, obtaining the weight coefficients from the semantic features improves both the efficiency of weight acquisition and the reliability of the coefficients.
In a specific implementation, to highlight key semantic information and facilitate the subsequent value calculation, the fusion features may be iteratively optimized. As shown in fig. 1, the tag classification system 100 may further include:
an iteration module 106, located between the logical operation module 103 and the value calculation module 104, adapted to: while a preset iteration condition is met, acquire the fusion features of the current round, extract their semantic features, and perform a logical operation on the extracted semantic features and the fusion features to obtain iterated fusion features; and, once the iteration condition is no longer met, take the iterated fusion features as the fusion features of the to-be-processed data, from which the value of each candidate category label is determined.
The iteration condition may be an iteration count threshold or some other condition. The fusion features acquired in the first round are those produced by the logical operation module; once the preset iteration condition is determined to be met, the fusion features acquired in subsequent rounds are those produced by the iteration module.
It can be understood that the feature extraction parameters for the iteration module may be the same as or different from the feature extraction parameters for the semantic extraction module; similarly, the logical operation parameters for the iteration module may be the same as or different from the logical operation parameters for the logical operation module, and the embodiment of the present specification is not limited thereto.
With this scheme, performing semantic extraction and a logical operation on the fusion features makes the iterated fusion features highlight key semantic information more strongly, enhancing their representational capability and improving the accuracy of the label classification result.
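The iteration module's extract-then-fuse loop can be sketched as follows. The choice of tanh as the re-extraction, an averaging logical operation, and a round-count iteration condition (one of the options the text mentions) are all assumptions.

```python
import numpy as np

def extract(v):
    """Semantic extraction applied to a fusion feature (toy choice: tanh)."""
    return np.tanh(v)

def logic_op(semantic, v):
    """Logical operation combining the extracted semantics with the
    current fusion feature (toy choice: element-wise average)."""
    return 0.5 * semantic + 0.5 * v

def iterate(fused, max_rounds=3):
    """Iteration module 106: while the iteration condition holds (here an
    iteration count threshold), re-extract and re-fuse; the final result
    is passed on to the value calculation module."""
    for _ in range(max_rounds):
        fused = logic_op(extract(fused), fused)
    return fused

out = iterate(np.array([2.0, -2.0, 0.0]))
```

Each round reuses the same extract/fuse pattern as the main pipeline, which is why the text notes the iteration module's extraction and operation parameters may or may not coincide with the original modules'.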
In specific implementation, a plurality of groups of feature extraction parameters for extracting fusion features may be preset, the iteration module extracts semantic features of the fusion features according to the feature extraction parameters of each group, to obtain semantic features of each group based on the fusion features, and then performs logical operation on the semantic features of each group based on the fusion features and the fusion features, to obtain the fusion features after iteration.
In a specific implementation, the iteration module may perform weighted logic operation on each group of semantic features based on the fusion features and the fusion features, where the method for obtaining the weight coefficient may refer to the related embodiments described above, and details are not described here.
In a specific implementation, to convert the to-be-processed data into information a computer can recognize, before the semantic features are extracted the to-be-processed data acquisition module may divide the data to obtain a corresponding sequence to be processed. Depending on the application scenario and the language, different division methods may be applied to the to-be-processed data to obtain the corresponding data sequence. For convenience of explanation, the smallest component into which the data can be divided under the preset requirement is called a division unit; the division processing thus divides the to-be-processed data x into n division units x1, x2, …, xn.
For example, the data to be processed includes a corpus to be processed in Chinese: { 你们好。 } ("Hello."). The to-be-processed data acquisition module may divide the corpus to be processed by characters and punctuation marks into { 你/们/好/。 }, where "你", "们", "好", and "。" are all division units of the corpus to be processed; the to-be-processed data acquisition module may also divide the corpus to be processed by words and punctuation marks into { 你们/好/。 }, where "你们", "好", and "。" are all division units of the corpus to be processed.
It is to be understood that the "/" is merely used to illustrate the effect of the division and is not a symbol actually present after the division; other symbols may also be used to separate the division units, and the embodiment of the present specification does not specifically limit the separator symbol.
It should be noted that "{ }" herein is only used to delimit the content of the examples and is not an essential part of the corpus content; those skilled in the art may use other symbols that are not easily confused to delimit the corpus content. The same applies to "{ }" hereinafter.
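A minimal sketch of character-level division (the function name is hypothetical; word-level division would additionally require a lexicon or segmenter):

```python
def divide_chars(corpus):
    """Divide a corpus into division units: one unit per character, so that
    punctuation marks also become standalone division units."""
    return list(corpus)

units = divide_chars("hello.")
print("/".join(units))  # the "/" only illustrates the unit boundaries: h/e/l/l/o/.
```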
In a specific implementation, the richer the sequence information contained in the data to be processed, the more accurately the semantic features can be extracted. Therefore, before extracting semantic features of the data to be processed, the to-be-processed data acquisition module may, based on the semantic structure of the corpus to be processed, identify attribute information of the corpus and obtain the candidate attribute tags corresponding to that attribute information from a preset candidate attribute tag set, to obtain an attribute tag sequence. The data to be processed may thus further include: the attribute tag sequence, whose division unit may be an attribute tag.
Wherein the attribute information may include: at least one of position information of each dividing unit in the corpus to be processed and grammar information of the corpus to be processed; the syntax information may include: at least one of part-of-speech information and punctuation information. Accordingly, the attribute tag sequence obtained from the corpus to be processed may include: at least one of a sequence of position tags and a sequence of grammar tags; the sequence of grammar tags may include: at least one of part-of-speech tags and punctuation tags.
The following is a detailed description of several specific embodiments.
In an embodiment of the present specification, a candidate position tag set may be preset, including a position tag corresponding to each piece of position information. After the to-be-processed data acquisition module divides the corpus to be processed, it identifies the position information present in the corpus to obtain the position information of each division unit, and marks the corresponding position tag at each division unit according to the unit's distribution position in the corpus, to obtain a position tag sequence. For example, for the corpus to be processed { 你们好。 }, the corresponding position tag sequence may be { 1 2 3 4 }, where "1", "2", "3", and "4" are position tags indicating the first, second, third, and fourth positions, respectively.
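A sketch of position tag generation under the scheme above (1-based positions; names are hypothetical):

```python
def position_tags(units):
    """Mark each division unit with a position tag giving its
    distribution position in the corpus (1-based)."""
    return [str(i) for i in range(1, len(units) + 1)]

print(position_tags(["你", "们", "好", "。"]))  # ['1', '2', '3', '4']
```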
In another embodiment of the present specification, a set of candidate grammar tags is preset, which may include: and grammar labels corresponding to the grammar information. After the to-be-processed data acquisition module identifies the grammar information in the to-be-processed corpus, the grammar information of each division unit can be obtained, and corresponding grammar labels can be marked at each division unit according to the grammar information of each division unit.
The grammar tags may further include: punctuation tags and part-of-speech tags. A punctuation tag may be marked at the punctuation mark corresponding to the punctuation information; part-of-speech tags may include: a start position tag of the part-of-speech information, marked at the first division unit corresponding to that part of speech, and a non-start position tag of the part-of-speech information, marked at the remaining division units corresponding to that part of speech.
Through the tag combination of the initial position tag and the non-initial position tag of each part of speech information, the linguistic data to be processed can be uniformly marked, the initial position and the ending position of the part of speech information in the linguistic data to be processed are obtained, and corresponding tags are marked on each dividing unit of the linguistic data to be processed by combining punctuation marks, so that the obtained grammar tag sequence can fully embody the grammar information of the linguistic data to be processed.
For example, the corpus to be processed may be: { 《离开》是由张宇演唱、作曲。 } ("Leave" is sung and composed by Zhang Yu.)
Then, according to the candidate grammar tag set, the following grammar tag sequences can be obtained:
{W-B NW-B NW-I W-B V-B P-B NR-B NR-I V-B V-I W-B V-B V-I W-B}。
wherein "W-B" represents a punctuation tag; "NW-B" and "NW-I" represent the start position tag and non-start position tag, respectively, of a work-title noun; "P-B" represents the start position tag of a preposition; "NR-B" and "NR-I" represent the start position tag and non-start position tag, respectively, of a person name; and "V-B" and "V-I" represent the start position tag and non-start position tag, respectively, of a verb.
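A sketch of how the start/non-start tag combination could be produced from pre-segmented spans (the input format and names are assumptions for illustration):

```python
def grammar_tags(segments):
    """segments: list of (text, label) pairs, e.g. ("离开", "NW").
    Emit LABEL-B at the first division unit of each segment and
    LABEL-I at its remaining division units."""
    tags = []
    for text, label in segments:
        for i in range(len(text)):
            tags.append(f"{label}-B" if i == 0 else f"{label}-I")
    return tags

print(grammar_tags([("《", "W"), ("离开", "NW"), ("》", "W")]))
# ['W-B', 'NW-B', 'NW-I', 'W-B']
```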
By adopting the scheme, the corresponding attribute tag sequence can be obtained according to the attribute information in the corpus to be processed, and owing to the co-occurrence characteristic of the corpus to be processed and the attribute tag sequence, the added attribute tag sequence does not damage the semantic information of the corpus to be processed while enriching the sequence information contained in the data to be processed.
In a specific implementation, in order to extract the semantic features of the data to be processed more accurately, as shown in fig. 1, the tag classification system 100 may further include: a data combination module 107. After the attribute tag sequence is obtained, the data combination module 107 may combine the corpus to be processed and the attribute tag sequence to obtain combined data to be processed for semantic feature extraction and logical operation processing. The combination processing may employ a Concat function.
By adopting the scheme, the semantic information of the attribute dimension can be extracted after the linguistic data to be processed and the attribute label sequence are combined, the subsequently processed features also contain the semantic information of the attribute dimension, the dimensionality of the semantic information in the semantic features and the fusion features is expanded, and the numerical value of each candidate category label can be more accurately calculated by combining the multi-dimensional semantic information.
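A sketch of the combination step, assuming the corpus and the attribute tag sequence have already been vectorized per division unit (a row-wise Concat along the feature dimension; names are hypothetical):

```python
def concat_features(corpus_rows, tag_rows):
    """Join each division unit's corpus embedding with the embedding of its
    attribute tag, mirroring a Concat along the feature dimension."""
    return [c + t for c, t in zip(corpus_rows, tag_rows)]

combined = concat_features([[0.1, 0.2], [0.3, 0.4]], [[1.0], [0.0]])
print(combined)  # [[0.1, 0.2, 1.0], [0.3, 0.4, 0.0]]
```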
In a specific implementation, depending on the preset logical operation parameters, the fusion features may be represented by numerical values, vectors, or matrices, and thus may not correspond one-to-one with the preset candidate category tag set.
In order to solve the above problem, as shown in fig. 1, the numerical calculation module 104 is adapted to generate a fused feature vector according to a fused feature of the data to be processed, a dimension of the fused feature vector is consistent with a total number of candidate category tags in a preset candidate category tag set, and a numerical value of each element in the fused feature vector represents a degree of association between a corresponding candidate category tag and the corpus to be processed.
For example, if the preset candidate category tag set is LB = {lb1, lb2, lb3}, then a fused feature vector RX = {rx1, rx2, rx3} is obtained through the feature vector generation function, where the dimension of the fused feature vector is 3, consistent with the total number of candidate category tags in the preset candidate category tag set. The elements rx1, rx2, and rx3 in the fused feature vector respectively characterize the numerical values with which the candidate category tags lb1, lb2, and lb3 label the corpus to be processed.
Correspondingly, the tag obtaining module 105 is adapted to determine a distribution position where an element meeting a preset first selection condition is located in the fusion feature vector, obtain a candidate category tag corresponding to the distribution position in a preset candidate category tag set, and obtain the category tag prediction set.
By adopting the scheme, the fusion feature vector with the dimension consistent with the total number of the candidate class labels is generated, so that the fusion feature vector is convenient to correspond to the distribution position of each candidate class label, and the method is favorable for accurately acquiring the candidate class labels meeting the first selection condition in the candidate class label set.
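A sketch of tag selection from the fused feature vector, taking a simple threshold as the first selection condition (the real selection condition is configurable; names are hypothetical):

```python
def select_labels(fused_vector, candidate_labels, threshold=0.5):
    """Return the candidate category tags whose element in the fused
    feature vector meets the (assumed) threshold-style selection condition."""
    return [label for value, label in zip(fused_vector, candidate_labels)
            if value >= threshold]

print(select_labels([0.9, 0.2, 0.6], ["lb1", "lb2", "lb3"]))  # ['lb1', 'lb3']
```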
In a specific implementation, when a fusion feature vector is generated according to the fusion feature of the data to be processed, the numerical calculation module performs dimension transformation processing on the fusion feature to realize dimension reduction, performs conversion processing on the vector subjected to dimension reduction, and converts the numerical value of each element in the vector to a specified interval, so as to conveniently set a first selection condition and select a candidate class label meeting the condition.
Specifically, the numerical calculation module generates a feature vector generation function composed of parameters through a preset feature vector, inputs the fusion feature into the feature vector generation function, and performs data dimension transformation processing on the fusion feature through the feature vector generation function to obtain a q-dimensional feature transformation vector, wherein q is the total number of candidate class labels.
As an optional example, the numerical value calculation module performs nonlinear conversion processing on the q-dimensional feature transformation vector, converting the numerical value of each element into a specified numerical value interval, and takes the feature transformation vector after nonlinear conversion as the fused feature vector. The dimension of the fused feature vector is consistent with the total number of candidate category tags in the preset candidate category tag set, and each element in the fused feature vector represents the numerical value with which the corresponding candidate category tag labels the corpus to be processed.
The numerical calculation module can adopt numerical calculation functions such as Sigmoid and the like to perform nonlinear conversion processing on the feature transformation vector. The Sigmoid numerical calculation function can convert the numerical value of each element in the q-dimensional feature transformation vector into a numerical value interval [0,1], the numerical value of each element in the q-dimensional feature transformation vector is independently calculated, and the feature transformation vector after nonlinear conversion can be used for a multi-label classification task.
As another optional example, the numerical value calculation module may perform probability conversion processing on the q-dimensional feature transformation vector, converting the numerical value of each element into a specified probability interval, and take the feature transformation vector after probability conversion as the fused feature vector. The dimension of the fused feature vector is consistent with the total number of candidate category tags in the preset candidate category tag set, and each element in the fused feature vector represents the numerical value with which the corresponding candidate category tag labels the corpus to be processed.
The numerical calculation module can perform probability conversion processing on the feature transformation vector by adopting numerical calculation functions such as Softmax and the like. The Softmax numerical calculation function can convert numerical values of all elements in the q-dimensional feature transformation vector into a probability interval [0,1], the numerical values of all elements in the q-dimensional feature transformation vector are mutually constrained, the sum of the numerical values of all elements in the feature transformation vector after probability conversion processing is 1, and the Softmax numerical calculation function can be used for single-label classification tasks.
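The contrast between the two conversions can be sketched as follows: Sigmoid maps each element independently into [0, 1] (suited to multi-label tasks), while Softmax constrains the elements to sum to 1 (suited to single-label tasks).

```python
import math

def sigmoid_convert(vec):
    # Element-wise and independent: each value lands in [0, 1] on its own,
    # so several candidate tags can exceed the selection threshold at once.
    return [1.0 / (1.0 + math.exp(-x)) for x in vec]

def softmax_convert(vec):
    # Mutually constrained: the converted elements sum to 1,
    # so the vector behaves like a probability distribution over tags.
    m = max(vec)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in vec]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, -1.0, 0.5]
print(sum(softmax_convert(scores)))       # 1.0 (up to floating point)
```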
In a specific implementation, if the tag classification system identifies the attribute information present in the corpus to be processed and marks corresponding candidate attribute tags at each division unit of the corpus, the processing parameters may further include: attribute identification parameters.
In a specific implementation, in order to convert the data to be processed into information that can be recognized by a computer, the data to be processed acquisition module may further perform Embedding (Embedding) processing on the data to be processed before extracting semantic features of the data to be processed, and vectorize a dividing unit of the data to be processed. Specifically, each partition unit in the corpus to be processed and each candidate attribute tag in the attribute tag sequence may be characterized in a vector manner, so that both the corpus to be processed and the attribute tag sequence may be characterized in a matrix manner. The processing parameters may further include: embedding process parameters for implementing the embedding process.
By adopting the scheme, through vectorization of each dividing unit and each candidate attribute tag, a matrix with higher accuracy can be obtained, the linguistic data to be processed and the attribute tag sequence in the matrix form are convenient for subsequent feature extraction and logic operation, and the data processing efficiency is improved.
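A sketch of the embedding step (the lookup table contents and the zero-vector fallback for unknown units are assumptions):

```python
def embed(units, table, dim=3):
    """Map each division unit (or candidate attribute tag) to a static
    vector; unknown units fall back to a zero vector (an assumption),
    so the whole sequence is characterized as a matrix."""
    return [table.get(u, [0.0] * dim) for u in units]

table = {"好": [0.1, 0.2, 0.3], "。": [0.0, 0.0, 1.0]}
matrix = embed(["好", "。", "?"], table)
print(len(matrix), len(matrix[0]))  # 3 3 -> one row per division unit
```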
In specific implementation, the vector obtained after the embedding processing is a static vector, and the static vector has no polysemy, so that the to-be-processed data acquisition module can also encode the to-be-processed data and convert the static vector into a dynamic vector, so that the to-be-processed data can be changed according to the context information of the corpus, has polysemy, and then performs semantic feature extraction on the encoded to-be-processed data. The processing parameters may further include: and encoding the processing parameters.
In a specific implementation, as shown in fig. 1, the tag classification system 100 may further include: the parameter obtaining module 110 is adapted to obtain a preset processing parameter, and configure a module of the tag classification system according to the processing parameter.
For example, the processing parameters may include: feature extraction parameters, logical operation parameters, and feature vector generation parameters; the parameter acquisition module can configure the semantic extraction module, the logical operation module, and the numerical calculation module according to these processing parameters. In a specific implementation, as shown in fig. 1, the tag classification system 100 may further include: a parameter training module 109, adapted to adjust initial processing parameters according to preset training data, the category label real set of the training data, and a preset Loss Function, and take the adjusted processing parameters as the preset processing parameters. The loss function is established based on the label classification prediction results of the training data. The training data includes: a training corpus; the category label real set of the training data includes: the candidate category labels that actually label the training corpus.
By adopting the scheme, the processing parameters are adjusted through training, so that the numerical values of the processing parameters are converged to an ideal state, and the accuracy of the label classification prediction result is improved.
In a specific implementation, the training data obtained by the parameter training module forms a training data set. The training data set can be divided into multiple batches to train the processing parameters and perform the label classification prediction operation; each batch may contain a list of training corpora, i.e., a sentence list, whose size is determined by actual conditions. Alternatively, the training data set may be divided into sentence-level training data according to a preset set of sentence-ending punctuation marks, and the processing parameters may be iteratively trained over multiple passes according to the division result, with the label classification prediction operation performed for each.
In a specific implementation, the training data may further include: an attribute tag sequence of the training corpus. The attribute tag sequence may be obtained by manually labeling the attribute tags of the division units, or the attribute information present in the corpus may be identified through a preset attribute labeling model, which marks corresponding candidate attribute tags at the division units of the corpus.
Based on the semantic structure of the corpus, the attribute information may include: at least one of position information of each partition unit in the corpus and grammar information of the corpus, where the grammar information may include: at least one of part-of-speech information and punctuation information. Accordingly, the sequence of attribute tags obtained by the corpus may include: at least one of a sequence of position tags and a sequence of grammar tags; the sequence of grammar tags may include: at least one of part-of-speech tags and punctuation tags.
For the processing process of the training data including the attribute tag sequence, reference may be specifically made to the description of the relevant portion of the data to be processed, which is not described herein again.
In specific implementation, the vector obtained through the embedding processing is a static vector, and the static vector has no polysemy, so that the parameter training module can encode the training data and convert the static vector into a dynamic vector, and therefore the dynamic vector can be changed according to the context information of the corpus and has polysemy.
In order to determine whether the training data is accurately encoded and obtain the correctly encoded training data, the parameter training module may perform decoding processing on the encoded training data, and verify whether the encoding processing result is accurate according to prediction on the encoded training data.
In an implementation, the encoded training data may be decoded using a Conditional Random Field network (CRF).
The conditional random field network may be preconfigured with a state transition matrix [A]a,b and an emission score matrix [P]t,v(θ), where [A]a,b represents the state transition probability from the a-th state to the b-th state across two time steps, and [P]t,v(θ) represents the score that position t of the input matrix is output as the candidate tag [v]t, with θ including the processing parameters. The candidate tag sequence with the highest conditional random field score is taken as the predicted sequence. Furthermore, the conditional random field model can compute the highest-scoring sequence by the Viterbi method, thereby obtaining the prediction sequence corresponding to the optimal path.
Therefore, the parameter training module can jointly establish a loss function according to the label classification prediction result and the coding processing result, and adjust the initial processing parameters in a multi-dimensional manner, so that the initial processing parameters can be quickly converged, and the parameter adjusting efficiency is improved.
Taking the encoded attribute tag sequence as an example, in order to distinguish the attribute tag sequences before encoding processing and after decoding processing, the attribute tag sequence before encoding processing may be regarded as an attribute tag actual sequence, and the attribute tag sequence predicted by decoding processing may be regarded as an attribute tag prediction sequence.
According to a preset candidate attribute tag set, a plurality of candidate attribute tag labeling sequences are obtained after permutation and combination. Using the conditional random field network, the probability that each candidate attribute tag in a candidate labeling sequence labels the corresponding division unit of the encoded attribute tag sequence is predicted, thereby obtaining a probability value for each candidate attribute tag labeling sequence; the candidate attribute tag labeling sequence whose probability value meets a preset second selection condition is obtained as the attribute tag prediction sequence.
By matching the attribute tag true sequence and the attribute tag predicted sequence, whether the encoding processing result is accurate can be determined.
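A compact sketch of Viterbi decoding over transition and emission scores, as used to recover the optimal-path tag sequence (the data structures here are illustrative assumptions; a real CRF layer learns these scores):

```python
def viterbi(emissions, transitions, states):
    """emissions[t][s]: score of tag s at position t;
    transitions[a][b]: score of moving from tag a to tag b.
    Returns the highest-scoring tag sequence (the optimal path)."""
    score = {s: emissions[0][s] for s in states}
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, bp = {}, {}
        for b in states:
            best = max(states, key=lambda a: score[a] + transitions[a][b])
            new_score[b] = score[best] + transitions[best][b] + emissions[t][b]
            bp[b] = best
        score = new_score
        backpointers.append(bp)
    last = max(states, key=lambda s: score[s])
    path = [last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return list(reversed(path))

states = ["NR-B", "NR-I"]
emissions = [{"NR-B": 1.0, "NR-I": 0.0}, {"NR-B": 0.0, "NR-I": 1.0}]
transitions = {"NR-B": {"NR-B": -1.0, "NR-I": 0.5},
               "NR-I": {"NR-B": 0.0, "NR-I": 0.0}}
print(viterbi(emissions, transitions, states))  # ['NR-B', 'NR-I']
```

The backpointer table plays the role of the optimal path: after the final position's best tag is chosen, the path is read off in reverse.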
It should be understood that the above embodiments are only examples, and in an application, the corresponding encoded training data may be selected for decoding according to an actual situation, for example, an encoded syntax tag sequence, a position tag sequence, a corpus, and the like may be selected.
It can also be understood that, in practical applications, according to the modules and sub-modules included in the tag classification system, the processing parameters acquired by the parameter acquisition module may further include: the method comprises the following steps of embedding processing parameters, encoding processing parameters, attribute identification parameters, feature extraction parameters for iteration, logical operation parameters for iteration and the like, wherein the parameters can also be obtained by adjusting preset training data, a class label real set of the training data and a preset loss function, and the embodiment of the specification does not limit the specific parameter types included by the processing parameters.
Optionally, the parameter obtaining module may configure the module of the tag classification system before extracting the semantic features of the data to be processed, or may trigger and configure the module of the tag classification system at other times, for example, after the processing parameters are adjusted, the parameter obtaining module is triggered to reconfigure the module of the tag classification system.
In specific implementation, after the data to be processed is obtained, the class label prediction set of the data to be processed can be obtained through a preset label classification model.
Specifically, referring to a schematic structural diagram of another tag classification system shown in fig. 2, in an embodiment of the present specification, the tag classification system 20 may include:
a to-be-processed data obtaining module 21, adapted to obtain to-be-processed data, where the to-be-processed data includes to-be-processed corpora;
the tag classification prediction module 22 is adapted to extract semantic features of the data to be processed by using a preset tag classification model, perform logical operation processing on the extracted semantic features and the data to be processed to obtain fusion features of the data to be processed, calculate values of candidate class tags for labeling the corpus to be processed based on the fusion features of the data to be processed, obtain candidate class tags of which the values meet a preset first selection condition, and obtain a class tag prediction set.
By adopting the tag classification scheme, after the data to be processed is obtained, the original semantic information in the data to be processed and the extracted semantic information in the semantic features can be fused through the preset tag classification model, so that the influence on the tag classification prediction result caused by semantic feature extraction errors or key semantic information loss is avoided, the fused features contain rich semantic information, the data to be processed with complicated content or variable sources can be represented, the numerical value of each candidate class tag can be calculated more accurately, the correct candidate class tag is obtained to represent the classification information in the corpus to be processed, and the accuracy of the tag classification result is improved.
In a specific implementation, as shown in fig. 2, the tag classification prediction module 22 may include: the model building sub-module 221 may build a tag classification model through the acquired preset processing parameters, as shown in fig. 3, where the tag classification model 30 may include an input layer 31, an encoding layer 32, a feature extraction layer 33, a feature fusion layer 34, a decoding layer 35, and an output layer 36. Wherein, the feature extraction layer 33 is adapted to extract semantic features of the data to be processed.
As an alternative example, the feature extraction layer 33 may employ a convolutional neural network architecture. In this case, the feature extraction parameters include convolutional neural network parameters, and the network configured through these parameters may be a common Convolutional Neural Network (CNN) or a variant thereof.
In an embodiment of the present specification, by setting a Dilation Rate parameter in the feature extraction parameter, the tag classification model may extract semantic features of the data to be processed through a variant of a convolutional Neural Network, namely, a Dilated Convolutional Neural Network (DCNN).
The feature extraction layer 33 may include at least one dilated convolutional neural network, the parameters of each dilated convolutional neural network may be set separately, and the dimension of each dilated convolutional neural network may be one-dimensional or multi-dimensional. When parameter values such as the convolution Kernel, Window, and dilation rate of each dilated convolutional neural network are the same, the receptive field of each dilated convolutional neural network is the same.
For example, when the dilated convolutional neural network is one-dimensional, with a convolution kernel size of 3 and a dilation rate of 2, the receptive field of each dilated convolutional neural network is 7 × 1.
For another example, when the dilated convolutional neural network is two-dimensional, with a convolution kernel size of 3 and a dilation rate of 4, the receptive field of each dilated convolutional neural network is 15 × 15.
By adopting the scheme, the semantic features of the data to be processed are extracted through the expanded convolutional neural network, and the semantic information with longer distance can be extracted from the linguistic data to be processed under the condition that the parameter number is not increased and the data to be processed is not subjected to invalid character elimination preprocessing, so that the semantic features contain wider semantic information.
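A sketch of a one-dimensional dilated convolution (single filter, no padding), showing how the dilation rate spreads the kernel taps over a wider input span without adding parameters:

```python
def dilated_conv1d(x, kernel, dilation=1):
    """Apply one 1-D dilated convolution filter: kernel taps are spaced
    `dilation` positions apart, so a single application covers
    (len(kernel) - 1) * dilation + 1 input positions."""
    k = len(kernel)
    span = (k - 1) * dilation + 1
    return [sum(kernel[j] * x[i + j * dilation] for j in range(k))
            for i in range(len(x) - span + 1)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=2))  # [9.0] (taps at x[0], x[2], x[4])
```

Stacking several such layers widens the receptive field further, which is one way configurations like those quoted above can arise.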
In a specific implementation, as shown in fig. 3, the tag classification model 30 may further include a feature fusion layer 34, connected to the feature extraction layer 33 and the encoding layer 32 and adapted to perform logical operation processing on the extracted semantic features and the data to be processed to obtain the fusion features of the data to be processed. The feature fusion layer 34 may adopt any neural network architecture capable of implementing the logical operation, for example a perceptron neural network architecture, and its parameters may be set through the logical operation parameters.
It is to be understood that, in describing the embodiment of the present disclosure, in order to facilitate description of data interaction relationships among the neural networks, a neural network that independently implements a corresponding function may be regarded as a sub-model in the tag classification model, for example, a convolutional neural network that can independently implement a function of extracting semantic features of the data to be processed may be regarded as a semantic feature extraction sub-model; a neural network capable of independently implementing a logic operation processing function can be regarded as a logic operator model.
In practical application, each semantic feature extraction sub-model can be obtained based on each preset group of feature extraction parameters. And then, the logic operator model can carry out logic operation on the semantic features based on the data to be processed of each group and the data to be processed to obtain fusion features.
By adopting the scheme, the semantic features with different granularities can be extracted from the data to be processed by setting different feature extraction parameters, so that the extracted semantic features have diversity and universality, more semantic information contained in the data to be processed can be transmitted through the semantic features with different granularities, the capability of fusing the data to be processed with complicated content or variable sources of feature representation is enhanced, and the generalization capability and the universality of accurately predicting the different data to be processed are improved.
In a specific implementation, based on preset logical operation parameters, the logical operation submodel may perform weighted logical operation on each group of semantic features and the data to be processed.
Therefore, through the weighted logic operation, the importance degree of various semantic features and the data to be processed in the logic operation can be controlled, the accuracy of the logic operation result is improved, the accuracy of the fusion feature representation data to be processed is enhanced, and the reliability of the label classification prediction result is improved.
In specific implementation, in order to quickly and reliably obtain the weight coefficients, the logic operator model may input at least one group of semantic features into a preset nonlinear function to perform nonlinear mapping processing, allocate weight coefficients to other groups of semantic features and the data to be processed based on a processing result, and perform weighted logic operation on the other groups of semantic features and the data to be processed based on the allocated weight coefficients. Wherein, nonlinear mapping processing can be performed on at least one group of semantic features by adopting nonlinear functions such as Sigmoid, Tanh, ReLU and the like.
In an embodiment of the present specification, the two semantic feature extraction submodels respectively output extracted semantic features, and the logical operation submodel may use Sigmoid nonlinear function to perform nonlinear mapping processing on a group of semantic features through a preset neural network to obtain Sigmoid (E)1) And with another set of semantic features E2And carrying out weighted logic operation on the data X to be processed to obtain fusion characteristics Y.
As an alternative example, the weighted logic operation may be performed using the following formula:

Y = σ ⊗ E2 + (1 − σ) ⊗ X

where σ = Sigmoid(E1), and the sign ⊗ denotes a tensor product operation.
It should be understood that the above-described embodiments are only examples; in practical applications, different numbers of semantic feature extraction submodels, nonlinear functions, and logical operation formulas may be selected according to actual situations, and the embodiments of this specification do not limit this.
Therefore, the weight coefficient is acquired through the semantic features, the weight acquisition efficiency can be improved, and the reliability of the weight coefficient can be improved.
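As an illustration of the gated weighting described above, the following NumPy sketch derives σ from one group of semantic features and fuses the other group with the data to be processed. The shapes, the random inputs, and the element-wise product standing in for the tensor product are assumptions for illustration only:

```python
import numpy as np

# Assumed shapes: m division units, feature dimension p.
m, p = 4, 8
rng = np.random.default_rng(0)
E1 = rng.standard_normal((m, p))   # semantic features from one extraction submodel
E2 = rng.standard_normal((m, p))   # semantic features from the other submodel
X = rng.standard_normal((m, p))    # (encoded) data to be processed

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weight coefficients obtained by nonlinear mapping of one group of features.
sigma = sigmoid(E1)
# Weighted logical operation: sigma gates E2, (1 - sigma) gates X.
Y = sigma * E2 + (1.0 - sigma) * X  # fusion feature
```

In the model itself, E1 and E2 would come from the trained semantic feature extraction submodels rather than random data.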
In a specific implementation, as shown in fig. 3, as an optional example, the tag classification model 30 may further include an iteration layer 37 located between the feature fusion layer 34 and the decoding layer 35. The iteration layer is adapted to, after determining that a preset iteration condition is satisfied, obtain the fusion feature of the current round, extract semantic features of the fusion feature, and perform a logical operation on the semantic features extracted from the fusion feature and the fusion feature itself to obtain the iterated fusion feature; and, after determining that the iteration condition is no longer satisfied, take the iterated fusion feature as the fusion feature of the data to be processed, so as to determine the value of each candidate class label.
The semantic feature extraction submodel that extracts the semantic features of the fusion feature and the semantic feature extraction submodel that extracts the semantic features of the data to be processed may adopt the same neural network architecture, which may be an ordinary convolutional neural network or an expanded (dilated) convolutional neural network configured by its expansion rate. The parameters of the submodel that extracts the semantic features of the fusion feature may be the same as or different from the parameters of the submodel that extracts the semantic features of the data to be processed. Similarly, the logical operation submodel that iteratively processes the fusion feature and its extracted semantic features and the logical operation submodel that processes the data to be processed and its fusion feature may adopt the same neural network architecture.
It can be understood that, in describing the embodiments of the present disclosure, in order to distinguish the two, the semantic feature extraction submodel that extracts the semantic features of the data to be processed may be referred to as the first semantic feature extraction submodel, and the semantic feature extraction submodel that extracts the semantic features of the fusion feature may be referred to as the second semantic feature extraction submodel. Similarly, the logical operation submodel that processes the data to be processed and its fusion feature may be referred to as the first logical operation submodel, and the logical operation submodel that iteratively processes the fusion feature and its extracted semantic features may be referred to as the second logical operation submodel.
In practical application, according to a preset iteration count threshold, one or more sublayers may be preset in the iteration layer, and the sublayers may be connected in series to form a multiple-iteration relationship. The first sublayer receives the input fusion feature, extracts its semantic features, and performs a logical operation on the extracted semantic features and the fusion feature to obtain the fusion feature after one iteration; the second sublayer receives the fusion feature after the first iteration, extracts its semantic features, and performs a logical operation on the extracted semantic features and that fusion feature to obtain the fusion feature after the second iteration; and so on, so that the fusion feature after multiple iterations can be obtained after multiple sublayers.
By adopting the scheme, the iterative fusion features can highlight key semantic information more by performing semantic extraction and logical operation on the fusion features, so that the characterization capability of the fusion features is enhanced, and the accuracy of the label classification result is improved.
In specific implementation, a plurality of groups of feature extraction parameters for extracting fusion features may be preset, semantic features of the fusion features are respectively extracted based on the feature extraction parameters of each group, so as to obtain the semantic features of each group based on the fusion features, and then, logical operation is performed on the semantic features of each group based on the fusion features and the fusion features, so as to obtain the fusion features after iteration.
For example, in an embodiment of the present specification, referring to fig. 4, the iteration layer 40 may include two sublayers, namely a first sublayer 41 and a second sublayer 42. The first sublayer 41 may include second semantic feature extraction submodels 411 and 412 and a second logical operation submodel 413, where the inputs of the second semantic feature extraction submodels 411 and 412 are connected with the input of the second logical operation submodel 413, and their outputs are also connected with the input of the second logical operation submodel 413. The second sublayer 42 may include second semantic feature extraction submodels 421 and 422 and a second logical operation submodel 423, where the inputs of the second semantic feature extraction submodels 421 and 422 are connected with the input of the second logical operation submodel 423, and their outputs are also connected with the input of the second logical operation submodel 423.
The fusion feature X400 is used as the input feature of the first sublayer 41 and is input to the second semantic feature extraction submodel 411, the second semantic feature extraction submodel 412, and the second logical operation submodel 413 respectively. The semantic features X411 and X412 of the first sublayer 41 are obtained through the second semantic feature extraction submodels 411 and 412, and the semantic features X411 and X412 of the first sublayer and the fusion feature X400 are subjected to a logical operation through the second logical operation submodel 413 to obtain the fusion feature of the first sublayer 41, i.e. the fusion feature X413 after one iteration.
The fusion feature matrix X413 after one iteration is used as the input feature of the second sublayer 42 and is input to the second semantic feature extraction submodel 421, the second semantic feature extraction submodel 422, and the second logical operation submodel 423 respectively. The semantic features X421 and X422 of the second sublayer 42 are obtained through the second semantic feature extraction submodels 421 and 422, and the semantic features X421 and X422 of the second sublayer 42 and the fusion feature matrix X413 after one iteration are subjected to a logical operation through the second logical operation submodel 423 to obtain the fusion feature of the second sublayer 42, i.e. the fusion feature X423 after the second iteration.
It should be understood that the foregoing embodiment is only an example, and the iteration layer may set the number of sub-layers and the number of semantic feature extraction sub-models and logical operation sub-models included in each sub-layer according to an actual situation, which is not limited in this embodiment of the present specification.
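The serial sublayer iteration described above can be sketched as follows; a single linear map stands in for each second semantic feature extraction submodel and the same gated operation stands in for the second logical operation submodel, and all shapes and parameters are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def extract(features, W):
    # Stand-in for a second semantic feature extraction submodel:
    # a single linear map (a real model would be a (dilated) CNN).
    return features @ W

def fuse(e1, e2, x):
    # Stand-in for the second logical operation submodel: gated fusion.
    s = sigmoid(e1)
    return s * e2 + (1.0 - s) * x

m, p, n_sublayers = 4, 8, 2
rng = np.random.default_rng(1)
fused = rng.standard_normal((m, p))        # fusion feature entering the iteration layer
weights = [(rng.standard_normal((p, p)), rng.standard_normal((p, p)))
           for _ in range(n_sublayers)]    # per-sublayer extraction parameters

# Sublayers connected in series: each iterates on the previous fusion feature.
for W1, W2 in weights:
    fused = fuse(extract(fused, W1), extract(fused, W2), fused)
```

After the loop, `fused` plays the role of the fusion feature after `n_sublayers` iterations.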
In a specific implementation, the parameters of each second semantic feature extraction submodel may be set respectively, and the parameters of each second semantic feature extraction submodel of the same sub-layer may be the same or different. For example, the iteration layer may include three sublayers. Wherein the expansion ratio of the first sub-layer may be 2, the expansion ratio of the second sub-layer may be 4, and the expansion ratio of the third sub-layer may be 1.
In a specific implementation, with continued reference to fig. 3, the tag classification model 30 may further include: the input layer 31. The input layer 31 is adapted to perform a division process on the data to be processed to obtain a corresponding data sequence to be processed before extracting the semantic features of the data to be processed. The data sequence to be processed may include one or more partitioning units, where the partitioning unit is a minimum unit into which the data to be processed may be partitioned according to a preset requirement.
As an optional example, since the partition units of the data to be processed have various expression forms, in order to improve the extraction efficiency of the semantic features, before the semantic features of the data to be processed are extracted through the semantic extraction layer in the tag classification model, the data to be processed may be subjected to an embedding process, and the partition units of the data to be processed may be subjected to vectorization. Specifically, each partition unit in the corpus to be processed and each candidate attribute tag in the attribute tag sequence may be characterized in a vector manner, so that both the corpus to be processed and the attribute tag sequence may be characterized in a matrix manner.
For example, the embedding process may be performed by using a dictionary mapping method. And acquiring index values of the dividing units in the data to be processed in the mapping dictionary through a preset mapping dictionary to obtain the data to be processed after dictionary mapping processing. The to-be-processed data after the dictionary mapping process comprises the index values of all the dividing units, so the to-be-processed data after the dictionary mapping process can be represented in a vector mode.
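The dictionary mapping step can be sketched minimally as follows; the hypothetical mapping dictionary, the [PAD]/[UNK] entries, and the unknown-unit fallback are illustrative assumptions, not part of the patented scheme:

```python
# Hypothetical mapping dictionary: division unit -> index value.
mapping = {"[PAD]": 0, "[UNK]": 1, "hello": 2, "world": 3}

def dict_map(units, mapping, unk="[UNK]"):
    # Look up each division unit's index value; units absent from the
    # dictionary fall back to the unknown index.
    return [mapping.get(u, mapping[unk]) for u in units]

ids = dict_map(["hello", "world", "foo"], mapping)
# ids == [2, 3, 1]
```

The resulting list of index values is the data to be processed after dictionary mapping, representable as a vector.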
In a specific implementation, with continued reference to fig. 3, the tag classification model 30 may further include an encoding layer 32, adapted to encode the division units in the data to be processed to obtain the encoded data to be processed. Specifically, each division unit may be encoded according to the context information of the data to be processed based on preset encoding processing parameters, to obtain the encoding feature vector of each division unit; the dimension of each encoding feature vector is determined by the preset encoding processing parameters, and the encoded data to be processed is composed of the encoding feature vectors of all division units, so the encoded data to be processed can be represented in a matrix manner.
As an optional example, the encoding layer 32 may encode the data to be processed by using any one of the following encoding processing manners:
1) adopting a time series neural network sub-model;
2) a preset mapping matrix is used.
Wherein the time-series neural network submodel may include: a Transformer network model with a self-attention mechanism, a Bi-directional Long Short-Term Memory (BiLSTM) network model, a GRU (Gated Recurrent Unit) network model, etc. The total number of row vectors or column vectors in the mapping matrix is not less than the total number of division units in the data to be processed.
In a specific implementation, when the encoding layer includes the time-series neural network submodel, the time-series neural network submodel may be pre-trained before the data to be processed is encoded, so that the pre-trained time-series neural network submodel can deeply capture context information in the data to be processed. The following is illustrated by the following two methods:
the first method is to adopt a Language Model (LM) training method to perform pre-training.
Specifically, random pre-training corpora are obtained from a pre-training corpus set and input into the initial time-series neural network submodel; the submodel predicts the next division unit of the pre-training corpus given the preceding context, and when the probability of accurate prediction reaches a preset pre-training threshold, pre-training is determined to be complete and the pre-trained time-series neural network submodel is obtained. Otherwise, after adjusting the parameters of the time-series neural network submodel, pre-training continues on the pre-training corpora until the probability of accurate prediction reaches the preset pre-training threshold.
The second method is to adopt a Masked Language Model (MLM) training method to perform pre-training.
Pre-training corpora with a randomly covered preset proportion are obtained from the pre-training corpus set and input into the time-series neural network submodel; the submodel predicts the covered portion given the context information, and when the probability of accurate prediction reaches a preset pre-training threshold, pre-training is determined to be complete and the pre-trained time-series neural network submodel is obtained. Otherwise, after adjusting the parameters of the time-series neural network submodel, pre-training continues on the pre-training corpora until the probability of accurate prediction reaches the preset pre-training threshold.
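The covering step of the MLM-style pre-training can be sketched as follows; the mask token, the cover ratio, and the seed are illustrative assumptions (the submodel would then be trained to predict the covered division units from their context):

```python
import random

def mask_corpus(units, mask_ratio=0.15, mask_token="[MASK]", seed=42):
    # Randomly cover a preset proportion of division units.
    rng = random.Random(seed)
    n_mask = max(1, int(len(units) * mask_ratio))
    positions = rng.sample(range(len(units)), n_mask)
    masked = [mask_token if i in positions else u for i, u in enumerate(units)]
    return masked, positions

masked, positions = mask_corpus(list("abcdefghij"))
```

`positions` records which units were covered, so the prediction accuracy on exactly those positions can be measured against the pre-training threshold.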
It should be understood that the above-mentioned pre-training method is only an example, and in practical applications, the above-mentioned method or other pre-training methods may be selected according to a usage scenario, and this is not limited in this embodiment of the present disclosure.
In an embodiment of this specification, the pre-trained time-series neural network submodel may be a pre-trained BERT (Bidirectional Encoder Representations from Transformers) submodel. Before the data to be processed is input into the tag classification model, the data to be processed may be pre-processed according to the input rules of the BERT submodel, which may specifically be: adding a head tag CLS at the start position of the data to be processed, and adding an end tag SEP at the end position of the data to be processed.
By adopting the scheme, the head label CLS has semantic information of the whole data to be processed after encoding, feature extraction and feature fusion, and is beneficial to a label classification model to acquire rich semantic information.
As an optional example, when the data to be processed is divided into multiple batches and input into the tag classification model for processing, a length threshold may be preset for the tag classification model, and if the length of the data to be processed in one batch does not satisfy the length threshold, Padding (Padding) processing may be performed on the data to be processed.
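The padding step for a batch whose sequences fall short of the length threshold can be sketched as follows, assuming index value 0 as the padding symbol:

```python
def pad_batch(batch, length_threshold, pad_id=0):
    # Pad each sequence of index values up to the preset length threshold.
    return [seq + [pad_id] * (length_threshold - len(seq)) for seq in batch]

padded = pad_batch([[5, 6], [7, 8, 9]], length_threshold=4)
# padded == [[5, 6, 0, 0], [7, 8, 9, 0]]
```

After padding, every sequence in the batch has the same length and the batch can be processed as one matrix.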
Therefore, the data to be processed is coded through the pre-training time sequence neural network submodel, and coding efficiency and coding result accuracy can be improved.
In a specific implementation, in order to improve the processing efficiency of the tag classification model, dimension reduction processing may be performed on the features in the tag classification model. For example, the fused features may be dimension reduced.
In a specific implementation, the richer the sequence information contained in the data to be processed, the more accurately the semantic features can be extracted. Therefore, before extracting the semantic features of the data to be processed, based on the semantic structure of the corpus to be processed, the attribute information of the corpus to be processed may be identified, and corresponding candidate attribute tags may be selected from a preset candidate attribute tag set to obtain an attribute tag sequence. The data to be processed may thus further include the attribute tag sequence, and the division unit of the attribute tag sequence may be an attribute tag.
The attribute tag sequence may be obtained through a tag classification model or a preset attribute labeling model, and the attribute tag sequence obtaining method may refer to the related embodiments in the tag classification system, which is not described herein again.
By adopting this scheme, the corresponding attribute tag sequence can be obtained according to the attribute information in the corpus to be processed. Due to the co-occurrence characteristic of the corpus to be processed and the attribute tag sequence, adding the attribute tag sequence does not damage the semantic information of the corpus to be processed, and enriches the sequence information contained in the data to be processed.
In a specific implementation, as shown in fig. 5, as an optional example, the tag classification model 30 may include a combination layer 38 located between the encoding layer 32 and the feature extraction layer 33. The difference from fig. 3 is that the encoding layer 32 does not establish a connection with the feature fusion layer 34, while the combination layer does. The combination layer 38 is adapted to, after the attribute tag sequence is obtained, combine the corpus to be processed and the attribute tag sequence to obtain the combined data to be processed, for subsequent semantic feature extraction and logical operation processing.
By adopting the scheme, the semantic information of the attribute dimension can be extracted after the linguistic data to be processed and the attribute label sequence are combined, the subsequently processed features also contain the semantic information of the attribute dimension, the dimensionality of the semantic information in the semantic features and the fusion features is expanded, and the numerical value of each candidate category label can be more accurately calculated by combining the multi-dimensional semantic information.
In specific implementation, since the encoded corpus to be processed and the encoded attribute tag sequence can be represented in a matrix manner, a row vector or column vector splicing method can be adopted to perform combination processing according to the row vector or the column vector to obtain combined data to be processed, and the combined data to be processed can also be represented in a matrix manner.
For example, the Concat function may be used to combine the n m1-dimensional row vectors of the encoded corpus to be processed with the n m2-dimensional row vectors at the corresponding distribution positions in the encoded attribute tag sequence, obtaining n (m1+m2)-dimensional row vectors and thereby the combined data to be processed. Here n, m1, and m2 are natural numbers, and m1 and m2 may or may not be equal.
Or, a splicing method of matrix operation may be adopted to perform matrix operation processing on the encoded corpus to be processed and the encoded attribute tag sequence, so as to obtain combined data to be processed.
For example, n m-dimensional row vectors in the encoded corpus to be processed and corresponding n m-dimensional row vectors in the encoded attribute tag sequence may be added to obtain n m-dimensional row vectors, where n and m are natural numbers.
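The two combination methods above, row-vector splicing and element-wise matrix addition, can be sketched in NumPy as follows; the concrete values of n, m1, and m2 are illustrative:

```python
import numpy as np

n, m1, m2 = 3, 4, 4
rng = np.random.default_rng(2)
corpus = rng.standard_normal((n, m1))   # encoded corpus to be processed
tags = rng.standard_normal((n, m2))     # encoded attribute tag sequence

# Row-vector splicing: yields n (m1 + m2)-dimensional row vectors.
combined_concat = np.concatenate([corpus, tags], axis=1)

# Matrix-operation splicing (element-wise addition, requires m1 == m2):
# yields n m1-dimensional row vectors.
combined_add = corpus + tags
```

Splicing preserves both sources separately at the cost of a wider vector; addition keeps the original width but mixes the two sources.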
In a specific implementation, as shown in fig. 3 or 4, the tag classification model 30 may further include a decoding layer 35, adapted to calculate a value of each candidate class tag according to the fusion feature of the data to be processed, so as to represent the degree of association between each candidate class tag and the corpus to be processed, and obtain, according to the value of each candidate class tag, a candidate class tag whose value meets a preset first selection condition, so as to obtain a class tag prediction set.
In practical application, the fusion features can be represented by numerical values, vectors or matrixes according to preset logical operation parameters, and cannot be in one-to-one correspondence with a preset candidate class label set. Therefore, the decoding layer 35 may generate a fused feature vector based on the fused feature of the to-be-processed data, where a dimension of the fused feature vector is consistent with a total number of candidate category tags in a preset candidate category tag set, and a numerical value of each element in the fused feature vector represents a degree of association between a corresponding candidate category tag and the to-be-processed corpus.
And determining the distribution positions of the elements which accord with a preset first selection condition in the fusion characteristic vector according to the fusion characteristic vector, and obtaining candidate category labels corresponding to the distribution positions in a preset candidate category label set to obtain the category label prediction set.
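Selecting candidate category labels from the fusion feature vector can be sketched as follows; the candidate set, the element values, and the fixed threshold standing in for the first selection condition are illustrative assumptions:

```python
candidate_labels = ["sports", "finance", "tech", "health"]  # hypothetical candidate set
fused_vector = [0.91, 0.12, 0.77, 0.30]  # one value per candidate label

# First selection condition assumed here: value above a fixed threshold.
threshold = 0.5
predicted = [label for label, v in zip(candidate_labels, fused_vector)
             if v > threshold]
# predicted == ["sports", "tech"]
```

Each element's distribution position in the fusion feature vector maps back to the candidate label at the same position in the candidate set.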
As an optional example, a feature vector generation function is formed by preset feature vector generation parameters, the fusion features are input into the feature vector generation function, and data dimension transformation processing is performed on the fusion features through the feature vector generation function to obtain a q-dimensional feature transformation vector, where q is the total number of candidate class labels.
Wherein, existing data dimension transformation methods such as the Reshape function, Resize function, Swapaxes function, Flatten function, Unsqueeze function, and Expand function may be adopted; alternatively, a data dimension transformation method may be customized, transforming the fusion feature into the fusion feature vector according to a preset data dimension transformation rule.
As an optional example, if the fusion feature is represented by a matrix, i.e. a fusion feature matrix, performing data dimension transformation on the fusion feature may specifically be: performing position conversion on each element in the fusion feature matrix according to a preset order to obtain a position conversion vector, and performing dimension reduction on the position conversion vector so that its dimension is consistent with the total number of candidate category labels in the preset candidate category label set; the position conversion vector after dimension reduction is taken as the feature transformation vector, whose dimension is thus consistent with the total number of candidate category labels in the preset candidate category label set.
The position conversion vector may be subjected to dimension reduction using a neural network architecture, for example, a Multi-Layer Perceptron (MLP) neural network architecture.
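The position conversion and dimension reduction described above can be sketched as follows; a single random linear map stands in for the MLP, and the shapes and q are illustrative assumptions:

```python
import numpy as np

q = 6                                # total number of candidate class labels
rng = np.random.default_rng(3)
fused = rng.standard_normal((2, 9))  # fusion feature matrix (shape assumed)

# Position conversion: flatten the matrix in a preset (row-major) order.
position_vec = fused.reshape(-1)     # 18-dimensional position conversion vector

# Dimension reduction down to q; one linear layer stands in for the MLP.
W = rng.standard_normal((position_vec.size, q))
feature_transform_vec = position_vec @ W  # q-dimensional feature transformation vector
```

A trained MLP would replace the random matrix W, but the shape bookkeeping is the same.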
As an optional example, by performing nonlinear conversion processing on the q-dimensional feature transformation vector, the value of each element can be mapped into a specified numerical interval, so that the feature transformation vector after nonlinear conversion serves as the fusion feature vector; the dimension of the fusion feature vector is consistent with the total number of candidate category labels in the preset candidate category label set, and each element in the fusion feature vector represents the value with which the corresponding candidate category label labels the corpus to be processed.
The feature transformation vector may be subjected to nonlinear conversion processing by using a numerical calculation function such as Sigmoid. The Sigmoid numerical calculation function can convert the numerical value of each element in the q-dimensional feature transformation vector into a numerical value interval [0,1], the numerical value of each element in the q-dimensional feature transformation vector is independently calculated, and the feature transformation vector after nonlinear conversion can be used for a multi-label classification task.
As an optional example, probability conversion processing may be performed on the q-dimensional feature transformation vector, transforming the value of each element into a specified probability interval, so that the feature transformation vector after probability conversion serves as the fusion feature vector; the dimension of the fusion feature vector is consistent with the total number of candidate category labels in the preset candidate category label set, and each element in the fusion feature vector represents the value with which the corresponding candidate category label labels the corpus to be processed.
Wherein, probability conversion processing can be performed on the feature transformation vector by adopting numerical calculation functions such as Softmax and the like. The Softmax numerical calculation function can convert numerical values of all elements in the q-dimensional feature transformation vector into a probability interval [0,1], the numerical values of all elements in the q-dimensional feature transformation vector are mutually constrained, the sum of the numerical values of all elements in the feature transformation vector after probability conversion processing is 1, and the Softmax numerical calculation function can be used for single-label classification tasks.
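The contrast between the two conversions can be sketched as follows: Sigmoid maps each element into [0, 1] independently (suiting multi-label tasks), while Softmax constrains the elements to sum to 1 (suiting single-label tasks); the score values are illustrative:

```python
import numpy as np

scores = np.array([2.0, -1.0, 0.5, 0.0])  # q-dimensional feature transformation vector

# Sigmoid: each element converted to [0, 1] independently -> multi-label.
multi_label = 1.0 / (1.0 + np.exp(-scores))

# Softmax: element values mutually constrained, summing to 1 -> single-label.
exp = np.exp(scores - scores.max())  # shift for numerical stability
single_label = exp / exp.sum()
```

With Sigmoid, several labels can simultaneously exceed a selection threshold; with Softmax, raising one label's value necessarily lowers the others.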
In order that those skilled in the art will better understand and appreciate the foregoing aspects, the detailed description and specific examples are set forth below in connection with the drawings.
In an embodiment of the present specification, as shown in fig. 6, a schematic structural diagram of another tag classification model according to the present specification is shown.
The label classification model 60 includes:
(1) The input layer 61 is adapted to pre-process the received data, and may specifically include: dividing the corpus S to be processed to obtain a corpus sequence to be processed {s1, s2 … sm} consisting of division units; then, mapping according to a preset mapping dictionary, respectively obtaining the index value of each division unit of the corpus sequence to be processed in the mapping dictionary, and converting the division units in the corpus sequence to be processed into corresponding numerical values, to obtain the corpus to be processed after dictionary mapping processing, namely the corpus vector to be processed SID = {sid1, sid2 … sidm}, where s1, s2 … sm are the division units of the corpus sequence to be processed, and m is the total number of division units in the corpus sequence to be processed.
(2) The encoding layer 62 is adapted to perform attribute identification and encoding on the received data, and specifically may include:
(2.1) In the first area 621, the corpus vector SID to be processed is input into a preset first time-series neural network submodel for encoding processing, to obtain the first encoding feature vectors corresponding to each division unit in the corpus S to be processed, which form the corpus feature matrix ES = {ES1, ES2 … ESm}, where each first encoding feature vector ES1, ES2 … ESm may be a dense vector of dimension k, and the value of k is determined by the parameters of the time-series neural network submodel.
(2.2) In the second area 622, through the corpus vector SID to be processed, the first attribute labeling submodel identifies the grammar information of the corpus S to be processed and labels corresponding grammar tags at each division unit of the corpus S to be processed to obtain a grammar tag sequence; dictionary mapping processing is performed on the grammar tag sequence to obtain the grammar tag sequence vector PID = {pid1, pid2 … pidm}.
(2.3) In the second area 622, the grammar tag sequence vector PID is input into a preset second time-series neural network submodel for encoding processing, to obtain the second encoding feature vectors corresponding to each division unit in the grammar tag sequence, which form the grammar tag feature matrix EP = {EP1, EP2 … EPm}, where each second encoding feature vector EP1, EP2 … EPm may be a dense vector of dimension j, and the value of j is determined by the parameters of the second time-series neural network submodel.
(2.4) In the third area 623, through the corpus vector SID to be processed, the second attribute labeling submodel identifies the position information of the corpus S to be processed and labels corresponding position tags at each division unit of the corpus S to be processed to obtain a position tag sequence; dictionary mapping processing is performed on the position tag sequence to obtain the position tag sequence vector QID = {qid1, qid2 … qidm}.
(2.5) In the third area 623, the position tag sequence vector QID is input into a preset mapping matrix for encoding, to obtain the third encoding feature vectors corresponding to each division unit in the position tag sequence, which form the position tag feature matrix EQ = {EQ1, EQ2 … EQm}, where each third encoding feature vector EQ1, EQ2 … EQm may be a dense vector of dimension h, and the value of h is determined by the parameters of the mapping matrix.
(3) The combining layer 63 is adapted to combine the received data, and may specifically include: performing combination processing on the corpus feature matrix ES, the grammar tag feature matrix EP, and the position tag feature matrix EQ, to obtain the combined feature matrix E = {E1, E2 … Em}. Each combined feature vector of the combined feature matrix E can be formed by splicing the feature encoding vectors at corresponding distribution positions of the corpus feature matrix ES, the grammar tag feature matrix EP, and the position tag feature matrix EQ, e.g. Ei = {ESi, EPi, EQi}, where i is a natural number and i ∈ [1, m], and each combined vector E1, E2 … Em may be a dense vector of dimension h + j + k.
(4) The first fully connected layer 64 is adapted to perform dimension reduction on the received data, and may specifically include: using the first multi-layer perceptron submodel 641, dimension reduction processing may be performed on each combined feature vector E1, E2 … Em in the combined feature matrix E, to obtain the combined feature matrix E′ = {E1′, E2′ … Em′} after dimension reduction processing, where the dimension of the combined feature vectors E1′, E2′ … Em′ after dimension reduction may be p = (h + j + k)/2^d, d is a natural number, and 2^d is a divisor of (h + j + k).
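The dimension relation p = (h + j + k)/2^d can be checked with a small numeric example; the concrete values of h, j, k, and d are illustrative assumptions:

```python
h, j, k, d = 3, 5, 8, 2        # hypothetical encoding dimensions and exponent
total = h + j + k              # combined feature dimension: 16
assert total % (2 ** d) == 0   # 2**d must be a divisor of (h + j + k)
p = total // (2 ** d)          # reduced dimension after the fully connected layer
```

Here 2^d = 4 divides 16, giving a reduced dimension of p = 4.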
(5) The feature extraction layer 65 is adapted to perform semantic feature extraction on the received data, and may specifically include: performing semantic feature extraction processing on the dimension-reduced combined feature matrix [E'] through the two dilated convolution submodels 651 and 652 respectively, so as to obtain two semantic feature matrices. Each semantic feature vector in either semantic feature matrix may be a dense vector of dimension p.
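The dilated convolution can be sketched directly in NumPy; the kernel below applies one set of taps per feature channel, a simplification of the patent's submodels, and the kernel values and sizes are invented:

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """1-D dilated convolution over the sequence axis with zero padding so the
    output keeps the input length.  x: (m, p) sequence of feature vectors,
    w: (taps,) kernel shared across feature channels."""
    m, p = x.shape
    taps = len(w)
    pad = (taps - 1) * dilation // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x)
    for t in range(m):
        for u in range(taps):
            out[t] += w[u] * xp[t + u * dilation]
    return out

x = np.arange(12, dtype=float).reshape(6, 2)   # [E'] with m=6, p=2
c1 = dilated_conv1d(x, np.array([0.25, 0.5, 0.25]), dilation=2)
c2 = dilated_conv1d(x, np.array([0.25, 0.5, 0.25]), dilation=4)
assert c1.shape == x.shape and c2.shape == x.shape  # each position stays p-dimensional
```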
(6) The feature fusion layer 66 is adapted to perform logical operation processing on the received data, and may specifically include: performing a logical operation on the two semantic feature matrices and the dimension-reduced combined feature matrix [E'] to obtain a fused feature matrix [D] = {D1, D2 … Dm}. Each fused feature vector D1, D2 … Dm in [D] may be a dense vector of dimension p.
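The patent does not spell out the logical operation here, but claim 3 describes feeding one group of semantic features through a nonlinear function and using the result as weight coefficients for the others. A gated fusion of that shape might look like this (the gating formula itself is an assumption):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
m, p = 6, 4
E_red = rng.random((m, p))   # dimension-reduced combined features [E']
C1 = rng.random((m, p))      # semantic features from submodel 651
C2 = rng.random((m, p))      # semantic features from submodel 652

# One group of semantic features is pushed through a nonlinear function to
# produce weight coefficients for the other inputs (cf. claim 3); this exact
# convex-combination form is an assumption, not the patent's formula.
g = sigmoid(C2)
D = g * C1 + (1.0 - g) * E_red   # fused feature matrix [D], still (m, p)

assert D.shape == (m, p)
```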
(7) The iteration layer 67 is adapted to perform iterative processing on the received data, and may specifically include: performing four iterations on the fused feature matrix [D] through four sublayers, so as to obtain a four-times-iterated fused feature matrix. The dimension of each iterated fused feature vector in the iterated fused feature matrix may be p; the dilation rate of the first sublayer may be 2, that of the second sublayer 4, that of the third sublayer 8, and that of the fourth sublayer 1.
(8) The decoding layer 68 is adapted to predict the received data to obtain a category label prediction set, and may specifically include:

(8.1) performing position conversion processing on each element in the four-times-iterated fused feature matrix according to a preset order, so as to obtain a position conversion vector f = {a1, a2 … a(m×p)}, where the position conversion vector f may be a dense vector of (m × p) dimensions and a1, a2 … a(m×p) are the elements of the position conversion vector f.

(8.2) using the second multi-layer perceptron sub-model 681, performing dimension reduction processing on the position conversion vector f, so as to obtain a dimension-reduced position conversion vector f' = {a1', a2' … aq'}, where the dimension of the dimension-reduced position conversion vector f' is q, q is the total number of candidate category labels in the preset candidate category label set, and a1', a2' … aq' are the elements of f'; the dimension-reduced position conversion vector f' is taken as a feature transformation vector.

(8.3) performing nonlinear conversion processing on the feature transformation vector f' to obtain a nonlinearly converted feature transformation vector f'', taking f'' as the fused feature vector, determining the distribution positions of the elements in the fused feature vector f'' that meet a preset first selection condition, and obtaining the candidate category labels corresponding to those distribution positions in the preset candidate category label set, so as to obtain the category label prediction set Y1.

(9) The output layer 69 outputs the category label prediction set Y1 of the data to be processed.
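Steps (8.1)-(9) amount to flatten, project to q dimensions, squash, and threshold. A sketch with invented sizes, weights and candidate labels, assuming a sigmoid for the nonlinear conversion and a fixed threshold for the first selection condition:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
m, p, q = 3, 4, 5                      # q = size of the candidate label set
candidates = ["colleague", "friend", "couple", "nationality", "residence"]

D4 = rng.standard_normal((m, p))       # fused matrix after four iterations
f = D4.reshape(-1)                     # (8.1) position conversion -> (m*p,)
W = rng.standard_normal((m * p, q))    # (8.2) second MLP reduces to q dims
f_prime = f @ W
f_second = sigmoid(f_prime)            # (8.3) nonlinear conversion

threshold = 0.5                        # assumed form of the first selection condition
Y1 = {candidates[i] for i in range(q) if f_second[i] > threshold}
assert Y1 <= set(candidates)
```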
In one embodiment of the present specification, fig. 7 shows a schematic diagram of another tag classification model of the present specification. The label classification model 70 differs from the label classification model 60 in fig. 6 in its input layer 71 and encoding layer 72.
Specifically, after the corpus S to be processed is obtained, the grammar tag sequence PO and the position tag sequence QO are obtained through a preset attribute labeling model, and then the corpus S to be processed, the grammar tag sequence PO and the position tag sequence QO are input to the tag classification model 70 as data to be processed.
The input layer 71 divides the corpus S to be processed to obtain a corpus sequence to be processed {s1, s2 … sm} composed of division units. Mapping according to a preset mapping dictionary, the index value of each division unit in the mapping dictionary is obtained for the corpus sequence to be processed, the grammar tag sequence and the position tag sequence respectively, and each division unit is converted into the corresponding numerical value, so as to obtain the dictionary-mapped corpus to be processed, grammar tag sequence and position tag sequence, namely a corpus vector to be processed SID = {sid1, sid2 … sidm}, a grammar tag sequence vector PID = {pid1, pid2 … pidm} and a position tag sequence vector QID = {qid1, qid2 … qidm}, where s1, s2 … sm are the division units in the corpus sequence to be processed, and m is the total number of division units in the corpus sequence to be processed.
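The dictionary mapping can be sketched as a simple lookup; the dictionary contents and the unknown-token convention below are invented for illustration:

```python
# Dictionary mapping: each division unit is replaced by its index value in a
# preset mapping dictionary.  The dictionary contents here are illustrative.
mapping = {"小": 1, "明": 2, "出": 3, "生": 4, "<unk>": 0}

def to_ids(units, vocab):
    """Convert a sequence of division units into its index-value vector."""
    return [vocab.get(u, vocab["<unk>"]) for u in units]

S = ["小", "明", "出", "生"]   # corpus sequence {s1 ... sm}
SID = to_ids(S, mapping)        # corpus vector SID = {sid1 ... sidm}
assert SID == [1, 2, 3, 4]
assert len(SID) == len(S)       # one index value per division unit
```

The grammar tag sequence PO and position tag sequence QO would be mapped to PID and QID with their own dictionaries in the same way.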
Since the encoding layer 72 does not need to identify and label the attribute information, it can directly encode the received data.
In the first region 721, the corpus vector SID to be processed is input to a preset first time-series neural network sub-model for encoding processing, so as to obtain first encoding feature vectors corresponding to each division unit in the corpus S to be processed, which form a corpus feature matrix [ES] = {ES1, ES2 … ESm}. Each first encoding feature vector ES1, ES2 … ESm may be a dense vector of dimension k, the value of k being determined by the parameters of the first time-series neural network sub-model.
In the second region 722, the grammar tag sequence vector PID is input to a preset second time-series neural network sub-model for encoding processing, so as to obtain second encoding feature vectors corresponding to each division unit in the grammar tag sequence, which form a grammar tag feature matrix [EP] = {EP1, EP2 … EPm}. Each second encoding feature vector EP1, EP2 … EPm may be a dense vector of dimension j, the value of j being determined by the parameters of the second time-series neural network sub-model.
In the third region 723, the position tag sequence vector QID is input to a preset mapping matrix for encoding processing, so as to obtain third encoding feature vectors corresponding to each division unit in the position tag sequence, which form a position tag feature matrix [EQ] = {EQ1, EQ2 … EQm}. Each third encoding feature vector EQ1, EQ2 … EQm may be a dense vector of dimension h, the value of h being determined by the parameters of the mapping matrix.
The rest of the label classification model 70 can refer to the above description of the label classification model 60 of fig. 6, and is not repeated here.
In practical application, the type of the candidate category label set can be set according to specific requirements, so that the label classification system can be applied to the identification field of corresponding types.
For example, a candidate category label set of the relationship type may be set, and the candidate category label set may include: colleague labels, friend labels, couple labels, singing labels, nationality labels, residence labels, work labels, and other candidate category labels. Thus, the label classification system can be applied to the field of relationship identification.

For another example, a candidate category label set of the emotion type may be set, and the candidate category label set may include: happy labels, calm labels, angry labels, question labels, and other candidate category labels. Thus, the label classification system can be applied to the field of emotion recognition.
The wider the coverage of candidate category labels of the relevant type in the candidate category label set, the richer the identification the label classification system can perform. Taking the field of relationship identification as an example, suppose the corpus to be processed is:

{Xiaoming, born in 2020, from Sanyuan, Shanxi, Han nationality.};

executing the label classification system described in the related embodiments above, a category label prediction set can be obtained: {birth date, place of birth, nationality}.
In specific implementation, in order to improve the accuracy of the label classification prediction result, an initial label classification model may be trained: its model parameters are adjusted through preset training data, the category label real set of the training data and a preset loss function, so that the label classification model converges to an ideal state and training is completed, and the trained label classification model is then used as the preset label classification model of the label classification system. In order to make the embodiments of the present disclosure more clearly understood and implemented by those skilled in the art, the following description is made with reference to the accompanying drawings in the embodiments of the present disclosure.
Referring to a schematic structural diagram of a training system of a label classification model shown in fig. 8, in an embodiment of the present specification, the training system 80 of the label classification model may include:
a training data obtaining module 81 adapted to obtain training data and a category label real set of the training data, wherein the training data includes training corpora;
a model training module 82, adapted to input the training data and the category label real set into an initial label classification model to extract semantic features of the training data, perform logical operation on the extracted semantic features and the training data to obtain fusion features of the training data, and calculate values of candidate category labels based on the fusion features to represent the degree of association between the candidate category labels and the training corpus, obtain candidate category labels whose values meet a preset first selection condition, and obtain a category label prediction set of the training data;
an error calculation module 83, adapted to perform error calculation on the category label real set and the category label prediction set to obtain a result error value;
a matching module 84 adapted to determine whether the label classification model meets a training completion condition according to the result error value, and determine that the label classification model completes training when the label classification model meets the training completion condition;
and the model parameter adjusting module 85 is adapted to adjust the parameters of the label classification model when the label classification model does not meet the training completion condition.
In a specific implementation, the training corpus may include, but is not limited to, Chinese characters and Chinese punctuation marks, and the training corpus of the corresponding language category may be selected according to the language category actually to be predicted by the label classification model.
The training data in different fields can be acquired, so that the source of the training data is wider, and the calibrated training data can be acquired, so that the format of the training data is more uniform and standard. And the training data can be manually arranged data or data acquired from a public network.
In specific implementation, after the prediction by the label classification model, a class label prediction set of the training corpus can be obtained, and the error calculation module can calculate a result error value between the class label real set and the class label prediction set through a preset loss function.
Optionally, whether the parameters of the label classification model are adjusted or not can be determined by presetting a result error threshold and an error coincidence time threshold.
Specifically, when the result error value is greater than the result error threshold, the matching module determines that the tag classification model does not meet a first preset condition, and the model parameter adjustment module may adjust a parameter of the tag classification model. When the result error value is smaller than the result error threshold value, the error coincidence times are increased by one, the matching module determines whether the error coincidence times are larger than or equal to the error coincidence time threshold value, if yes, the matching module determines that the label classification model is in accordance with a first preset condition, the label classification model completes training, otherwise, the matching module determines that the label classification model is not in accordance with the first preset condition, and the model parameter adjusting module can adjust parameters of the label classification model.
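The matching-module logic above can be sketched as a loop; resetting the pass counter after a round whose error exceeds the threshold is an assumption, since the text only states that the count increments while the error stays below the threshold:

```python
def train(model_step, error_threshold, pass_count_threshold, max_rounds=1000):
    """Parameter-adjustment loop following the matching-module logic above.
    `model_step` runs one round and returns the current result error value;
    all names here are illustrative."""
    passes = 0
    for _ in range(max_rounds):
        error = model_step()
        if error > error_threshold:
            passes = 0            # first preset condition not met: keep adjusting
            continue
        passes += 1               # error below threshold: count one pass
        if passes >= pass_count_threshold:
            return True           # error-coincidence count reached: training done
    return False

# Toy stand-in for a model whose error halves each round.
state = {"err": 1.0}
def fake_step():
    state["err"] *= 0.5
    return state["err"]

assert train(fake_step, error_threshold=0.1, pass_count_threshold=3)
```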
The model parameter adjusting module can adjust the parameters of the label classification model by adopting one of a gradient descent method and a back propagation method.
In specific implementation, in order to verify whether the adjusted label classification model has completed training, the model training module may input the training data and the category label real set of the training data into the adjusted label classification model again, and the adjusted label classification model performs the label classification prediction operation again, until the label classification model meets the training completion condition.
According to the scheme, the extracted semantic features of the training data and the training data are subjected to logical operation, so that original semantic information in the training data and extracted semantic information in the semantic features can be fused, the diversity of the semantic information in the fusion features is reserved, the label classification model can obtain richer feature information from the fusion features, the generalization capability and universality of the label classification model are enhanced, and the accuracy of the label classification prediction result is improved.
In a specific implementation, the training data may further include: the attribute label sequence of the corpus can be obtained by manually labeling the attribute labels of the dividing units, the attribute information existing in the corpus can be identified through the label classification model or a preset attribute labeling model, and corresponding candidate attribute labels are labeled at the dividing units of the corpus.
Based on the semantic structure of the corpus, the attribute information may include: at least one of position information of each dividing unit in the corpus to be processed and grammar information of the corpus to be processed; the syntax information may include: at least one of part-of-speech information and punctuation information. Accordingly, the attribute tag sequence obtained from the corpus to be processed may include: at least one of a sequence of position tags and a sequence of grammar tags; the sequence of grammar tags may include: at least one of part-of-speech tags and punctuation tags. For the processing process of the training data including the attribute tag sequence, reference may be specifically made to the description of the relevant part of the tag classification system, which is not described herein again.
In a specific implementation, after the class label prediction set of the training data is obtained, the error calculation module may calculate an error between the class label real set and the class label prediction set through a preset loss function. And the loss function can be established according to the global or local prediction result of the label classification model.
For example, the following first loss function loss1 may be established based on the label classification prediction result, and the error calculation module uses the value computed by loss1 as the result error value between the category label prediction set and the category label real set:

loss1 = −Σi=1..q [ yi·log(ŷi) + (1 − yi)·log(1 − ŷi) ]

where yi represents the ith element in the label classification real vector y corresponding to the category label real set, and ŷi represents the ith element in the fused feature vector corresponding to the category label prediction set.
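Read as a multi-label cross-entropy over the q candidate labels, the first loss function can be computed as follows (the clipping constant is a numerical-stability addition, not part of the patent):

```python
import numpy as np

def loss1(y_true, y_pred, eps=1e-12):
    """Multi-label cross-entropy: -sum_i [y_i*log(yhat_i) + (1-y_i)*log(1-yhat_i)]."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # guard against log(0)
    return -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y = np.array([1.0, 1.0, 1.0, 0.0])          # real vector for the category label set
y_hat = np.array([0.9, 0.8, 0.95, 0.1])     # fused feature vector (after sigmoid)
val = loss1(y, y_hat)
assert val > 0.0
```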
Further optionally, the tag classification true vector corresponding to the category tag true set may be obtained by:
According to the classification information actually present in the training corpus and the distribution positions, in the preset candidate category label set, of the candidate category labels actually used for labeling the training corpus, the real category label vector can be generated. For example, if the candidate category label set is {place of origin, birth date, nationality, friend} and the candidate category labels actually used for labeling the corpus are {place of origin, birth date, nationality}, the real category label vector may be y = {1, 1, 1, 0}.
Wherein "1" may indicate that the candidate category label at the corresponding location is valid, that is, the candidate category label at the corresponding location is actually used for labeling the corpus, and "0" may indicate that the candidate category label at the corresponding location is invalid, that is, the candidate category label at the corresponding location is not actually used for labeling the corpus. It is understood that, in the implementation, other values may be used to represent the valid bit and the invalid bit, and the embodiment of the present specification is not limited thereto.
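Building the real vector is then a membership test over the candidate set; the labels below are illustrative:

```python
# A bit is 1 when the candidate category label at that distribution position is
# actually used for labeling the corpus, and 0 otherwise.
candidate_set = ["place of origin", "birth date", "nationality", "friend"]
actually_used = {"place of origin", "birth date", "nationality"}

y = [1 if label in actually_used else 0 for label in candidate_set]
assert y == [1, 1, 1, 0]
```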
For example, in order to determine whether the training data is accurately encoded and obtain an accurate encoded attribute tag sequence, the tag classification model may decode the encoded training data, predict the encoded training data, and verify whether the encoding result is accurate. Therefore, a loss function can be established jointly according to the label classification prediction result and the coding processing result, the initial processing parameters can be adjusted in a multi-dimensional mode, the initial processing parameters can be converged quickly, and the parameter adjusting efficiency is improved.
In an embodiment of the present specification, the training system of the label classification model may perform the following operations:
1) training data and a category label real set of the training data are obtained, wherein the training data comprises training corpora and attribute label sequences.
2) Inputting the training data and the category label real set into an initial label classification model to perform coding processing on the training data, extracting semantic features of the training data, and performing logical operation on the extracted semantic features and the training data to obtain fusion features of the training data.
3) And calculating the numerical value of each candidate class label based on the fusion characteristics so as to represent the correlation degree of each candidate class label and the training corpus, and acquiring the candidate class label of which the numerical value meets a preset first selection condition to obtain a class label prediction set of the training data.
4) Based on the coded attribute label sequences, the label classification model calculates the probability value of each candidate attribute label labeling sequence, obtains the candidate attribute label labeling sequence with the probability value meeting the preset second selection condition, and obtains the attribute label prediction sequence of the training data.
5) And calculating a first error between the category label real set and the category label prediction set, calculating a second error between the attribute label sequence and the attribute label prediction sequence, and calculating the first error and the second error to obtain a result error value.
6) And determining whether the label classification model meets the training completion condition or not based on the result error value, and adjusting the parameters of the label classification model when the label classification model does not meet the training completion condition.
7) And inputting the training data and the class label real set of the training data into the adjusted label classification model until the label classification model meets the training completion condition.
In specific implementation, a loss function may be jointly established based on a prediction result output by the label classification model and a decoding processing result, and a parameter of the label classification model may be adjusted by using a gradient descent method or a back propagation method based on a preset jointly established loss function.
Optionally, the tag classification model uses a conditional random field network to decode the encoded attribute tag sequence, so as to obtain an attribute tag prediction sequence.
The conditional random field network may be preset with a state transition matrix [A]a,b and an emission matrix [P], where [A]a,b represents the state transition probability of moving from the a-th state to the b-th state across two time steps, and [P]t,y(t) represents the probability that the candidate attribute label y(t) is output at position t after the encoded attribute tag sequence (i.e. the attribute tag feature matrix [ET]) is input, with θ containing the parameters of the entire label classification model. When the conditional random field score

s(ET, y; θ) = Σt ( [A]y(t−1),y(t) + [P]t,y(t) )

reaches its maximum, the attribute label prediction sequence is obtained. Moreover, the conditional random field model can compute y* = argmax_y s(ET, y; θ) by the Viterbi method, thereby obtaining the candidate attribute label labeling sequence corresponding to the optimal path as the attribute label prediction sequence.
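A compact Viterbi decoder over toy transition and emission scores, matching the score form Σt ([A]y(t−1),y(t) + [P]t,y(t)); start-state transitions are omitted for brevity, and all scores are invented:

```python
import numpy as np

def viterbi(emissions, transitions):
    """Best label path under the CRF score sum_t (A[y_{t-1}, y_t] + P[t, y_t]).
    emissions: (T, n) matrix P, transitions: (n, n) matrix A."""
    T, n = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        total = score[:, None] + transitions + emissions[t][None, :]
        back[t] = np.argmax(total, axis=0)   # best predecessor per state
        score = np.max(total, axis=0)
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):            # backtrace the optimal path
        path.append(int(back[t][path[-1]]))
    return list(reversed(path))

P = np.array([[2.0, 0.1], [0.1, 2.0], [2.0, 0.1]])   # toy emission scores
A = np.array([[0.5, 0.0], [0.0, 0.5]])               # toy transition scores
best = viterbi(P, A)
assert best == [0, 1, 0]
```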
A second loss function loss2 may be jointly established from the first loss sub-function loss_label, established based on the label classification prediction result, and the second loss sub-function loss_classify, established based on the prediction result of the decoding processing. The second loss function loss2 is specifically:

loss2 = λ1·loss_label + λ2·loss_classify

where the first loss sub-function may take the same cross-entropy form as loss1 above,

loss_label = −Σi=1..q [ yi·log(ŷi) + (1 − yi)·log(1 − ŷi) ],

the second loss sub-function may be the negative log-likelihood of the conditional random field,

loss_classify = −log p(y | ET) = log Σy' exp( s(ET, y'; θ) ) − s(ET, y; θ),

[ET] represents the attribute tag feature matrix containing T attribute tag feature vectors, y represents the attribute tag prediction sequence comprising T candidate attribute tags, and λ1 and λ2 are positive numbers.
It is understood that the above-described embodiments are merely exemplary, and the loss function may be established according to actual situations.
In specific implementation, after the loss functions are jointly established, the model training module may adjust the weight coefficients of the loss sub-functions so as to control the parameter-adjusting direction and strength of the model. For example, with the loss function loss2 = λ1·loss_label + λ2·loss_classify, when λ1 is greater than λ2 the gradient descent and back propagation methods tend to adjust the parameters for category label prediction, and when λ1 is less than λ2 they tend to adjust the parameters for attribute label prediction.
In order that those skilled in the art will better understand and appreciate the foregoing aspects, the detailed description and specific examples are set forth below in connection with the drawings.
In an embodiment of the present specification, fig. 9 shows a schematic structural diagram of another tag classification model of the present specification. It differs from the label classification models shown in fig. 6 and fig. 7 in the decoding layer 91 and the output layer 92.
Specifically, after training data is input into the label classification model to obtain the iterated fusion feature matrix [Dtrain], the decoding layer 91 may perform position conversion processing on each element in the iterated fusion feature matrix according to a preset order to obtain a position conversion vector ftrain. Using the second multi-layer perceptron sub-model 101, the decoding layer 91 may also perform dimension reduction processing on the position conversion vector ftrain, so as to obtain a dimension-reduced position conversion vector ftrain' as a feature transformation vector, and perform nonlinear conversion processing on the feature transformation vector ftrain' to obtain a nonlinearly converted feature transformation vector ftrain'' as the fused feature vector.
Based on the encoded attribute tag sequence, i.e., the attribute tag feature matrix [ ET ], the decoding layer 91 may calculate the probability value of each candidate attribute tag label sequence by using the conditional random field submodel 912, obtain a candidate attribute tag label sequence whose probability value meets a preset second selection condition, and obtain an attribute tag prediction sequence of the training data.
Then, the decoding layer 91 may calculate an error between the category label real set and the category label prediction set according to a preset loss function, so as to obtain a resulting error value loss. The output layer 92 may output the result error value loss, so as to determine whether the label classification model is trained. The loss function may refer to the related embodiments described above, and will not be described herein again.
It can be understood that, the process of obtaining the corresponding fusion feature matrix and attribute tag feature matrix according to the training data by the tag classification model may refer to the description of the related embodiments of the tag classification system, and is not described herein again.
In practical application, the type of the candidate class label set can be set according to specific requirements, so that the label classification model trained by the training system can be applied to the identification field of corresponding types.
For example, a candidate category label set of the relationship type may be set, and the candidate category label set may include: colleague labels, friend labels, couple labels, singing labels, nationality labels, residence labels, work labels, and other candidate category labels. Thus, the trained label classification model can be applied to the field of relationship recognition.

For another example, a candidate category label set of the emotion type may be set, and the candidate category label set may include: happy labels, calm labels, angry labels, question labels, and other candidate category labels. Thus, the trained label classification model can be applied to the field of emotion recognition.

The wider the coverage of candidate category labels of the relevant type in the candidate category label set, the richer the identification the trained label classification model can perform.
It should be noted that, in practical applications, each module and sub-module included in the label classification system and the training system of the label classification model may be implemented by using a corresponding hardware circuit, device, module, or the like. For example, the to-be-processed data acquisition module, the model training module, and the like can be executed by data processing chips such as a single chip microcomputer and an FPGA. The modules and sub-modules can be controlled by the same processing device, or can be executed by different processing devices, and the different processors can be distributed on the same hardware device, or can be distributed on different hardware devices.
It is to be understood that the terms first, second, etc. may be used for convenience in this description to distinguish one from another. And the terms "first," "second," "third," etc. prefix herein are used merely to distinguish one term from another, and do not denote any order, size, or importance, etc.
Although the embodiments of the present disclosure are disclosed above, they are not limited thereto. Various changes and modifications may be made by one of ordinary skill in the relevant art without departing from the scope or spirit of the embodiments of the present disclosure, and the scope of the embodiments of the present disclosure should therefore be defined by the appended claims.

Claims (10)

1. A label classification system, comprising:
the system comprises a to-be-processed data acquisition module, a to-be-processed data processing module and a processing module, wherein the to-be-processed data acquisition module is suitable for acquiring to-be-processed data which comprises to-be-processed corpora;
the semantic extraction module is suitable for extracting semantic features of the data to be processed;
the logic operation module is suitable for performing logic operation processing on the extracted semantic features and the data to be processed to obtain fusion features of the data to be processed;
the numerical value calculation module is suitable for calculating the numerical value of each candidate category label according to the fusion characteristics of the data to be processed so as to represent the association degree of each candidate category label and the corpus to be processed;
and the label obtaining module is suitable for obtaining the candidate category labels with the numerical values meeting the preset first selection condition according to the numerical values of the candidate category labels to obtain the category label prediction set.
2. The tag classification system according to claim 1, wherein the semantic extraction module is adapted to extract semantic features of the data to be processed respectively according to preset feature extraction parameters of each group to obtain semantic features of each group;
the logic operation module is suitable for performing logic operation on each group of semantic features and the data to be processed to obtain fusion features.
3. The label classification system according to claim 2, wherein the logical operation module comprises:
the weight distribution submodule is suitable for inputting at least one group of semantic features into a preset nonlinear function to carry out nonlinear mapping processing and distributing weight coefficients for other groups of semantic features and the data to be processed according to a processing result;
and the weighting calculation submodule is suitable for performing weighted logic operation on the semantic features of the other groups and the data to be processed according to the distributed weight coefficients.
4. The label classification system according to any one of claims 1-3, further comprising: an iteration module located between the logical operation module and the numerical calculation module;
the iteration module is adapted to, upon determining that a preset iteration condition is satisfied, take the fusion features of the current round, extract semantic features from those fusion features, and perform a logical operation on the extracted semantic features and the fusion features to obtain iterated fusion features; and, upon determining that the iteration condition is no longer satisfied, take the iterated fusion features as the fusion features of the data to be processed for determining the numerical value of each candidate category label.
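The iteration module of claim 4 repeatedly re-extracts semantic features from the current fusion features and fuses them back in. A sketch, assuming the iteration condition is a simple round limit (the patent leaves the condition unspecified) and with the extraction and fusion operations passed in as stand-in callables:

```python
def refine(fused, extract, fuse, max_rounds=3):
    """Claim 4 sketch: while the (assumed) iteration condition
    holds, extract semantic features from the current fusion
    features and fuse them back in; once it stops holding, the
    result becomes the final fusion features."""
    rounds = 0
    while rounds < max_rounds:          # preset iteration condition
        semantic = extract(fused)       # semantic features of the fusion features
        fused = fuse(semantic, fused)   # iterated fusion features
        rounds += 1
    return fused

out = refine([1.0],
             extract=lambda f: [0.5 * v for v in f],
             fuse=lambda s, f: [a + b for a, b in zip(s, f)])
```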
5. The label classification system according to claim 1, wherein the to-be-processed data acquisition module is further adapted to, before the semantic features of the data to be processed are extracted, identify attribute information present in the corpus to be processed and obtain the attribute labels corresponding to the attribute information, so as to obtain an attribute label sequence, wherein the attribute information comprises at least one of: position information of each segmentation unit in the corpus to be processed, and grammatical information of the corpus to be processed;
the label classification system further comprises:
a data combination module, adapted to combine the corpus to be processed and the attribute label sequence to obtain combined data to be processed for semantic feature extraction.
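As an informal illustration of claim 5, each segmentation unit (token) can carry its position index and a grammatical label — here a part-of-speech tag, an assumed concretization of "grammar information" — and the data combination module pairs the corpus with that attribute label sequence:

```python
def build_attribute_sequence(pos_tags):
    """Attribute label sequence: per segmentation unit, its position
    index plus an (assumed) part-of-speech tag as the grammatical
    information."""
    return [(i, tag) for i, tag in enumerate(pos_tags)]

def combine(tokens, attr_seq):
    """Data-combination module: pair each unit of the corpus with
    its attribute labels to form the combined data to be processed."""
    return list(zip(tokens, attr_seq))

tokens = ["the", "cat", "sleeps"]
attrs = build_attribute_sequence(["DET", "NOUN", "VERB"])
combined = combine(tokens, attrs)
```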
6. The label classification system according to claim 5, wherein the numerical calculation module is adapted to generate a fusion feature vector from the fusion features of the data to be processed, the dimension of the fusion feature vector being equal to the total number of candidate category labels in a preset candidate category label set, and the numerical value of each element of the fusion feature vector representing the degree of association between the corresponding candidate category label and the corpus to be processed;
the label obtaining module is adapted to determine the positions, within the fusion feature vector, of the elements whose numerical values satisfy the preset first selection condition, and to obtain the candidate category labels at the corresponding positions in the preset candidate category label set, so as to obtain the category label prediction set.
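The position-based lookup of claim 6 — one vector element per candidate label, selected positions mapped back into the candidate set — can be sketched directly. The selection condition is passed in as a predicate, since the patent only calls it a "preset first selection condition":

```python
def predict_labels(fused_vector, candidate_labels, condition):
    """Claim 6 sketch: the fusion feature vector has one element per
    candidate label; collect the positions whose value satisfies the
    selection condition, then read the labels at those positions."""
    assert len(fused_vector) == len(candidate_labels)
    positions = [i for i, v in enumerate(fused_vector) if condition(v)]
    return [candidate_labels[i] for i in positions]

pred = predict_labels([0.9, 0.2, 0.7],
                      ["news", "tech", "health"],
                      condition=lambda v: v >= 0.5)
```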
7. The label classification system according to claim 1, further comprising:
a parameter acquisition module, adapted to acquire preset processing parameters and to configure the semantic extraction module, the logical operation module and the numerical calculation module according to the processing parameters, wherein the processing parameters comprise: feature extraction parameters, logical operation parameters and numerical calculation parameters.
8. The label classification system according to claim 7, further comprising:
a parameter training module, adapted to adjust initial processing parameters by means of preset training data, a real category label set of the training data and a preset loss function, and to take the adjusted processing parameters as the preset processing parameters;
wherein the loss function is established based on the label classification prediction results for the training data, the training data comprises a training corpus, and the real category label set of the training data comprises the candidate category labels actually annotated on the training corpus.
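The claim does not name a specific loss; for multi-label classification a common choice is binary cross-entropy, which compares each candidate label's predicted score against its presence in the real label set. The sketch below uses that loss purely as an assumed example:

```python
import math

def multilabel_bce(true_set, scores, labels):
    """Claim 8 sketch: a loss built from the prediction results --
    here binary cross-entropy (an assumed choice), comparing each
    candidate label's score against its membership in the real
    category label set of the training corpus."""
    loss = 0.0
    for lab, s in zip(labels, scores):
        y = 1.0 if lab in true_set else 0.0
        loss -= y * math.log(s) + (1.0 - y) * math.log(1.0 - s)
    return loss / len(labels)

loss = multilabel_bce({"tech"}, [0.8, 0.1], ["tech", "news"])
```

The loss shrinks as the score for a truly present label approaches 1 and the score for an absent label approaches 0, which is the signal the parameter training module uses to adjust the processing parameters.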
9. A label classification system, comprising:
a to-be-processed data acquisition module, adapted to acquire data to be processed, the data to be processed comprising a corpus to be processed;
and a label classification prediction module, adapted to extract semantic features of the data to be processed using a preset label classification model, perform a logical operation on the extracted semantic features and the data to be processed to obtain fusion features of the data to be processed, calculate, based on the fusion features of the data to be processed, the numerical value of each candidate category label for labeling the corpus to be processed, and obtain the candidate category labels whose numerical values satisfy a preset first selection condition, so as to obtain a category label prediction set.
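Claim 9 collapses the pipeline of claim 1 into a single prediction module. A compact end-to-end sketch, with every stage passed in as a stand-in callable (the extraction, fusion, and scoring functions and the 0.5 threshold are all assumptions):

```python
import math

def predict(corpus_vec, extract, fuse, score, threshold, labels):
    """Claim 9 sketch: one module chaining the steps of the preset
    label classification model -- extraction, fusion, per-label
    scoring, and first-selection-condition filtering."""
    semantic = extract(corpus_vec)
    fused = fuse(semantic, corpus_vec)
    scores = score(fused)
    return [lab for lab, s in zip(labels, scores) if s >= threshold]

pred = predict([1.0, -1.0],
               extract=lambda x: [2.0 * v for v in x],
               fuse=lambda s, x: [a + b for a, b in zip(s, x)],
               score=lambda f: [1 / (1 + math.exp(-v)) for v in f],
               threshold=0.5,
               labels=["pos", "neg"])
```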
10. A training system for a label classification model, comprising:
a training data acquisition module, adapted to acquire training data and a real category label set of the training data, the training data comprising a training corpus;
a model training module, adapted to input the training data and the real category label set into an initial label classification model, so as to extract semantic features of the training data, perform a logical operation on the extracted semantic features and the training data to obtain fusion features of the training data, calculate, based on the fusion features, a numerical value for each candidate category label representing its degree of association with the training corpus, and obtain the candidate category labels whose numerical values satisfy a preset first selection condition, so as to obtain a category label prediction set of the training data;
an error calculation module, adapted to perform an error calculation on the real category label set and the category label prediction set to obtain a result error value;
a matching module, adapted to determine, according to the result error value, whether the label classification model satisfies a training completion condition, and to determine that the label classification model has completed training when the condition is satisfied;
and a model parameter adjustment module, adapted to adjust parameters of the label classification model when the label classification model does not satisfy the training completion condition.
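The training loop of claim 10 — predict, compute the result error, stop when the completion condition holds, otherwise adjust parameters — can be sketched generically. Here the completion condition is assumed to be an error tolerance, and `step` and `error_of` are illustrative stand-ins for the concrete model:

```python
def train(model_params, data, true_labels, step, error_of,
          tol=0.05, max_epochs=100):
    """Claim 10 sketch: the matching module's check is modeled as
    `error <= tol` (an assumption); the model parameter adjustment
    module is modeled by `step`."""
    error = error_of(model_params, data, true_labels)
    for _ in range(max_epochs):
        error = error_of(model_params, data, true_labels)
        if error <= tol:               # training completion condition
            break
        model_params = step(model_params, data, true_labels)
    return model_params, error

# Toy run: the "error" halves each adjustment until it meets the tolerance.
params, err = train(
    1.0, None, None,
    step=lambda p, d, t: p / 2.0,
    error_of=lambda p, d, t: p,
)
```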
CN202010537636.5A 2020-06-12 2020-06-12 Label classification system and training system of label classification model Pending CN113806645A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010537636.5A CN113806645A (en) 2020-06-12 2020-06-12 Label classification system and training system of label classification model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010537636.5A CN113806645A (en) 2020-06-12 2020-06-12 Label classification system and training system of label classification model

Publications (1)

Publication Number Publication Date
CN113806645A true CN113806645A (en) 2021-12-17

Family

ID=78892318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010537636.5A Pending CN113806645A (en) 2020-06-12 2020-06-12 Label classification system and training system of label classification model

Country Status (1)

Country Link
CN (1) CN113806645A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114860905A * 2022-04-24 2022-08-05 Alipay (Hangzhou) Information Technology Co., Ltd. Intention identification method, device and equipment
CN115080689A * 2022-06-15 2022-09-20 Kunming University of Science and Technology Label association fused hidden space data enhanced multi-label text classification method
CN115080689B * 2022-06-15 2024-05-07 Kunming University of Science and Technology Hidden space data enhanced multi-label text classification method based on fusion label association
CN117746167A * 2024-02-20 2024-03-22 Sichuan University Training method and classifying method for oral panorama image swing bit error classification model
CN117746167B * 2024-02-20 2024-04-19 Sichuan University Training method and classifying method for oral panorama image swing bit error classification model

Similar Documents

Publication Publication Date Title
CN111695052A (en) Label classification method, data processing device and readable storage medium
WO2023024412A1 (en) Visual question answering method and apparatus based on deep learning model, and medium and device
CN110969020B (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN107943784B (en) Relationship extraction method based on generation of countermeasure network
CN114298158A (en) Multi-mode pre-training method based on image-text linear combination
CN113806645A (en) Label classification system and training system of label classification model
CN111226222A (en) Depth context based syntax error correction using artificial neural networks
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN111461301B (en) Serialized data processing method and device, and text processing method and device
CN111695053A (en) Sequence labeling method, data processing device and readable storage medium
CN111639186B (en) Multi-category multi-label text classification model and device with dynamic embedded projection gating
CN112800757B (en) Keyword generation method, device, equipment and medium
CN114676234A (en) Model training method and related equipment
CN114168709B (en) Text classification method based on lightweight pre-training language model
JP6738769B2 (en) Sentence pair classification device, sentence pair classification learning device, method, and program
CN110909144A (en) Question-answer dialogue method and device, electronic equipment and computer readable storage medium
CN114153971A (en) Error-containing Chinese text error correction, identification and classification equipment
CN116610803B (en) Industrial chain excellent enterprise information management method and system based on big data
CN113948217A (en) Medical nested named entity recognition method based on local feature integration
CN111985525A (en) Text recognition method based on multi-mode information fusion processing
CN111027681B (en) Time sequence data processing model training method, data processing method, device and storage medium
CN112988970A (en) Text matching algorithm serving intelligent question-answering system
CN113806646A (en) Sequence labeling system and training system of sequence labeling model
CN111145914A (en) Method and device for determining lung cancer clinical disease library text entity
CN112906403B (en) Semantic analysis model training method and device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination