CN114764865A - Data classification model training method, data classification method and device


Info

Publication number
CN114764865A
Authority
CN
China
Prior art keywords: data classification, classification model, training, updated, coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110003555.1A
Other languages
Chinese (zh)
Inventor
杨奕凡
文瑞
陈曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110003555.1A
Publication of CN114764865A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a data classification model training method, a data classification method and device, a computer device, and a storage medium. The method comprises the following steps: training the coding layer of an initial data classification model on the current training samples corresponding to the current data classification task to obtain an intermediate data classification model; training the coding layer of the intermediate data classification model on the updated training samples corresponding to the next data classification task to obtain an updated data classification model; inputting the previously trained samples (those preceding the updated training samples) into the coding layers of the intermediate data classification model and the updated data classification model respectively to obtain corresponding first features and second features; training the mapping layer of the updated data classification model on the first feature and second feature corresponding to the same training sample to obtain an updated intermediate data classification model; and finally obtaining the target data classification model through alternating training of the coding layer and the mapping layer. With this method, the accuracy of data classification can be improved.

Description

Data classification model training method, data classification method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data classification model training method, a data classification method, an apparatus, a computer device, and a storage medium.
Background
With the development of computer technology, machine learning models have emerged. Training a machine learning model on a specific data classification task enables the model to learn the knowledge relevant to that task and then perform the corresponding classification.
However, when a machine learning model faces a series of data classification tasks, catastrophic forgetting occurs: the knowledge the model learned on earlier data classification tasks is overwritten by new knowledge from subsequent tasks, so the model's performance on the earlier tasks degrades, that is, its data classification accuracy on those tasks decreases.
Disclosure of Invention
In view of the above, it is necessary to provide a data classification model training method, a data classification method, an apparatus, a computer device, and a storage medium capable of improving accuracy of data classification.
A method of data classification model training, the method comprising:
acquiring a training sample set; the training sample set comprises training samples corresponding to at least two data classification tasks;
acquiring a current training sample corresponding to a current data classification task, and training a coding layer of an initial data classification model based on the current training sample to obtain an intermediate data classification model;
acquiring an updated training sample corresponding to the next data classification task, and training a coding layer of the intermediate data classification model based on the updated training sample to obtain an updated data classification model;
inputting the previously trained samples of the updated training samples into the coding layers of the intermediate data classification model and of the updated data classification model respectively, to obtain corresponding first features and second features;
training a mapping layer of the updated data classification model based on a first feature and a second feature corresponding to the same training sample to obtain an updated intermediate data classification model;
and returning to the step of obtaining the updated training sample corresponding to the next data classification task until the training is finished, and obtaining a target data classification model based on the corresponding intermediate data classification model when the training is finished.
In one embodiment, obtaining a set of training samples comprises:
obtaining a plurality of candidate samples, each carrying a candidate label; clustering all candidate samples corresponding to the same candidate label to obtain an initial cluster for each candidate label; determining a processing priority for each candidate sample based on its candidate label; clustering the initial clusters, with each initial cluster serving as a data classification subtask, to obtain a target data classification task set for each processing priority, where a target data classification task set comprises the training samples corresponding to the same data classification task; and obtaining the training sample set based on the target data classification task sets.
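For illustration only, the following is a minimal Python sketch of this grouping step; the `(sample, label, priority)` layout of the input and the helper name `build_training_sample_set` are assumptions of the sketch, not part of the application.

```python
from collections import defaultdict

def build_training_sample_set(candidates):
    # candidates: iterable of (sample, label, priority) triples, where
    # priority stands in for the processing priority derived from the label.
    clusters = defaultdict(list)  # one initial cluster per candidate label
    for sample, label, priority in candidates:
        clusters[(priority, label)].append(sample)
    # Each initial cluster acts as a data classification subtask; subtasks
    # that share a processing priority form one target data classification task.
    task_sets = defaultdict(dict)
    for (priority, label), samples in clusters.items():
        task_sets[priority][label] = samples
    return dict(task_sets)  # the training sample set: one task set per priority
```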
In one embodiment, the target data classification model is used for determining a target label corresponding to data to be classified from candidate labels corresponding to the trained samples.
In one embodiment, training a mapping layer of an updated data classification model based on a first feature and a second feature corresponding to the same training sample to obtain an updated intermediate data classification model includes:
inputting the second feature corresponding to a previously trained sample into the mapping layer of the updated data classification model to obtain the corresponding predicted feature; calculating a target training loss value based on the first feature and the predicted feature corresponding to the same training sample; and adjusting the mapping-layer parameters of the updated data classification model based on the target training loss value until a convergence condition is met, to obtain the updated intermediate data classification model.
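As a sketch only, one such mapping-layer update in PyTorch, assuming 256-dimensional features and mean squared error as the target training loss (the application fixes neither the feature dimension nor the choice of loss):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

mapping = nn.Linear(256, 256)  # mapping layer of the updated model (size assumed)
optimizer = torch.optim.Adam(mapping.parameters(), lr=1e-4)

def mapping_step(first_feature: torch.Tensor, second_feature: torch.Tensor) -> float:
    # The second feature passes through the mapping layer to give the
    # predicted feature; the first feature is the supervision target.
    predicted = mapping(second_feature)
    loss = F.mse_loss(predicted, first_feature)
    optimizer.zero_grad()
    loss.backward()  # gradients reach only the mapping-layer parameters
    optimizer.step()
    return loss.item()
```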
An apparatus for training a data classification model, the apparatus comprising:
the training sample set acquisition module is used for acquiring a training sample set; the training sample set comprises training samples corresponding to at least two data classification tasks;
the coding layer training module is used for acquiring a current training sample corresponding to a current data classification task, and training a coding layer of the initial data classification model based on the current training sample to obtain an intermediate data classification model; acquiring an updated training sample corresponding to the next data classification task, and training a coding layer of the intermediate data classification model based on the updated training sample to obtain an updated data classification model;
the mapping layer training module is used for inputting the previously trained samples of the updated training samples into the coding layers of the intermediate data classification model and of the updated data classification model respectively to obtain corresponding first features and second features, and for training the mapping layer of the updated data classification model based on the first feature and second feature corresponding to the same training sample to obtain an updated intermediate data classification model;
and the target data classification model determining module is used for returning to the step of obtaining the updated training sample corresponding to the next data classification task until the training is finished, and obtaining the target data classification model based on the corresponding intermediate data classification model when the training is finished.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a training sample set; the training sample set comprises training samples corresponding to at least two data classification tasks;
acquiring a current training sample corresponding to a current data classification task, and training a coding layer of an initial data classification model based on the current training sample to obtain an intermediate data classification model;
acquiring an updated training sample corresponding to the next data classification task, and training a coding layer of the intermediate data classification model based on the updated training sample to obtain an updated data classification model;
inputting the previously trained samples of the updated training samples into the coding layers of the intermediate data classification model and of the updated data classification model respectively, to obtain corresponding first features and second features;
training a mapping layer of the updated data classification model based on a first feature and a second feature corresponding to the same training sample to obtain an updated intermediate data classification model;
and returning to the step of obtaining the updated training sample corresponding to the next data classification task until the training is finished, and obtaining a target data classification model based on the corresponding intermediate data classification model when the training is finished.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a training sample set; the training sample set comprises training samples corresponding to at least two data classification tasks;
acquiring a current training sample corresponding to a current data classification task, and training a coding layer of an initial data classification model based on the current training sample to obtain an intermediate data classification model;
acquiring an updated training sample corresponding to the next data classification task, and training a coding layer of the intermediate data classification model based on the updated training sample to obtain an updated data classification model;
inputting the previously trained samples of the updated training samples into the coding layers of the intermediate data classification model and of the updated data classification model respectively, to obtain corresponding first features and second features;
training a mapping layer of the updated data classification model based on a first feature and a second feature corresponding to the same training sample to obtain an updated intermediate data classification model;
and returning to the step of obtaining the updated training sample corresponding to the next data classification task until the training is finished, and obtaining a target data classification model based on the corresponding intermediate data classification model when the training is finished.
In the data classification model training method, apparatus, computer device, and storage medium, a training sample set containing training samples for at least two data classification tasks is obtained. The current training samples corresponding to the current data classification task are obtained, and the coding layer of the initial data classification model is trained on them to obtain an intermediate data classification model. The updated training samples corresponding to the next data classification task are then obtained, and the coding layer of the intermediate data classification model is trained on them to obtain an updated data classification model. The previously trained samples of the updated training samples are input into the coding layers of the intermediate data classification model and the updated data classification model respectively to obtain corresponding first and second features, and the mapping layer of the updated data classification model is trained on the first and second features corresponding to the same training sample to obtain an updated intermediate data classification model. The procedure then returns to the step of obtaining the updated training samples corresponding to the next data classification task, until training is finished, at which point the target data classification model is obtained from the corresponding intermediate data classification model. In this way, training the coding layer lets the data classification model learn the knowledge of each data classification task from that task's training samples, while training the mapping layer lets the model learn the knowledge common across tasks. Through alternating training of the coding layer and the mapping layer, the model's classification accuracy on earlier data classification tasks is preserved while its accuracy on new tasks is also guaranteed, yielding a general-purpose target data classification model.
A method of data classification, the method comprising:
acquiring data to be classified;
inputting the data to be classified into a target data classification model to obtain a target classification result corresponding to the data to be classified;
the target data classification model is obtained by: training the coding layer of an initial data classification model based on the current training samples corresponding to the current data classification task in a training sample set to obtain an intermediate data classification model; training the coding layer of the intermediate data classification model based on the updated training samples corresponding to the next data classification task in the training sample set to obtain an updated data classification model; inputting the previously trained samples of the updated training samples into the coding layers of the intermediate data classification model and the updated data classification model respectively to obtain corresponding first features and second features; training the mapping layer of the updated data classification model based on the first feature and second feature corresponding to the same training sample to obtain an updated intermediate data classification model; and returning to the step of training the coding layer of the intermediate data classification model based on the updated training samples corresponding to the next data classification task, until training is finished, the target data classification model being obtained from the corresponding intermediate data classification model when training is finished.
An apparatus for data classification, the apparatus comprising:
the data acquisition module is used for acquiring data to be classified;
the classification result determining module is used for inputting the data to be classified into the target data classification model to obtain a target classification result corresponding to the data to be classified;
the target data classification model is obtained by: training the coding layer of an initial data classification model based on the current training samples corresponding to the current data classification task in a training sample set to obtain an intermediate data classification model; training the coding layer of the intermediate data classification model based on the updated training samples corresponding to the next data classification task in the training sample set to obtain an updated data classification model; inputting the previously trained samples of the updated training samples into the coding layers of the intermediate data classification model and the updated data classification model respectively to obtain corresponding first features and second features; training the mapping layer of the updated data classification model based on the first feature and second feature corresponding to the same training sample to obtain an updated intermediate data classification model; and returning to the step of training the coding layer of the intermediate data classification model based on the updated training samples corresponding to the next data classification task, until training is finished, the target data classification model being obtained from the corresponding intermediate data classification model when training is finished.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring data to be classified;
inputting the data to be classified into a target data classification model to obtain a target classification result corresponding to the data to be classified;
the target data classification model is obtained by: training the coding layer of an initial data classification model based on the current training samples corresponding to the current data classification task in a training sample set to obtain an intermediate data classification model; training the coding layer of the intermediate data classification model based on the updated training samples corresponding to the next data classification task in the training sample set to obtain an updated data classification model; inputting the previously trained samples of the updated training samples into the coding layers of the intermediate data classification model and the updated data classification model respectively to obtain corresponding first features and second features; training the mapping layer of the updated data classification model based on the first feature and second feature corresponding to the same training sample to obtain an updated intermediate data classification model; and returning to the step of training the coding layer of the intermediate data classification model based on the updated training samples corresponding to the next data classification task, until training is finished, the target data classification model being obtained from the corresponding intermediate data classification model when training is finished.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring data to be classified;
inputting the data to be classified into a target data classification model to obtain a target classification result corresponding to the data to be classified;
the target data classification model is obtained by: training the coding layer of an initial data classification model based on the current training samples corresponding to the current data classification task in a training sample set to obtain an intermediate data classification model; training the coding layer of the intermediate data classification model based on the updated training samples corresponding to the next data classification task in the training sample set to obtain an updated data classification model; inputting the previously trained samples of the updated training samples into the coding layers of the intermediate data classification model and the updated data classification model respectively to obtain corresponding first features and second features; training the mapping layer of the updated data classification model based on the first feature and second feature corresponding to the same training sample to obtain an updated intermediate data classification model; and returning to the step of training the coding layer of the intermediate data classification model based on the updated training samples corresponding to the next data classification task, until training is finished, the target data classification model being obtained from the corresponding intermediate data classification model when training is finished.
In the data classification method, apparatus, computer device, and storage medium, the data to be classified is obtained and input into the target data classification model to obtain the corresponding target classification result. The target data classification model is obtained by: training the coding layer of an initial data classification model based on the current training samples corresponding to the current data classification task in a training sample set to obtain an intermediate data classification model; training the coding layer of the intermediate data classification model based on the updated training samples corresponding to the next data classification task in the training sample set to obtain an updated data classification model; inputting the previously trained samples of the updated training samples into the coding layers of the intermediate data classification model and the updated data classification model respectively to obtain corresponding first and second features; training the mapping layer of the updated data classification model based on the first and second features corresponding to the same training sample to obtain an updated intermediate data classification model; and returning to the step of training the coding layer of the intermediate data classification model based on the updated training samples corresponding to the next data classification task, until training is finished, the target data classification model being obtained from the corresponding intermediate data classification model when training is finished. In this way, training the coding layer of the data classification model lets it learn the knowledge of each data classification task from that task's training samples, while training the mapping layer lets it learn the knowledge common across tasks. Through alternating training of the coding layer and the mapping layer, the model's data classification accuracy on earlier tasks and on new tasks are both guaranteed, yielding a general-purpose target data classification model.
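For illustration, a minimal sketch of the inference path; the wrapper name `classify` and the assumption that the model returns per-label scores are ours, not the application's.

```python
import torch

@torch.no_grad()
def classify(target_model: torch.nn.Module, data: torch.Tensor) -> torch.Tensor:
    # Pass the data to be classified through the target data classification
    # model and take the highest-scoring candidate label for each input.
    target_model.eval()
    scores = target_model(data)
    return scores.argmax(dim=-1)  # target classification result per input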
Drawings
FIG. 1 is a diagram of an exemplary implementation of a data classification model training method;
FIG. 2 is a schematic flow chart diagram illustrating a method for training a data classification model according to an embodiment;
FIG. 3 is a schematic flow chart diagram illustrating a method for training a data classification model in another embodiment;
FIG. 4 is a schematic flow chart illustrating the process of obtaining a training sample set according to one embodiment;
FIG. 5 is a flowchart illustrating an encoding process performed by an encoding layer on input data according to an embodiment;
FIG. 6 is a diagram that illustrates text entity extraction, in one embodiment;
FIG. 7 is a flow diagram that illustrates the assignment of attention to entity features, in one embodiment;
FIG. 8 is a schematic flow chart illustrating training of a mapping layer of an updated data classification model based on first and second features corresponding to the same training sample in one embodiment;
FIG. 9 is a flow diagram that illustrates a method for data classification in one embodiment;
FIG. 10A is a flowchart illustrating a method for training a data classification model according to another embodiment;
FIG. 10B is a graph comparing the experimental results of the data classification model training method and the conventional method according to an embodiment of the present invention;
FIG. 10C is a graph comparing experimental results of the data classification model training method of the present application in one embodiment;
FIG. 11 is a block diagram showing an example of the structure of a data classification model training apparatus;
FIG. 12 is a block diagram showing the construction of an apparatus for training a data classification model according to an embodiment;
FIG. 13 is a block diagram showing the construction of a data sorting apparatus according to one embodiment;
FIG. 14 is a diagram showing an internal structure of a computer device in one embodiment;
FIG. 15 is a diagram of an internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are merely illustrative of the application and are not intended to limit it.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines can perceive, reason, and make decisions.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is the science of how to make machines "see": using cameras and computers instead of human eyes to identify, track, and measure targets, and further processing the resulting images so that they are better suited to human observation or to transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques for building artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, the language people use every day, so it is closely related to linguistics. NLP techniques typically include text processing, semantic understanding, machine translation, question answering, and knowledge graphs.
Machine Learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied across all fields of AI. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
The scheme provided by the embodiment of the application relates to the technologies of artificial intelligence, such as computer vision, natural language processing, machine learning and the like, and is specifically explained by the following embodiments:
the data classification model training method and the data classification method provided by the application can be applied to the application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by a stand-alone server or a server cluster composed of a plurality of servers.
The terminal 102 and the server 104 can each be used separately to execute the data classification model training method provided in the embodiments of the present application. For example, the server 104 obtains a training sample set containing training samples corresponding to at least two data classification tasks. The server 104 obtains the current training samples corresponding to the current data classification task and trains the coding layer of the initial data classification model on them to obtain an intermediate data classification model. The server 104 obtains the updated training samples corresponding to the next data classification task and trains the coding layer of the intermediate data classification model on them to obtain an updated data classification model. The server 104 inputs the previously trained samples of the updated training samples into the coding layers of the intermediate data classification model and the updated data classification model respectively to obtain corresponding first and second features, and trains the mapping layer of the updated data classification model on the first and second features corresponding to the same training sample to obtain an updated intermediate data classification model. If training needs to continue, the server 104 may obtain the updated training samples corresponding to the next data classification task and continue training, that is, return to the step of obtaining the updated training samples corresponding to the next data classification task, until training is finished, and obtain the target data classification model based on the corresponding intermediate data classification model when training is finished.
The terminal 102 and the server 104 may also be used cooperatively to perform the data classification model training method provided in the embodiments of the present application. For example, the terminal 102 sends the server 104 a training sample set containing training samples corresponding to at least two data classification tasks. After receiving it, the server 104 may obtain the current training samples corresponding to the current data classification task and train the coding layer of the initial data classification model on them to obtain an intermediate data classification model; obtain the updated training samples corresponding to the next data classification task and train the coding layer of the intermediate data classification model on them to obtain an updated data classification model; input the previously trained samples of the updated training samples into the coding layers of the intermediate data classification model and the updated data classification model respectively to obtain corresponding first and second features; train the mapping layer of the updated data classification model on the first and second features corresponding to the same training sample to obtain an updated intermediate data classification model; and return to the step of obtaining the updated training samples corresponding to the next data classification task, until training is finished, obtaining the target data classification model based on the corresponding intermediate data classification model when training is finished.
Of course, both the terminal 102 and the server 104 can be used separately to perform the data classification method provided in the embodiments of the present application. The terminal 102 and the server 104 may also be cooperatively used to execute the data classification method provided in the embodiments of the present application.
In one embodiment, as shown in fig. 2, a data classification model training method is provided, which is described by taking the method as an example applied to the computer device in fig. 1, where the computer device may be the terminal 102 or the server 104 in fig. 1. Referring to fig. 2, the data classification model training method includes the steps of:
step S202, acquiring a training sample set; the training sample set comprises training samples corresponding to at least two data classification tasks.
The training sample set comprises training samples corresponding to at least two data classification tasks. A plurality of training samples correspond to one data classification task. The data classification task is used for classifying data and determining a target data class corresponding to the data to be classified from at least one data class. The data to be classified may specifically be an image to be classified, a text to be classified, or the like. The data classification tasks comprise at least one data classification subtask, one data classification subtask corresponds to one data category, and different data classification subtasks correspond to different data categories.
For example, the training sample set includes training texts corresponding to a text classification task one, a text classification task two, and a text classification task three, respectively. The first text classification task comprises a text category 1 and a text category 2, and the first text classification task is used for determining a target text category corresponding to a text to be classified from the text category 1 and the text category 2. The text classification task II comprises a text class 3 and a text class 4, and is used for determining a target text class corresponding to the text to be classified from the text class 3 and the text class 4. The text classification task three comprises a text category 5 and a text category 6, and the text classification task three is used for determining a target text category corresponding to the text to be classified from the text category 5 and the text category 6.
As another example, the training sample set includes training images corresponding to image classification task one, image classification task two, and image classification task three. Image classification task one covers several sporting-dog breeds, such as the English Springer Spaniel, the Irish Setter, and the Labrador Retriever, and is used to determine the target breed of the dog in an image to be classified from among the sporting-dog breeds. Image classification task two covers several working-dog breeds, such as the Akita and the Siberian Husky, and is used to determine the target breed of the dog in an image to be classified from among the working-dog breeds. Image classification task three covers several pet-dog breeds, such as the Chihuahua, the Pomeranian, and the Toy Poodle, and is used to determine the target breed of the dog in an image to be classified from among the pet-dog breeds.
It is to be understood that different data classification tasks differ in at least one data classification subtask. For example, data classification task one may include subtasks a and b while data classification task two includes subtasks a and c; or task one may include subtasks a and b while task two includes subtasks c and d. The division into data classification tasks can be set according to actual needs, for example with different division methods for different application scenarios.
Specifically, the computer device could train a machine learning model that applies only to one specific data classification task, based on that task's training samples. For example, machine learning model one is trained on the training texts corresponding to text classification task one and is used to determine the target class of a text to be classified from among the text classes of task one, while machine learning model two is trained on the training texts corresponding to text classification task two and is used to determine the target class from among the text classes of task two. However, to reduce the number of models and improve training efficiency, the computer device may instead acquire a training sample set containing training samples for at least two data classification tasks and train a common data classification model on it. The computer device can train the machine learning model on the training samples of some of the data classification tasks and then continue training it on the training samples of new data classification tasks, so that the model keeps the classification capabilities it has already learned while continually learning new ones, thereby expanding its data classification capability. For example, the machine learning model is first trained on the training texts of text classification task one and then further trained on the training texts of text classification task two, after which it can determine the target class of a text to be classified from among the text classes of both tasks.
Step S204, obtaining a current training sample corresponding to the current data classification task, and training a coding layer of the initial data classification model based on the current training sample to obtain an intermediate data classification model.
The current data classification task may be any one of the data classification tasks in the training sample set. The current training samples are the training samples corresponding to the current data classification task. The initial data classification model is a data classification model with initialized model parameters; there are various methods for initializing model parameters, such as random assignment, zeroing, or Gaussian-distribution initialization. The data classification model is a machine learning model that classifies input data and determines the data class corresponding to it. The intermediate data classification model is the machine learning model obtained by training the initial data classification model on the current training samples and adjusting the parameters of its coding layer.
The data classification model includes an encoding layer and a mapping layer. The encoding layer is used for encoding input data, the mapping layer is used for mapping the output result of the encoding layer, and finally the classification result corresponding to the input data can be generated based on the output result of the mapping layer. The coding layer of the data classification model is trained based on the training samples corresponding to the data classification tasks, so that the data classification model can learn the relevant knowledge of the data classification tasks, the trained data classification model can be used for processing the data classification tasks, and the target data classes corresponding to the data to be classified are determined from the data classes corresponding to the data classification tasks.
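A minimal PyTorch sketch of this coding layer plus mapping layer layout follows; the class name `DataClassifier`, the layer sizes, and the linear classification head are illustrative assumptions, not details fixed by the application.

```python
import torch
import torch.nn as nn

class DataClassifier(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        # Coding layer: encodes the input data into a feature vector.
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        # Mapping layer: maps the coding layer's output feature.
        self.mapping = nn.Linear(hidden_dim, hidden_dim)
        # Head: produces the classification result from the mapped feature.
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.mapping(self.encoder(x)))
```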
Specifically, the computer device may determine a current data classification task from the training sample set, obtain a current training sample corresponding to the current data classification task, train a coding layer of the initial data classification model based on the current training sample, and adjust parameters of the coding layer of the initial data classification model, thereby obtaining an intermediate data classification model.
For example, the training sample set includes training samples corresponding to at least two data classification tasks, and each data classification task may be randomly ordered to obtain a data classification task sequence. The computer device may use the first-ranked data classification task as a current data classification task, and train the coding layer of the initial data classification model based on the training samples corresponding to the current data classification task to obtain an intermediate data classification model. In this way, the intermediate data classification model may learn the relevant knowledge of the first-ranked data classification task.
Step S206, obtaining an updated training sample corresponding to the next data classification task, and training the coding layer of the intermediate data classification model based on the updated training sample to obtain an updated data classification model.
The next data classification task may be any data classification task other than the current one, that is, a data classification task in the training sample set that has not yet been involved in model training. The updated training samples are the training samples corresponding to the next data classification task. The updated data classification model is the machine learning model obtained by training the intermediate data classification model on the updated training samples and adjusting the parameters of its coding layer. It can be understood that when the coding layer of the model is trained, only the coding-layer parameters are adjusted, and when the mapping layer is subsequently trained, only the mapping-layer parameters are adjusted.
Specifically, the computer device may determine the next data classification task from the training sample set, obtain the updated training samples corresponding to it, and train the coding layer of the intermediate data classification model on them to obtain the updated data classification model; that is, after the coding-layer training of the initial data classification model on the current training samples is completed, the coding layer continues to be trained on the updated training samples corresponding to the next data classification task.
For example, from the data classification task sequence, the computer device may use the data classification task ranked second as a next data classification task, train the coding layer of the intermediate data classification model based on the training sample corresponding to the next data classification task, and adjust the parameters of the coding layer of the intermediate data classification model, thereby obtaining the updated data classification model. In this way, updating the data classification model may learn the relevant knowledge to rank the second data classification task.
In an embodiment, training the coding layer of the data classification model based on training samples may specifically proceed as follows: input the training samples into the data classification model, have the model output predicted labels for the training samples, calculate a training loss value from the training labels carried by the training samples and the predicted labels, and adjust the coding-layer parameters of the model based on the training loss value until the updated training loss value satisfies a convergence condition, yielding a data classification model with a trained coding layer. Labels identify the data classes of the training samples: a training label is the true data class of a training sample, and a predicted label is its predicted data class.
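A sketch of one pass of this coding-layer training, reusing the `DataClassifier` sketch above; cross-entropy is an assumed concrete choice of training loss, and the classification head is simply left frozen here since the embodiment only adjusts coding-layer parameters.

```python
import torch
import torch.nn.functional as F

def train_coding_layer(model, loader, lr=1e-3):
    # Only the coding-layer parameters receive updates in this sketch.
    optimizer = torch.optim.Adam(model.encoder.parameters(), lr=lr)
    for samples, training_labels in loader:
        predicted = model(samples)  # predicted labels (per-class scores)
        loss = F.cross_entropy(predicted, training_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```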
Step S208, inputting the previously trained samples of the updated training samples into the coding layers of the intermediate data classification model and the updated data classification model respectively, to obtain corresponding first features and second features.
The previously trained samples of the updated training samples are the training samples that were already used for model training before the updated training samples. For example, in the data classification task sequence, the computer device trains the coding layer of the initial data classification model on the training samples of the first-ranked task to obtain the intermediate data classification model, then trains the coding layer of the intermediate data classification model on the training samples of the second-ranked task. Relative to the training samples of the second-ranked task, the previously trained samples are the training samples of the first-ranked task. It is to be understood that the previously trained samples of the updated training samples include training samples corresponding to at least one data classification task.
The first feature is the data output by the coding layer of the intermediate data classification model after a previously trained sample is input into it. The second feature is the data output by the coding layer of the updated data classification model after the same sample is input into it.
Specifically, the computer device may obtain the previously trained samples of the updated training samples from the training sample set, input them into the coding layers of the intermediate data classification model and the updated data classification model respectively, take the result output by the coding layer of the intermediate data classification model as the first feature, and take the result output by the coding layer of the updated data classification model as the second feature. It can be understood that, when there are multiple previously trained samples, each is input into both coding layers, producing a first feature and a second feature for each sample.
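A short sketch of this feature-extraction step under the same assumed `DataClassifier` layout; no gradients are needed here.

```python
import torch

@torch.no_grad()
def extract_features(intermediate_model, updated_model, samples):
    first = intermediate_model.encoder(samples)   # first features
    second = updated_model.encoder(samples)       # second features
    return first, second
```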
Step S210, training the mapping layer of the updated data classification model based on the first characteristic and the second characteristic corresponding to the same training sample to obtain an updated intermediate data classification model.
Specifically, the computer device may train the mapping layer of the updated data classification model based on the first and second features corresponding to the same training sample, adjusting the mapping-layer parameters to obtain a new data classification model. It can be appreciated that the results the intermediate data classification model outputs for the previously trained samples are relatively accurate, because the intermediate model was trained on those samples, whereas the results the updated data classification model outputs for them are less accurate, because the updated model further adjusted the intermediate model's parameters based on the updated training samples. At this point, the updated data classification model has learned the knowledge of the updated training samples through coding-layer training and therefore performs well on the corresponding data classification task; training its mapping layer on the first and second features corresponding to the same training sample lets it also retain performance on the data classification tasks of the previously trained samples. The new data classification model obtained through mapping-layer training is thus applicable to the data classification tasks of all samples trained so far. The new data classification model then replaces the previous intermediate data classification model, that is, it is used as the updated intermediate data classification model.
In an embodiment, training the mapping layer of the updated data classification model based on the first and second features corresponding to a training sample may specifically proceed as follows: use the second feature corresponding to the training sample as the input of the mapping layer of the updated data classification model, use the first feature corresponding to the training sample as the expected output of the mapping layer, and supervised-train the mapping layer of the updated data classification model to obtain a data classification model with a trained mapping layer. The first feature corresponding to the training sample is the output of the coding layer of the intermediate data classification model, and the second feature is the output of the coding layer of the updated data classification model. Thus, when the second feature is used as the input of the mapping layer and the first feature as its target output, the outputs of the previously trained samples through the intermediate model and through the updated model can be converged into a common space, so that the mapping-layer-trained data classification model learns the knowledge common to the data classification tasks trained so far, yielding a data classification model general to those tasks.
In an embodiment, since the data classification model already has good model parameters after coding-layer training, the mapping layer can be trained on only a portion of the previously trained samples: a small number of previously trained samples are drawn from all previously trained samples of the updated training samples, and the mapping-layer parameters are fine-tuned on them to obtain the new data classification model.
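For illustration, a tiny sketch of drawing such a subset; the subset size `k` is an assumed hyperparameter.

```python
import random

def sample_previously_trained(samples, k=64):
    # Fine-tuning the mapping layer needs only a small number of previously
    # trained samples, so draw a random subset of at most k of them.
    return random.sample(samples, min(k, len(samples)))
```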
And S212, returning to the step of obtaining the updated training sample corresponding to the next data classification task until the training is finished, and obtaining a target data classification model based on the corresponding intermediate data classification model when the training is finished.
The target data classification model is a data classification model which is finally trained.
Specifically, after the mapping layer of the updated data classification model is trained on the first and second features corresponding to the same training sample to obtain a new data classification model, the new model replaces the previous intermediate data classification model and serves as the updated intermediate data classification model; the procedure then returns to the step of obtaining the updated training samples corresponding to the next data classification task and performs a new round of coding-layer and mapping-layer training. The coding layer of the current latest intermediate data classification model is trained on the training samples of the new next data classification task to obtain a new updated data classification model; the training samples of the data classification tasks preceding the latest trained task are input into the coding layers of the current latest intermediate model and of the updated model respectively to obtain corresponding first and second features; the mapping layer of the current latest updated data classification model is trained on the first and second features corresponding to the same training sample to obtain a new data classification model, which again replaces the previous intermediate model, and the procedure returns to the step of obtaining the updated training samples corresponding to the next data classification task. By analogy, when the training samples of all data classification tasks in the training sample set have participated in model training, training is finished, and the target data classification model is obtained from the data classification model finally obtained. Alternatively, when the number of mapping-layer training rounds reaches a preset number, training is finished and the target data classification model is obtained from the final data classification model. In particular, obtaining the target data classification model from the corresponding intermediate data classification model when training is finished may simply mean taking the finally obtained data classification model as the target data classification model.
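Putting the pieces together, a high-level sketch of the alternating loop, reusing the `train_coding_layer` and `extract_features` sketches above; the deep copy per task, the Adam optimizer, and the MSE alignment loss are all assumptions of the sketch rather than requirements of the application.

```python
import copy
import torch
import torch.nn.functional as F

def alternating_train(model, task_loaders, feature_lr=1e-4):
    # task_loaders: one DataLoader per data classification task, in task order.
    intermediate = model
    train_coding_layer(intermediate, task_loaders[0])  # current task
    seen_loaders = [task_loaders[0]]
    for loader in task_loaders[1:]:                    # next tasks, in turn
        updated = copy.deepcopy(intermediate)
        train_coding_layer(updated, loader)            # coding-layer round
        optimizer = torch.optim.Adam(updated.mapping.parameters(), lr=feature_lr)
        for seen in seen_loaders:                      # previously trained samples
            for samples, _ in seen:
                first, second = extract_features(intermediate, updated, samples)
                predicted = updated.mapping(second)    # mapping-layer round
                loss = F.mse_loss(predicted, first)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        intermediate = updated   # the updated intermediate data classification model
        seen_loaders.append(loader)
    return intermediate          # the target data classification model
```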
For example, referring to fig. 3, the training sample set includes training sample one corresponding to data classification task one, training sample two corresponding to data classification task two, training sample three corresponding to data classification task three, and training sample four corresponding to data classification task four. The coding layer of the initial data classification model is trained based on training sample one to obtain a first data classification model. The coding layer of the first data classification model is then trained based on training sample two to obtain a second data classification model. Training sample one is input into the coding layer of the first data classification model to obtain the first feature of training sample one, and into the coding layer of the second data classification model to obtain the second feature of training sample one. The mapping layer of the second data classification model is trained based on the first and second features of training sample one to obtain an updated second data classification model.

The coding layer of the updated second data classification model is trained based on training sample three to obtain a third data classification model. Training sample one is input into the coding layers of the updated second data classification model and of the third data classification model, respectively, to obtain the first and second features of training sample one, and training sample two is input into the same two coding layers to obtain the first and second features of training sample two. The mapping layer of the third data classification model is trained based on the first and second features of training sample one and of training sample two to obtain an updated third data classification model.

The coding layer of the updated third data classification model is trained based on training sample four to obtain a fourth data classification model. Training samples one, two and three are each input into the coding layers of the updated third data classification model and of the fourth data classification model, respectively, to obtain the first and second features of each of the three samples. The mapping layer of the fourth data classification model is trained based on the first and second features of training samples one, two and three to obtain an updated fourth data classification model, and the updated fourth data classification model is taken as the target data classification model.
In this way, through the alternate training of the coding layer and the mapping layer, the data classification model can, each time it learns the knowledge of a new data classification task, retain its performance on the forward data classification tasks. Then, after the data to be classified is input into the target data classification model, the target data classification model can determine the target data category corresponding to the data to be classified from the data categories corresponding to all the trained data classification tasks.
In the data classification model training method, a training sample set is obtained, the training sample set comprising training samples corresponding to at least two data classification tasks. A current training sample corresponding to the current data classification task is obtained, and the coding layer of the initial data classification model is trained based on the current training sample to obtain an intermediate data classification model. An updated training sample corresponding to the next data classification task is obtained, and the coding layer of the intermediate data classification model is trained based on the updated training sample to obtain an updated data classification model. The forward trained samples of the updated training sample are input into the coding layers of the intermediate data classification model and of the updated data classification model, respectively, to obtain the corresponding first and second features. The mapping layer of the updated data classification model is trained based on the first and second features corresponding to the same training sample to obtain an updated intermediate data classification model. The procedure returns to the step of obtaining the updated training sample corresponding to the next data classification task until the training is finished, and the target data classification model is obtained based on the corresponding intermediate data classification model when the training is finished. In this way, the coding-layer training lets the data classification model learn the knowledge of each data classification task from its corresponding training samples, while the mapping-layer training lets it learn the knowledge common to the data classification tasks. Through the alternate training of the coding layer and the mapping layer, the data classification accuracy of the model on the forward data classification tasks and on new data classification tasks can both be guaranteed, yielding a general target data classification model.
In one embodiment, as shown in fig. 4, obtaining a set of training samples comprises:
step S402, obtaining a plurality of candidate samples; the candidate sample carries a candidate tag.
And S404, clustering each candidate sample corresponding to the same candidate label to obtain an initial cluster corresponding to each candidate label.
The candidate samples may be images or texts. A candidate label identifies the data category of a candidate sample.
Specifically, the computer device may obtain a plurality of candidate samples carrying candidate tags, and perform clustering on each candidate sample corresponding to the same candidate tag to obtain an initial cluster corresponding to each candidate tag. That is, the computer device may classify the candidate samples according to the data categories of the candidate samples, and form each candidate sample of the same data category into an initial cluster, to obtain initial clusters corresponding to each data category.
Step S406, determining a processing priority corresponding to each candidate sample based on the candidate label.
Step S408, taking the initial clustering cluster as a data classification subtask, clustering each data classification subtask corresponding to the same processing priority to obtain a target data classification task set corresponding to each processing priority; the target data classification task set comprises training samples corresponding to the same data classification task.
The processing priority is determined according to a parent class corresponding to the specific data class of the candidate sample. The candidate tags of the candidate samples may be considered sub-categories, one sub-category corresponding to one parent category, and one parent category corresponding to at least one sub-category. The correspondence between the parent category and the child category may be stored in advance in the category comparison table, so that the parent category corresponding to the child category may be determined based on the category comparison table.
For example, suppose the candidate samples are electronic medical records. The parent categories of electronic medical records can be divided according to the severity of the disease; for example, the parent categories include mild disease, moderate disease, and severe disease. The sub-categories of electronic medical records can be divided according to the specific disease, such as cold, cough, fever, cancer, and the like. Diseases of low severity are classified as mild diseases, diseases of medium severity as moderate diseases, and diseases of high severity as severe diseases. An electronic medical record is a digitized medical record stored, managed, transmitted and reproduced by electronic equipment (a computer, a health card and the like) to replace the handwritten paper medical record; its contents include all the information of the paper medical record.
As another example, suppose the candidate samples are dog images. The parent categories of dog images may be divided according to the use of the dog; for example, the parent categories include sporting dogs, working dogs, herding dogs, hunting dogs, terriers, family dogs, and pet dogs. The sub-categories of dog images can be divided according to the specific breed, such as the Poodle, the Pomeranian, the Akita, and the like. Dogs used for hunting birds are classified as sporting dogs, dogs used for hunting animals as hunting dogs, dogs kept for appreciation as pet dogs, dogs used for performing tasks and work as working dogs, dogs used for herding sheep and cattle as herding dogs, dogs used for eliminating vermin such as venomous snakes as terriers, and dogs kept as companions, for nursing and for house keeping as family dogs.
Specifically, the computer device may determine a processing priority corresponding to each candidate sample based on the candidate label of each candidate sample, and one parent class may be used as one processing priority. Further, the computer device may use one initial cluster as a data classification subtask, and perform clustering on each data classification subtask belonging to the same parent class to obtain a target data classification task set corresponding to each processing priority. One target data classification task set comprises candidate samples corresponding to sub-categories under the same parent category, namely, one target data classification task set comprises training samples corresponding to the same data classification task. A parent category may serve as a data classification task.
And step S410, obtaining a training sample set based on each target data classification task set.
Specifically, after obtaining each target data classification task set through clustering, the computer device may obtain a training sample set based on each target data classification task set, and specifically, the training sample set may be composed of each target data classification task set.
For example, each electronic medical record belonging to the same disease is classified into one initial cluster, and the initial cluster corresponding to each disease is obtained. An initial cluster serves as a data classification subtask. And classifying all data classification subtasks belonging to mild diseases into a target data classification task set, classifying all data classification subtasks belonging to moderate diseases into a target data classification task set, classifying all data classification subtasks belonging to severe diseases into a target data classification task set, and forming a training sample set by the three target data classification task sets. That is, the training sample set includes training samples corresponding to the data classification task one, the data classification task two, and the data classification task three, respectively. The training samples corresponding to the data classification task one comprise at least one specific electronic medical record corresponding to mild diseases, the training samples corresponding to the data classification task two comprise at least one specific electronic medical record corresponding to moderate diseases, and the training samples corresponding to the data classification task three comprise at least one specific electronic medical record corresponding to severe diseases.
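A minimal sketch of this two-stage grouping, assuming each candidate sample is a `(sample, sub_label)` pair and `parent_of` is the category comparison table mapping a sub-category to its parent category (all names hypothetical):

```python
from collections import defaultdict

def build_training_sample_set(candidates, parent_of):
    # First clustering: one initial cluster per candidate label,
    # i.e., one data classification subtask per sub-category.
    subtasks = defaultdict(list)
    for sample, sub_label in candidates:
        subtasks[sub_label].append(sample)
    # Second clustering: subtasks sharing a parent category (one
    # processing priority) form one target data classification task set.
    task_sets = defaultdict(dict)
    for sub_label, cluster in subtasks.items():
        task_sets[parent_of[sub_label]][sub_label] = cluster
    return task_sets  # {parent category: {sub-category: samples}}

# Example with the electronic medical record grouping described above:
# parent_of = {"cold": "mild disease", "cancer": "severe disease", ...}
```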
In one embodiment, the target data classification model is used for determining a target label corresponding to data to be classified from candidate labels corresponding to the trained samples.
Specifically, after the target data classification model is obtained through training, the computer device may obtain data to be classified and input it into the target data classification model; the target data classification model outputs the target label corresponding to the data to be classified, the target label being the candidate label, among the candidate labels corresponding to all the trained samples, to which the data to be classified most probably belongs. It can be understood that an existing machine learning model is usually trained based on the training samples of a single data classification task, and can therefore only determine the target data category of the data to be classified from the data categories of that one task. For example, an image recognition model trained on training samples of sporting dogs can only recognize which sporting-dog breed the dog in an image to be classified most probably belongs to. In contrast, the target data classification model of the present application is trained on training samples corresponding to a plurality of data classification tasks, and can determine the target data category of the data to be classified from the data categories of every trained data classification task. For example, an image recognition model trained on training samples of sporting dogs, working dogs, herding dogs, hunting dogs, terriers, family dogs and pet dogs can recognize which breed, across all of these groups, the dog in the image to be classified most probably belongs to.
In this embodiment, the data classification subtasks are obtained by a first clustering of the candidate samples based on the candidate labels, and the data classification tasks are obtained by a second clustering of the data classification subtasks based on the processing priority, so that a training sample set for training a general data classification model can be obtained based on the training samples corresponding to the data classification tasks.
In one embodiment, training samples in the training sample set carry training labels, the data classification model to be trained is an initial data classification model or an intermediate data classification model, and training an encoding layer of the data classification model to be trained includes the following steps: performing data processing on the input training sample through a to-be-trained data classification model to obtain a prediction label corresponding to the input training sample; and adjusting the coding layer parameters of the data classification model to be trained based on the label difference between the training label corresponding to the input training sample and the prediction label until the convergence condition is met.
Wherein the training labels are real data classes used for identifying the training samples. The prediction label is used for identifying the prediction category of the training sample, namely the classification result output by the data classification model after the training sample is input into the data classification model. The encoding layer parameters refer to model parameters of an encoding layer of the data classification model.
Specifically, the data classification model to be trained is the initial data classification model or the intermediate data classification model, and training its coding layer may proceed as follows: data processing is performed on the input training sample through the data classification model to be trained to obtain the prediction label corresponding to the input training sample; a label difference is calculated based on the training label and the prediction label corresponding to the input training sample; back propagation is performed based on the label difference, and the coding-layer parameters of the data classification model to be trained are updated. That is, a training loss value is calculated based on the training label and the prediction label corresponding to the input training sample, and the coding-layer parameters are updated based on the training loss value until a convergence condition is satisfied, yielding the data classification model with the trained coding layer. The convergence condition may specifically be that the training loss value reaches a minimum, that the change rate of the training loss value is smaller than a preset threshold, that the number of iterations reaches a preset number, and the like. The label difference or training loss value may specifically be calculated with a cross-entropy loss function, an exponential loss function, a hinge loss function, or another loss function.
When the data classification model to be trained is an initial data classification model, inputting a current training sample into the initial data classification model to obtain a prediction label corresponding to the current training sample, and adjusting the coding layer parameters of the initial data classification model based on the label difference between the training label corresponding to the current training sample and the prediction label until a convergence condition is met to obtain an intermediate data classification model.
And when the data classification model to be trained is the intermediate data classification model, inputting the updated training sample into the intermediate data classification model to obtain a prediction label corresponding to the updated training sample, and adjusting the coding layer parameters of the intermediate data classification model based on the training label corresponding to the updated training sample and the label difference of the prediction label until the convergence condition is met to obtain the updated data classification model.
In this embodiment, data processing is performed on the input training sample through the data classification model to be trained to obtain the corresponding prediction label, and the coding-layer parameters of the data classification model to be trained are adjusted based on the label difference between the training label and the prediction label until the convergence condition is satisfied, yielding the data classification model with the trained coding layer. The model parameters of the coding layer can thus be adjusted quickly through supervised training, so that the coding layer of the data classification model is trained quickly.
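As an illustrative sketch only, the supervised coding-layer training step might look as follows in PyTorch; `model.encoder` as the handle for the coding-layer parameters is an assumption, and a fixed epoch count stands in for whichever convergence condition is chosen.

```python
import torch
import torch.nn.functional as F

def train_coding_layer(model, loader, epochs=3, lr=1e-3):
    # Only the coding-layer (encoder) parameters are adjusted.
    optimizer = torch.optim.Adam(model.encoder.parameters(), lr=lr)
    for _ in range(epochs):  # stand-in for the convergence condition
        for x, y in loader:  # training sample, training label
            logits = model(x)                  # prediction label scores
            loss = F.cross_entropy(logits, y)  # label difference
            optimizer.zero_grad()
            loss.backward()                    # back propagation
            optimizer.step()                   # adjust coding-layer parameters
    return model
```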
In one embodiment, the current data classification model is any one of an initial data classification model, an intermediate data classification model, an updated data classification model and a target data classification model, an encoding layer of the current data classification model is used for encoding an input training sample to obtain initial features, and a mapping layer of the current data classification model is used for mapping the initial features to obtain intermediate features.
Specifically, the coding layer of the current data classification model is used for coding an input training sample to obtain an initial feature, the mapping layer is used for mapping the initial feature to obtain an intermediate feature, and finally, a prediction label corresponding to the input training sample can be obtained based on the intermediate feature output by the mapping layer. The current data classification model is any one of an initial data classification model, an intermediate data classification model, an updated data classification model and a target data classification model. It is to be understood that the current data classification model refers to the data classification model currently in use. That is, all data classification models of the present application encode input training samples through an encoding layer to obtain initial features, and map the initial features through a mapping layer to obtain intermediate features.
In one embodiment, the data classification model includes an encoding layer, a mapping layer, and a classification layer. After the training samples are input into the data classification model, the coding layer of the data classification model carries out coding processing on the input training samples, and characteristic information of the input training samples is extracted to obtain initial characteristics corresponding to the input training samples. Further, a mapping layer of the data classification model receives the initial features transmitted by the coding layer, performs mapping processing on the initial features, and maps the initial features into a feature space to obtain intermediate features corresponding to the input training samples. And finally, a classification layer of the data classification model receives the intermediate features transmitted by the mapping layer, classifies the intermediate features, converts the intermediate features into a preset data format and outputs the preset data format, so that a prediction label corresponding to the training sample is obtained.
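A minimal sketch of this three-part architecture, assuming a text classification model with an LSTM coding layer; all layer sizes and names are illustrative, not prescribed by the embodiment:

```python
import torch
import torch.nn as nn

class DataClassificationModel(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128,
                 hidden_dim=256, num_classes=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Coding layer: extracts feature information -> initial features.
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        # Mapping layer: maps initial features into a feature space.
        self.mapper = nn.Linear(2 * hidden_dim, hidden_dim)
        # Classification layer: converts intermediate features to a label.
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        emb = self.embed(token_ids)            # (batch, seq, embed_dim)
        outputs, _ = self.encoder(emb)
        initial = outputs[:, -1, :]            # initial features
        intermediate = torch.tanh(self.mapper(initial))  # intermediate features
        return self.classifier(intermediate)   # prediction label scores
```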
In one embodiment, the coding layer of the data classification model includes at least one coding sub-network, and the mapping layer includes at least one mapping sub-network, one coding sub-network corresponding to one mapping sub-network. When the coding layer of the data classification model includes at least two coding sub-networks, each coding sub-network is used to extract different feature information from the input training sample. For example, an image classification model may include three coding sub-networks: one for extracting the overall features of the input image, one for extracting its background features, and one for extracting its foreground features. As another example, an image classification model may include four coding sub-networks: one for extracting the color features of the input image, one for its texture features, one for its shape features, and one for its spatial-relationship features. The specific use of each coding sub-network in the data classification model can be set according to the specific data classification task.
When the coding layer of the data classification model includes at least two coding sub-networks, each coding sub-network may receive different data as input. For example, the training sample may include at least two types of training data, one type of training data corresponding to one coding sub-network. After the training sample is input into the data classification model, each type of training data is input into its corresponding coding sub-network; each coding sub-network encodes its own input data and outputs the corresponding initial sub-feature, and the initial feature is obtained from the initial sub-features. In addition, the data classification model may further include a data extraction layer. After the training sample is input into the data classification model, it is first input into the data extraction layer to extract corresponding target data; the target data is then input into its corresponding coding sub-network to obtain the initial sub-feature of the target data, while the training sample is input into its own corresponding coding sub-network to obtain the initial sub-feature of the training sample. For example, when the training sample is a training image, an entity extraction layer may first perform entity extraction on the training image, extracting the entities in the text on the training image to obtain the entity data corresponding to the training image. The entity data and the training image are then input into their corresponding coding sub-networks: the entity data is encoded by the coding sub-network corresponding to the entity extraction layer to obtain the entity features, the training image is encoded by the coding sub-network corresponding to the training image to obtain the image features, and finally the entity features and the image features form the initial feature.
The coding sub-networks and the mapping sub-networks correspond one to one. And the initial sub-features output by each coding sub-network are respectively input into the corresponding mapping sub-networks, each mapping sub-network respectively carries out mapping processing on the respective initial sub-features, intermediate sub-features corresponding to the initial sub-features are output, and the intermediate features are obtained based on the intermediate sub-features. Further, after receiving each intermediate sub-feature, the classification layer can perform unified classification processing on each intermediate sub-feature, fuse each intermediate sub-feature and convert the intermediate sub-feature into a preset data format for outputting, so as to obtain a prediction label corresponding to the training sample.
In one embodiment, the coding layer specifically performs a forward encoding process and a reverse encoding process: the forward encoding process is performed on the input training sample to obtain its forward features, the reverse encoding process is performed on the input training sample to obtain its reverse features, and the initial features corresponding to the input training sample are obtained based on the forward and reverse features. It can be understood that when encoding its respective input data, each coding sub-network may perform the forward and reverse encoding processes separately and obtain the corresponding initial sub-feature based on the results of both.
In one embodiment, when the coding layer of the data classification model comprises at least two coding subnetworks, different coding subnetworks may interact with each other. For example, the initial sub-feature a output by the first coding sub-network affects the initial sub-feature b output by the second coding sub-network. And updating the initial sub-feature b output by the second coding sub-network based on the initial sub-feature a output by the first coding sub-network to obtain an initial sub-feature b'. Specifically, the updating may be performed by performing processes such as attention allocation and fusion.
In one embodiment, when the data classification model is a text classification model, the coding layer structure of the text classification model may be an LSTM network. The LSTM network is a special RNN that can learn long-term dependencies and efficiently extract context features. When the data classification model is an image classification model, the coding layer structure of the image classification model may be specifically a CNN network.
In this embodiment, the coding layer of the data classification model is configured to perform coding processing on an input training sample to obtain an initial feature, and the mapping layer of the data classification model is configured to perform mapping processing on the initial feature transmitted by the coding layer to obtain an intermediate feature. Through cooperation of the coding layer and the mapping layer, the data classification model can finally output a classification result corresponding to the input data.
In one embodiment, as shown in fig. 5, the input training sample includes text data and entity data, the coding layer includes a coding sub-network corresponding to the text data and a coding sub-network corresponding to the entity data, and the determining of the initial characteristic includes the steps of:
and step S502, carrying out coding processing on the text data through a coding sub-network corresponding to the text data to obtain text characteristics.
And step S504, coding the entity data through the coding sub-network corresponding to the entity data to obtain the entity characteristics.
Step S506, obtaining initial characteristics based on the entity characteristics and the text characteristics.
The training samples can be training texts, the text data refers to the whole training texts, and the entity data refers to each entity in the training texts.
Specifically, the input training sample includes text data and entity data, and the coding layer includes a coding sub-network corresponding to the text data and a coding sub-network corresponding to the entity data. After the computer device inputs the training sample into the data classification model, the text data of the input training sample is encoded through the coding sub-network corresponding to the text data to obtain the text features, the entity data of the input training sample is encoded through the coding sub-network corresponding to the entity data to obtain the entity features, and finally the entity features and the text features form the initial features.
In one embodiment, the training text may be input into an entity extraction model, that is, the text data is input into the entity extraction model to obtain the entity data. The entity extraction model may be an entity recognition model based on BERT-LSTM-CRF. Further, an entity extraction model for fine-grained entity extraction can be obtained by training on entity training samples with fine-grained labels. Fine-grained labeling refines entities, dividing common entities into finer entity categories so as to make maximal use of the detailed knowledge carried by entity words. For example, a traditional medical entity extraction model can only identify coarse entities such as diseases, symptoms and drugs in a medical text, whereas a medical entity extraction model trained on fine-grained labeled entity training samples can identify not only coarse-grained entities such as diseases, symptoms and drugs but also fine-grained entities such as degree words, time words, body-part words, negative words and conjunctions. Referring to fig. 6, after a medical text is input into the medical entity extraction model for fine-grained entity extraction, each fine-grained and coarse-grained entity in the medical text is output. As shown in fig. 6, the original medical text (Raw Note) input into the model describes, in Chinese, a red rash on the left scalp accompanied by pain, pulsating and persistent, with mild fever and no dizziness. The output of the model is the labeling of each fine-grained and coarse-grained entity in the medical text together with its entity category. As shown in fig. 6, the extraction result specifically includes: position word (Location) - left; body-part word (Body Part) - scalp; Symptom - rash, pain; Modifier - pulsating, persistent, mild; Symptom - fever; Negative word - no dizziness, with the entities (Sub-entities) arranged in their order of appearance in the text. In training the medical entity extraction model, a large number of medical corpora may be collected; for example, 50000 medical corpora are collected and labeled with 14 entity categories including diseases, symptoms, drugs, treatment methods and the like, of which 11 are fine-grained entity categories including but not limited to conjunctions, negative words, body-part words, degree words and the like. Of course, it is understood that corresponding entity extraction models for fine-grained entity extraction may also be trained in fields other than the medical field.
In an embodiment, encoding the text data through the coding sub-network corresponding to the text data to obtain the text features may specifically proceed as follows: forward encoding is performed on the text data through that coding sub-network to obtain the forward features corresponding to the text data, reverse encoding is performed on the text data through the same coding sub-network to obtain the reverse features corresponding to the text data, and the forward and reverse features are spliced to obtain the text features. Likewise, encoding the entity data through the coding sub-network corresponding to the entity data to obtain the entity features may specifically be: forward encoding the entity data through that coding sub-network to obtain the forward features corresponding to the entity data, reverse encoding the entity data through the same coding sub-network to obtain the reverse features corresponding to the entity data, and splicing the forward and reverse features to obtain the entity features.
In this embodiment, when the data classification model is a text classification model, the input training sample may include entity data in addition to text data. In this way, the coding layer of the data classification model can code the text data through the coding sub-network corresponding to the text data to obtain text characteristics, code the entity data through the coding sub-network corresponding to the entity data to obtain entity characteristics, and transmit the text characteristics and the entity characteristics to the mapping layer. The text classification accuracy of the text classification model can be improved by comprehensively considering the influence of texts and entities on the text classification.
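A sketch of a coding layer with these two coding sub-networks, assuming the text and entity sequences arrive as pre-embedded tensors (the sub-network sizes and names are hypothetical):

```python
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    def __init__(self, embed_dim=128, hidden_dim=256):
        super().__init__()
        # One coding sub-network per input type, both bidirectional
        # (forward plus reverse encoding, outputs spliced).
        self.text_encoder = nn.LSTM(embed_dim, hidden_dim,
                                    batch_first=True, bidirectional=True)
        self.entity_encoder = nn.LSTM(embed_dim, hidden_dim,
                                      batch_first=True, bidirectional=True)

    def forward(self, text_emb, entity_emb):
        text_out, _ = self.text_encoder(text_emb)
        h_c = text_out[:, -1, :]        # text features
        h_e, _ = self.entity_encoder(entity_emb)
        # h_e holds one entity sub-feature per entity position.
        return h_c, h_e                 # together: the initial features
```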
In one embodiment, the current data classification model includes an entity extraction layer, the coding layer includes a coding sub-network corresponding to the entity extraction layer and a coding sub-network corresponding to the input training sample, and the determining of the initial features includes the following steps: performing entity extraction on the input training sample through an entity extraction layer to obtain entity data corresponding to the input training sample; coding the entity data through a coding sub-network corresponding to the entity extraction layer to obtain entity characteristics; coding the input training sample through a coding sub-network corresponding to the input training sample to obtain text characteristics; initial features are derived based on the entity features and the text features.
Specifically, the data classification model may include an entity extraction layer in addition to the encoding layer and the mapping layer. After the training samples are input into the data classification model, entity extraction can be performed on the input training samples through an entity extraction layer to obtain entity data corresponding to the input training samples. And then, respectively inputting the entity data and the input training sample into corresponding coding subnetworks, coding the entity data through the coding subnetworks corresponding to the entity extraction layer to obtain entity characteristics, coding the input training sample through the coding subnetworks corresponding to the input training sample to obtain text characteristics, and finally forming initial characteristics by the entity characteristics and the text characteristics.
In an embodiment, when the training sample is a training text, after the training text is input into the data classification model, entity extraction may be performed on the training text through the entity extraction layer to obtain entity data corresponding to the training text. Then, the entity data and the training texts are respectively input into corresponding coding sub-networks, the entity data are coded through the coding sub-networks corresponding to the entity extraction layers to obtain entity characteristics, the training texts are coded through the coding sub-networks corresponding to the training texts to obtain text characteristics, and finally the entity characteristics and the text characteristics form initial characteristics.
In an embodiment, encoding the entity data through the coding sub-network corresponding to the entity extraction layer to obtain the entity features may specifically be: forward encoding the entity data through that coding sub-network to obtain the corresponding forward features, reverse encoding the entity data through the same coding sub-network to obtain the corresponding reverse features, and splicing the two to obtain the entity features. Similarly, encoding the input training sample through its corresponding coding sub-network to obtain the text features may specifically be: forward encoding the input training sample through that coding sub-network to obtain the corresponding forward features, reverse encoding the input training sample through the same coding sub-network to obtain the corresponding reverse features, and splicing the two to obtain the text features.
In this embodiment, in order to improve data processing efficiency, the data classification model may further include an entity extraction layer in addition to the encoding layer and the mapping layer. Therefore, the entity extraction of the training text is not needed through an additional entity extraction model, and then the training text and the extracted entity are input into the data classification model together. The entity extraction of the training text can be carried out through the entity extraction layer in the data classification model, a series of operations such as entity extraction, coding processing, mapping processing and the like can be carried out in the data classification model only by inputting the training text into the data classification model, and therefore the data processing efficiency is improved.
In one embodiment, as shown in fig. 7, the entity data includes at least one entity, the entity features include entity sub-features corresponding to the respective entities, and the obtaining of the initial features based on the entity features and the text features includes:
step S702, respectively performing attention distribution on entity sub-features corresponding to each entity based on the text features to obtain attention information corresponding to each entity.
Attention allocation is used to allocate different attention information to each entity; the attention information of an entity reflects the importance of the entity in the text.
Specifically, the entity features and the text features may interact with each other. For example, in the medical field, knowledge of certain entity words (e.g., symptoms, examinations, drugs, etc.) can play a crucial role in the final classification of medical-history text. The same entity word should have different importance in different contexts: for example, electronic medical record A records fever accompanied by a red rash on the head, while electronic medical record B records only a simple fever, and the two records clearly correspond to different text categories. It will be appreciated that, beyond the medical field, the same holds for text classification in other fields: entity data can affect the text classification result to some extent. Thus, to place more emphasis on the entities that are important in the current context, an attention mechanism between entities and context can be introduced. An attention mechanism enables a neural network to focus on a subset of its inputs (or features) by selecting particular inputs. Where computing power is limited, the attention mechanism is a resource allocation scheme and the primary means of solving the information-overload problem, allocating computing resources to the more important tasks. The computer device can perform attention allocation on the entity sub-features corresponding to each entity based on the text features, obtaining the attention information corresponding to each entity.
In one embodiment, the computer device may perform attention allocation according to the following formula:
$$u_m = \frac{\left(h_m^e\right)^{\top} h_c}{\left\|h_m^e\right\|_2 \left\|h_c\right\|_2}$$

where $u_m$ denotes the attention information of the $m$-th entity in the training text; $h_m^e$ denotes the entity vector of the $m$-th entity, i.e., the entity sub-feature of the $m$-th entity; $\left(h_m^e\right)^{\top}$ denotes the matrix transpose of $h_m^e$; $h_c$ denotes the text vector of the training text, i.e., the text feature; and $\left\|h_m^e\right\|_2$ and $\left\|h_c\right\|_2$ denote the moduli (lengths) of $h_m^e$ and $h_c$. The importance of each entity in its context is obtained through this operation between the entity vector of an entity and the text vector of the text containing it.
Step S704, perform normalization processing on the attention information corresponding to each entity to obtain an attention factor corresponding to each entity.
Specifically, the computer device may perform normalization processing on the attention information corresponding to each entity to obtain the attention factor corresponding to each entity. That is, the computer device may perform normalization processing on the attention information corresponding to each entity to obtain a weight corresponding to each entity.
In one embodiment, the computer device may perform the normalization process according to the following formula:
$$a_m = \frac{\exp\left(u_m\right)}{\sum_{j=1}^{M} \exp\left(u_j\right)}$$

where $a_m$ denotes the weight of the $m$-th entity in the training text, i.e., the attention factor of the $m$-th entity; $\exp$ denotes the exponential operation with base $e$; $u_m$ denotes the attention information of the $m$-th entity; $M$ denotes the number of entities in the training text; and $u_j$ denotes the attention information of the $j$-th entity. The weight of an entity in the text is obtained by comparing the entity against all entities.
Step S706, the updated entity characteristics are obtained based on the entity sub-characteristics and the attention factors corresponding to the entities.
Specifically, the computer device performs weighted summation on the entity sub-features corresponding to each entity and the corresponding attention factors to obtain updated entity features.
In one embodiment, the computer device may calculate the updated entity characteristics according to the following formula:
$$h_s = \sum_{m=1}^{M} a_m h_m^e$$

where $h_s$ denotes the updated entity feature; $a_m$ denotes the weight of the $m$-th entity in the training text, i.e., its attention factor; $h_m^e$ denotes the entity vector of the $m$-th entity, i.e., its entity sub-feature; and $M$ denotes the number of entities in the training text. Multiplying each original entity vector by its weight and summing gives the final entity vector.
Step S708, an initial feature is obtained based on the text feature and the updated entity feature.
Specifically, when the updated entity characteristics are computed, the computer device may combine the text characteristics and the updated entity characteristics into the initial characteristics. Further, the initial features are transmitted to a mapping layer, the text features are mapped through a mapping sub-network corresponding to the text data to obtain intermediate sub-features corresponding to the text data, the updated entity features are mapped through a mapping sub-network corresponding to the entity data to obtain intermediate sub-features corresponding to the entity data, and the intermediate sub-features corresponding to the text data and the intermediate sub-features corresponding to the entity data form the intermediate features.
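The three steps S702 to S706 can be sketched compactly; the following is an illustrative PyTorch rendering of the formulas above, not the disclosed implementation:

```python
import torch
import torch.nn.functional as F

def entity_attention(h_c, h_e):
    """h_c: text features, shape (batch, d);
    h_e: entity sub-features, shape (batch, M, d).
    Returns the updated entity features h_s, shape (batch, d)."""
    # S702: attention information u_m via cosine similarity between each
    # entity vector and the text vector.
    u = F.cosine_similarity(h_e, h_c.unsqueeze(1), dim=-1)  # (batch, M)
    # S704: normalize the attention information -> attention factors a_m.
    a = torch.softmax(u, dim=-1)
    # S706: weighted sum of entity sub-features -> updated entity features.
    h_s = torch.sum(a.unsqueeze(-1) * h_e, dim=1)
    return h_s
```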
In one embodiment, the computer device may map the text features according to the following formula:
$$Z_c = W_c^{\top} h_c, \qquad h_c = f_c\left(x_c\right)$$

where $Z_c$ denotes the mapping result of the text features; $W_c$ denotes the model parameters of the mapping sub-network corresponding to the text data, and $W_c^{\top}$ its matrix transpose; $h_c$ denotes the text features corresponding to the training text, i.e., the output data of the coding sub-network corresponding to the text data; $x_c$ denotes the input data of that coding sub-network, i.e., the text data; and $f_c\left(x_c\right)$ denotes the encoding of the text data by the coding sub-network corresponding to the text data.
In one embodiment, the computer device may perform the mapping process on the updated entity characteristics according to the following formula:
$$Z_s = W_s^{\top} h_s, \qquad h_s = f_s\left(x_s\right)$$

where $Z_s$ denotes the mapping result of the updated entity features; $W_s$ denotes the model parameters of the mapping sub-network corresponding to the entity data, and $W_s^{\top}$ its matrix transpose; $h_s$ denotes the updated entity features; $x_s$ denotes the input data of the coding sub-network, i.e., the entity data; and $f_s\left(x_s\right)$ denotes encoding the entity data through the coding sub-network corresponding to the entity data and then performing the attention allocation processing on the entity features based on the text features.
In this embodiment, attention allocation is performed on the entity features based on the text features, so that important and non-important entities in the text can be distinguished. Important entities carry more key semantic information and are allocated more attention, which improves the accuracy of text classification.
In one embodiment, the encoding processing of the current input data by the current encoding subnetwork to obtain the corresponding characteristic of the current input data includes: carrying out forward coding processing on current input data through a current coding sub-network to obtain forward characteristics; performing reverse coding processing on current input data through a current coding sub-network to obtain reverse characteristics; and obtaining the characteristics corresponding to the current input data based on the forward characteristics and the reverse characteristics.
Specifically, the computer device encoding the current input data through the current coding sub-network may proceed as follows: forward encoding is performed on the current input data through the current coding sub-network to obtain the forward features, reverse encoding is performed on the current input data through the current coding sub-network to obtain the reverse features, and the features corresponding to the current input data are obtained based on the forward and reverse features. Obtaining the features from the forward and reverse features may specifically involve splicing, weighted summation, or similar processing.
In one embodiment, if the current input data is a training text, the computer device may perform the forward encoding process and the backward encoding process according to the following formulas:
$$\overrightarrow{h_t} = \overrightarrow{\mathrm{LSTM}}\left(x_t, \overrightarrow{h_{t-1}}\right), \qquad \overleftarrow{h_t} = \overleftarrow{\mathrm{LSTM}}\left(x_t, \overleftarrow{h_{t-1}}\right), \qquad h_t = \left[\overrightarrow{h_t}; \overleftarrow{h_t}\right]$$

where $\overrightarrow{h_t}$ denotes the result of forward encoding the first $t$ words of the training text, i.e., the text vector at the $t$-th time (the input time of the $t$-th character in the forward direction); LSTM (Long Short-Term Memory, a long short-term memory neural network) denotes the encoding processing performed by an LSTM network; $x_t$ denotes the $t$-th word of the training text; $\overrightarrow{h_{t-1}}$ denotes the result of forward encoding the first $t-1$ words, i.e., the text vector at the $(t-1)$-th time; $\overleftarrow{h_t}$ and $\overleftarrow{h_{t-1}}$ denote the corresponding results of reverse encoding; and finally $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$ are cascaded, i.e., spliced together, to obtain $h_t$. These formulas indicate that the text vector at time $t$ (the input time of the $t$-th character) is computed from the vector at the previous time $t-1$ together with the $t$-th word. It is understood that each word of the training text may be input to the LSTM network in the forward direction for forward encoding and in the reverse direction for reverse encoding. Similarly, each entity of the training text may be input to the LSTM network in its order of appearance in the training text for forward encoding, and in the reverse order for reverse encoding.
In this embodiment, the coding sub-network performs forward coding processing and backward coding processing on the input data, which can help the data classification model learn the association between each word of the text and enhance the generalization capability of the data classification model.
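A sketch of the forward and reverse encoding with explicit splicing, assuming pre-embedded inputs; an equivalent effect is obtained with a single bidirectional LSTM, but the two-LSTM form below mirrors the formulas above:

```python
import torch
import torch.nn as nn

class BiEncoder(nn.Module):
    def __init__(self, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.fwd = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.bwd = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, emb):                       # emb: (batch, T, embed_dim)
        h_fwd, _ = self.fwd(emb)                  # forward encoding
        h_rev, _ = self.bwd(torch.flip(emb, dims=[1]))  # reverse encoding
        h_rev = torch.flip(h_rev, dims=[1])       # realign to forward order
        # h_t = [forward h_t ; reverse h_t] at every time step.
        return torch.cat([h_fwd, h_rev], dim=-1)
```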
In one embodiment, the encoding layer comprises at least two encoding subnetworks, the mapping layer comprises at least two mapping subnetworks, the encoding subnetworks and the mapping subnetworks correspond one to one, the initial characteristic comprises an initial sub-characteristic output by each encoding subnetwork, and the determination of the intermediate characteristic comprises the steps of: inputting the initial sub-features output by each coding sub-network into the corresponding mapping sub-network to obtain the intermediate sub-features output by each mapping sub-network; intermediate features are derived based on the respective intermediate sub-features.
Specifically, the coding layer of the data classification model may include at least two coding sub-networks, and the mapping layer may include at least two mapping sub-networks, with each coding sub-network corresponding to exactly one mapping sub-network. After the training sample is input into the data classification model, each coding sub-network outputs its corresponding initial sub-feature through its own data processing. Because one coding sub-network corresponds to one mapping sub-network, the initial sub-feature output by each coding sub-network is input into the corresponding mapping sub-network, and each mapping sub-network outputs its corresponding intermediate sub-feature through its own data processing, so that the intermediate features are obtained from the intermediate sub-features.
In this embodiment, the coding sub-networks and the mapping sub-networks correspond one to one, and the output data of each coding sub-network is input into its corresponding mapping sub-network, which effectively guarantees the orderliness of data processing in the data classification model.
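The one-to-one routing itself is simple; a minimal sketch (all names hypothetical):

```python
def apply_mapping_layer(initial_sub_features, mapping_subnetworks):
    # Each initial sub-feature is routed to its own mapping sub-network;
    # the intermediate feature is the collection of intermediate sub-features.
    return [mapper(feature)
            for mapper, feature in zip(mapping_subnetworks,
                                       initial_sub_features)]
```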
In one embodiment, training a mapping layer of an updated data classification model based on a first feature and a second feature corresponding to the same training sample to obtain an updated intermediate data classification model includes: inputting a second feature corresponding to the forward trained sample into a mapping layer of the updated data classification model to obtain a corresponding predicted feature; calculating a target training loss value based on the first characteristic and the predicted characteristic corresponding to the same training sample; and adjusting the mapping layer parameters of the updated data classification model based on the target training loss value until the convergence condition is met, so as to obtain an updated intermediate data classification model.
Specifically, in order that the updated data classification model, after being trained on the updated training samples, still maintains its performance on the samples trained before the updated training samples, the mapping layer of the updated data classification model needs to be trained. Specifically, the second features may be used as the input of the mapping layer of the updated data classification model, and the first features as its expected output, so as to train the mapping layer such that the updated data classification model maintains its performance on the forward trained samples of the updated training samples.
The computer device can input the second feature corresponding to the forward trained sample into the mapping layer of the updated data classification model to obtain the predicted feature output by the mapping layer, calculate a target training loss value based on the first feature and the predicted feature corresponding to the same training sample, perform back propagation based on the target training loss value, adjust the mapping layer parameters of the updated data classification model and continue training until a convergence condition is met, and obtain the data classification model of the trained mapping layer. And taking the data classification model of the trained mapping layer as a new intermediate data classification model to replace the previous intermediate data classification model. The convergence condition may specifically be that the target training loss value is a minimum value, a change rate of the target training loss value is smaller than a preset threshold, the iteration number reaches a preset number, and the like. The target training loss value can be calculated through a cross entropy loss function, an exponential loss function, a hinge loss function and other loss functions.
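The following is a minimal sketch of this mapping-layer training step, assuming PyTorch and an MSE loss (the patent also allows cross-entropy, exponential, hinge, and other loss functions); the optimizer choice and hyper-parameters are assumptions:

```python
import torch
import torch.nn.functional as F

def train_mapping_layer(mapping_layer, second_feats, first_feats,
                        lr=1e-3, max_iters=1000, tol=1e-4):
    # second_feats: coding-layer outputs of the updated model (input).
    # first_feats: coding-layer outputs of the intermediate model
    # (expected output, kept fixed).
    first_feats = first_feats.detach()
    optimizer = torch.optim.Adam(mapping_layer.parameters(), lr=lr)
    prev_loss = float("inf")
    for _ in range(max_iters):
        predicted = mapping_layer(second_feats)
        # Target training loss between predicted features and first features.
        loss = F.mse_loss(predicted, first_feats)
        optimizer.zero_grad()
        loss.backward()    # back propagation
        optimizer.step()   # only mapping-layer parameters are adjusted
        # Convergence condition: loss change below a preset threshold.
        if abs(prev_loss - loss.item()) < tol:
            break
        prev_loss = loss.item()
    return mapping_layer
```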
In one embodiment, as shown in fig. 8, the encoding layer includes at least two encoding subnetworks, the mapping layer includes at least two mapping subnetworks, the encoding subnetworks and the mapping subnetworks correspond to each other one to one, the first feature includes a first sub-feature output by each encoding subnetwork of the intermediate data classification model, the second feature includes a second sub-feature output by each encoding subnetwork of the updated data classification model, and the training of the mapping layer of the updated data classification model is performed based on the first feature and the second feature corresponding to the same training sample, so as to obtain an updated intermediate data classification model, including:
Step S802, inputting each second sub-feature corresponding to the forward trained sample into a corresponding mapping sub-network in the updated data classification model to obtain each corresponding prediction sub-feature.
Specifically, the coding layer of the updated data classification model comprises at least two coding sub-networks, and the mapping layer comprises at least two mapping sub-networks, one coding sub-network corresponding to one mapping sub-network. The computer equipment respectively inputs the training samples into the coding layers of the intermediate data classification model and the updated data classification model: each coding sub-network of the intermediate data classification model outputs a first sub-feature corresponding to the training samples, and the first sub-features are combined to obtain the first feature; each coding sub-network of the updated data classification model outputs a second sub-feature corresponding to the training samples, and the second sub-features are combined to obtain the second feature. Further, the computer device may input each second sub-feature corresponding to the forward trained sample into the corresponding mapping sub-network in the updated data classification model, each mapping sub-network outputting a corresponding predicted sub-feature.
Step S804, the coding sub-network and the mapping sub-network having the correspondence relationship are obtained as the target coding sub-network and the target mapping sub-network.
Step S806, calculating training loss values based on the first sub-feature output by the target coding sub-network and the predicted sub-feature output by the target mapping sub-network corresponding to the same training sample, to obtain training loss values corresponding to each mapping sub-network.
Specifically, because one coding sub-network corresponds to one mapping sub-network, each first sub-feature corresponds to one predicted sub-feature. The computer device may then calculate training loss values based on the first sub-feature output by the target coding sub-network and the predicted sub-feature output by the target mapping sub-network corresponding to the same training sample, so as to obtain the training loss value corresponding to each mapping sub-network. The target coding sub-network and the target mapping sub-network refer to a coding sub-network and a mapping sub-network that have a correspondence relationship.
Step S808, adjusting the parameters of the corresponding mapping sub-networks in the updated data classification model based on the respective training loss values until a convergence condition is met, so as to obtain an updated intermediate data classification model.
Specifically, the computer device may perform back propagation on each training loss value, adjust the parameters of the corresponding mapping sub-network in the updated data classification model, and continue training until a convergence condition is satisfied, so as to obtain the data classification model with the trained mapping layer. The data classification model with the trained mapping layer then replaces the previous intermediate data classification model as the new intermediate data classification model. The convergence condition may specifically be that each training loss value reaches a minimum value, that the sum of the training loss values reaches a minimum value, that the average change rate of the training loss values is smaller than a preset threshold, that the number of iterations reaches a preset number, and the like. Each training loss value can be calculated through a loss function such as a cross-entropy loss function, an exponential loss function, or a hinge loss function.
For example, suppose the updated data classification model includes coding sub-network one corresponding to mapping sub-network one, and coding sub-network two corresponding to mapping sub-network two. The computer device may input the training samples into the coding layers of the intermediate data classification model and the updated data classification model respectively: coding sub-network one of the intermediate data classification model outputs the first sub-feature a1, coding sub-network two of the intermediate data classification model outputs the first sub-feature a2, coding sub-network one of the updated data classification model outputs the second sub-feature b1, and coding sub-network two of the updated data classification model outputs the second sub-feature b2. Furthermore, the computer device may input the second sub-feature b1 into mapping sub-network one of the updated data classification model to obtain the predicted sub-feature c1 output by mapping sub-network one, calculate the training loss value d1 based on the first sub-feature a1 and the predicted sub-feature c1, and adjust the model parameters of mapping sub-network one of the updated data classification model based on the training loss value d1. Likewise, the computer device inputs the second sub-feature b2 into mapping sub-network two of the updated data classification model to obtain the predicted sub-feature c2, calculates the training loss value d2 based on the first sub-feature a2 and the predicted sub-feature c2, and adjusts the model parameters of mapping sub-network two of the updated data classification model based on the training loss value d2. After the model parameters of the mapping sub-networks are adjusted, the above steps are repeated to calculate a new training loss value d1 and a new training loss value d2, and when the updated training loss values d1 and d2 satisfy the convergence condition, the data classification model with the trained mapping layer is obtained.
In this embodiment, different mapping sub-networks adjust parameters according to different training loss values, which can effectively ensure the accuracy of parameters of each mapping sub-network, thereby improving the accuracy of data classification of the data classification model.
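Reusing the illustrative EncodeMapModel sketch above, the worked example can be expressed as follows; the MSE form of the losses d1 and d2 is an assumption:

```python
import torch.nn.functional as F

def mapping_losses(updated_model, a1, a2, b1, b2):
    # a1, a2: first sub-features from the intermediate model's two encoders.
    # b1, b2: second sub-features from the updated model's two encoders.
    c1 = updated_model.mappers[0](b1)  # predicted sub-feature, channel one
    c2 = updated_model.mappers[1](b2)  # predicted sub-feature, channel two
    d1 = F.mse_loss(c1, a1.detach())   # loss for mapping sub-network one
    d2 = F.mse_loss(c2, a2.detach())   # loss for mapping sub-network two
    # Summing and back-propagating d1 + d2 adjusts each mapping sub-network
    # only through its own loss, since the two channels share no parameters.
    return d1, d2
```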
In one embodiment, as shown in fig. 9, a data classification method is provided, which is described by taking the method as an example applied to the computer device in fig. 1, where the computer device may be the terminal 102 or the server 104 in fig. 1. Referring to fig. 9, the data classification method includes the steps of:
step S902, obtaining data to be classified.
Step S904, inputting the data to be classified into a target data classification model to obtain a target classification result corresponding to the data to be classified. The target data classification model is obtained by: training a coding layer of an initial data classification model based on a current training sample corresponding to a current data classification task in a training sample set to obtain an intermediate data classification model; training the coding layer of the intermediate data classification model based on an updated training sample corresponding to the next data classification task in the training sample set to obtain an updated data classification model; inputting a forward trained sample of the updated training sample into the coding layers of the intermediate data classification model and the updated data classification model respectively to obtain a corresponding first feature and second feature; training a mapping layer of the updated data classification model based on the first feature and the second feature corresponding to the same training sample to obtain an updated intermediate data classification model; returning to the step of training the coding layer of the intermediate data classification model based on the updated training sample corresponding to the next data classification task until the training is finished; and obtaining the target data classification model based on the corresponding intermediate data classification model when the training is finished.
Specifically, when the training sample set is a training text set, the target data classification model finally trained based on the training sample set is a target text classification model. At this time, the data to be classified may be a text to be classified, and after the computer device obtains the text to be classified, the text to be classified may be input into the target text classification model, so as to obtain a target classification result output by the target text classification model.
And when the training sample set is the training image set, the target data classification model finally trained on the basis of the training image set is the target image classification model. At this time, the data to be classified may be an image to be classified, and after the computer device acquires the image to be classified, the image to be classified may be input into the target image classification model, so as to obtain a target classification result output by the target image classification model.
The computer equipment can acquire the data to be classified that a user inputs or browses on the terminal, and input the data to be classified into the trained target data classification model to obtain the target classification result corresponding to the data to be classified. The computer device may present the target classification result to the user, for example, present the target classification result of a medical record to be classified to the user, so that the user can perform the next processing step. The computer device can also recommend information to the user according to the target classification result: for example, if the output of the target data classification model determines that the news information currently browsed by the user is entertainment news, the computer device can actively recommend other entertainment news to the user, thereby improving information recommendation efficiency.
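As a minimal usage sketch (assuming the target model ends in a classification layer that outputs logits, and that preprocessing of the raw input is handled elsewhere; names are illustrative):

```python
import torch

def classify(target_model, batch, label_names):
    """Return human-readable target labels for a batch of data to classify."""
    target_model.eval()
    with torch.no_grad():
        logits = target_model(batch)
        predictions = logits.argmax(dim=-1)
    return [label_names[i] for i in predictions.tolist()]
```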
It can be understood that, for the specific process of performing model training on the initial data classification model based on the training sample set to obtain the target data classification model, reference may be made to the methods described in the foregoing related embodiments of the data classification model training method, and details are not described here again.
According to the data classification method, the data to be classified is obtained and input into the target data classification model to obtain the target classification result corresponding to the data to be classified. The target data classification model is obtained by: training the coding layer of an initial data classification model based on the current training sample corresponding to the current data classification task in the training sample set to obtain an intermediate data classification model; training the coding layer of the intermediate data classification model based on the updated training sample corresponding to the next data classification task in the training sample set to obtain an updated data classification model; inputting the forward trained sample of the updated training sample into the coding layers of the intermediate data classification model and the updated data classification model respectively to obtain the corresponding first feature and second feature; training the mapping layer of the updated data classification model based on the first feature and the second feature corresponding to the same training sample to obtain an updated intermediate data classification model; and returning to the step of training the coding layer of the intermediate data classification model based on the updated training sample corresponding to the next data classification task until the training is finished, the target data classification model being obtained from the corresponding intermediate data classification model when the training is finished. In this way, the training of the coding layer of the data classification model enables the model to learn the knowledge of each data classification task from the training samples corresponding to that task, the training of the mapping layer enables the model to learn the general knowledge shared among the data classification tasks, and through the alternate training of the coding layer and the mapping layer, the data classification accuracy of the model on forward data classification tasks and on new data classification tasks can both be guaranteed, yielding a general target data classification model.
The application also provides an application scenario, and the data classification model training method is applied to the application scenario. Specifically, the data classification model training method is applied to the application scenario as follows:
1. Building a training text set
In the medical field, different text classification tasks may be divided according to the severity of a disease, thereby establishing a training text set including training texts corresponding to at least two text classification tasks. Because the primary hospital mainly treats common diseases (common diseases), the secondary hospital mainly treats intermediate diseases, and the tertiary hospital mainly treats serious and difficult diseases, a training text set including training texts corresponding to three text classification tasks can be established, wherein the training texts corresponding to the three text classification tasks respectively include medical history texts corresponding to various common diseases, medical history texts corresponding to various intermediate diseases, and medical history texts corresponding to various serious and difficult diseases. It will be appreciated that the division of the text classification task may also be performed in other ways.
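As an illustration only, the task division by disease severity could be organized as follows; the record fields and severity tags are hypothetical placeholders:

```python
def build_training_text_set(records):
    """Group medical-history texts into three text classification tasks."""
    severity_to_task = {"common": 0, "intermediate": 1, "severe": 2}
    tasks = {0: [], 1: [], 2: []}
    for record in records:
        task_id = severity_to_task[record["severity"]]
        tasks[task_id].append((record["text"], record["disease_label"]))
    # The training text set holds the texts of all three tasks; the tasks
    # are then presented to the model sequentially: task 0 -> 1 -> 2.
    return [tasks[k] for k in sorted(tasks)]
```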
2. Training based on training text set to obtain target text classification model
As shown in fig. 10A, the text classification model adopts a dual-channel arrangement of the original medical record text and its entities, using a bidirectional LSTM as the encoder to encode the entities and the text separately. The text is encoded through its corresponding coding sub-network, and the output of the final step serves as the high-dimensional vector expression of the text, i.e. the text feature. The entities are encoded through their corresponding coding sub-network, and the output of each step serves as the high-dimensional vector expression of each entity, i.e. the entity features. Furthermore, by combining an attention mechanism with the coding results of the two coding sub-networks, weights are distributed over the entities based on the text feature so as to strengthen the role of the entities in the final vector expression, thereby obtaining updated entity features. After that, a continual learning method is adopted to map the coding results of the two coding sub-networks to two high-dimensional representation spaces adapted to all tasks, namely a general text representation space and a general entity representation space. Finally, the general text representation space and the general entity representation space are classified through a classification layer, and the classification result corresponding to the input text is output.
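A minimal sketch of this dual-channel encoder with attention follows, assuming PyTorch; the shared embedding, the dot-product form of attention, and all dimensions are simplifying assumptions rather than the patent's exact architecture:

```python
import torch
import torch.nn as nn

class DualChannelEncoder(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=128, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # One bidirectional LSTM coding sub-network per channel.
        self.text_lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True,
                                 bidirectional=True)
        self.entity_lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True,
                                   bidirectional=True)

    def forward(self, text_ids, entity_ids):
        text_out, _ = self.text_lstm(self.embed(text_ids))        # (B, T, 2H)
        entity_out, _ = self.entity_lstm(self.embed(entity_ids))  # (B, E, 2H)
        # Text feature: output of the final encoding step.
        text_feat = text_out[:, -1, :]                             # (B, 2H)
        # Attention: weight each entity by its relevance to the text feature.
        scores = torch.bmm(entity_out, text_feat.unsqueeze(-1))    # (B, E, 1)
        weights = torch.softmax(scores, dim=1)
        entity_feat = (weights * entity_out).sum(dim=1)            # (B, 2H)
        return text_feat, entity_feat
```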
The training phase of the text classification model is divided into two steps, wherein the first step is to train the coding layer, and the second step is to train the mapping layer. The division of the training into two steps is mainly to guarantee the model's performance on the current task and all previous tasks at the same time. Specifically, the first step of training for the coding layer mainly allows the model to learn the language features under the specific context from the data of the current task, and the second step of training for the mapping layer mainly allows the vector representation at the previous moment to be taken as a target, so as to obtain the vector representation which is universal for all tasks.
When training is performed for each text classification task, a part of the training texts of that text classification task can be taken from the complete training text set as the training set for the first training step. In addition, a very small number of training texts of each text classification task can be randomly selected from the complete training text set for the second training step.
2-1, coding layer training
The coding layer of the initial text classification model is trained based on the training texts corresponding to text classification task I, so as to obtain an intermediate text classification model. The coding layer of the intermediate text classification model is then trained based on the training texts corresponding to text classification task II, so as to obtain an updated text classification model.
The specific method for training the coding layer of the text classification model based on the training text can be to input the training text into the text classification model, the text classification model finally outputs the prediction category corresponding to the training text, and the coding layer parameters of the text classification model are adjusted based on the training category and the prediction category corresponding to the training text until the convergence condition is met, so as to obtain the text classification model of the trained coding layer.
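A sketch of this coding-layer training step follows, assuming PyTorch, a generic encoder producing one feature vector per text, and a cross-entropy loss over training categories; all names and hyper-parameters are assumptions:

```python
import torch
import torch.nn.functional as F

def train_coding_layer(encoder, classifier, loader, lr=1e-3, epochs=5):
    params = list(encoder.parameters()) + list(classifier.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for text_batch, label_batch in loader:
            logits = classifier(encoder(text_batch))
            # Loss from the difference between training and predicted category.
            loss = F.cross_entropy(logits, label_batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()   # adjusts coding-layer (and classifier) parameters
    return encoder, classifier
```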
2-2, mapping layer training
The training texts corresponding to text classification task I are respectively input into the intermediate text classification model and the updated text classification model, to obtain the first features output by the coding layer of the intermediate text classification model and the second features output by the coding layer of the updated text classification model. The second features corresponding to the same training text are input into the mapping layer of the updated text classification model to obtain corresponding predicted features, and the mapping layer parameters of the updated text classification model are adjusted based on the first features and the predicted features corresponding to the same training text until a convergence condition is met, so as to obtain the text classification model with the trained mapping layer.
Taking the text channel as an example to describe the mapping layer training: the training text subset used for the mapping layer training of the k-th task can be defined as Rk, and the high-dimensional vector representation of Rk after the k-th task finishes training (i.e. at moment k) can be defined as Ek'. Similarly, the original texts of the previous k-1 tasks and their corresponding high-dimensional vector representations at the current moment k are {R1, R2, R3, ..., Rk-1} and {E1', E2', E3', ..., Ek-1'}. After the (k+1)-th task is finished, {R1, R2, R3, ..., Rk} are input into the coding layer of the current text classification model to obtain the corresponding high-dimensional vectors {E1, E2, E3, ..., Ek}; the above sets of high-dimensional vectors then constitute the training data (X, Y) for the mapping layer of the current text classification task, namely {(E1, E1'), (E2, E2'), (E3, E3'), ..., (Ek, Ek')}. In the training data (X, Y), X serves as the input of the mapping layer of the current text classification model and Y as its expected output, and the mapping layer of the current text classification model is trained accordingly.
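A sketch of how these mapping-layer training pairs could be assembled, following the notation above; the function names and data layout are assumptions:

```python
import torch

def build_mapping_training_data(current_encoder, replay_texts, stored_vectors):
    """replay_texts: [R1, ..., Rk]; stored_vectors: [E1', ..., Ek']."""
    pairs = []
    with torch.no_grad():
        for r_i, e_i_prime in zip(replay_texts, stored_vectors):
            e_i = current_encoder(r_i)      # Ei: current coding-layer vector of Ri
            pairs.append((e_i, e_i_prime))  # (input X, expected output Y)
    return pairs
```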
It is to be understood that the mapping layer may also be referred to as an alignment layer. The alignment layer is used for converting a text vector and an entity vector code obtained after each text classification task is trained into a vector code which is universal for all tasks, so that the purpose of continuous learning is achieved, and the model is guaranteed not to forget the previous task when a subsequent task is learned.
2-3, coding layer and mapping layer alternating training
The text classification model with the trained mapping layer is taken as the new intermediate text classification model, and the coding layer of the current latest intermediate text classification model is trained based on the training texts corresponding to text classification task III, to obtain a new updated text classification model.
The training texts corresponding to text classification task I are respectively input into the current latest intermediate text classification model and the current latest updated text classification model, to obtain the first features output by the coding layer of the current latest intermediate text classification model and the second features output by the coding layer of the current latest updated text classification model. The training texts corresponding to text classification task II are likewise respectively input into the two current latest models to obtain their corresponding first features and second features. The second features corresponding to the same training text are then input into the mapping layer of the current latest updated text classification model to obtain corresponding predicted features, and the mapping layer parameters of the current latest updated text classification model are adjusted based on the first features and the predicted features corresponding to the same training text until a convergence condition is met, so as to obtain the text classification model with the trained mapping layer.
And when all the text classification tasks in the training text set participate in training, taking a text classification model obtained after the last training of the mapping layer as a target text classification model. Therefore, after the medical record text to be classified is obtained, the medical record text to be classified can be input into the target text classification model, and the specific category corresponding to the medical record text to be classified is obtained.
In this embodiment, the original medical record data can be used to complete the multiplexing and iteration of the medical record text classification model across multiple hospitals at minimum cost. First, the concept of continual learning is introduced: when a new hospital is connected or a version is iterated, training is only needed on the new text classification task, hospital medical record data does not need to be retained long term, and model training can even be completed without the data leaving the hospital. Second, the medical record text is preprocessed, and a series of medical entities are extracted from it and used as the input of the other channel. The dual-channel design of the model not only improves the expressiveness of the model, but also further improves the utilization of the medical record text. Third, an attention mechanism is used during encoding to realize the interaction between the medical entities and the medical record text, redistributing the importance of different entities in the medical record text, which can improve coding efficiency and feature extraction accuracy, thereby improving the accuracy of medical record text classification.
It can be understood that the data classification model training method can be applied to the medical text classification and the medical text classification model training, and can also be applied to the text classification in other fields such as news information classification and the like to train various text classification models. Furthermore, besides text classification and training of the text classification model, the method can also be applied to image classification and training of the image classification model.
Further, the data classification model in this embodiment may be referred to as E²MC (Embedding Episodic Memory and Consistency). The target data classification models obtained by E²MC and by other continual learning methods were tested, and the test results are shown in fig. 10B. EWC (Elastic Weight Consolidation, a regularization-based continual learning method) in fig. 10B is a scheme that places constraints on the loss to avoid catastrophic forgetting: when training each task, EWC adds a regularization term to the loss function so that the parameters of the model are optimized in a direction compatible with all tasks. GEM (Gradient Episodic Memory) in fig. 10B instead applies quadratic programming to the gradient of each parameter, so that the loss of the current model on previous tasks is also reflected in each gradient update. On the basis of GEM, researchers proposed A-GEM (Averaged Gradient Episodic Memory), which improves the overall efficiency of GEM through averaging. In addition, MBPA++ in fig. 10B is a scheme that introduces a recall mechanism in the prediction stage to avoid forgetting. The remaining two comparison objects in fig. 10B serve as the upper and lower bounds of the continual learning domain. In fig. 10B, the abscissa represents the number of data classification tasks involved in model training, and the ordinate represents the data classification accuracy of the model. The data classification accuracy refers to the average accuracy of the data classification model over all trained tasks, and can specifically be calculated as

Acc = (1/K) × Σ_{k=1}^{K} acc_{f,k}

where K represents the number of data classification tasks involved in model training and acc_{f,k} represents the accuracy of the data classification model f on the k-th task. As can be seen in fig. 10B, E²MC performs significantly better than the other existing solutions, and is not far from the upper bound of the continual learning domain.
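A one-line sketch of this metric (the per-task accuracies acc_{f,k} are assumed to be given):

```python
def average_accuracy(task_accuracies):
    """Mean of acc_{f,k} over the K trained tasks."""
    return sum(task_accuracies) / len(task_accuracies)
```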
Further, the dual-channel scheme of the data classification model was also compared with single-channel schemes to demonstrate the benefits introduced by the entity knowledge and the attention mechanism. As shown in fig. 10C, E²MC (no-attention) indicates that no attention mechanism is introduced into the model, and E²MC (no-entity) indicates that no entity channel is introduced into the model. As fig. 10C shows, the dual-channel scheme significantly improves the performance of the model: the introduction of entity knowledge improves, to a certain extent, the mapping capability of the model between the existing task space and the general space, further mitigating the forgetting phenomenon of the model.
It should be understood that although the various steps in the flowcharts of fig. 2-5 and 7-10A are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to the illustrated order, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2-5 and 7-10A may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 11, there is provided a data classification model training apparatus, which may be a part of a computer device using a software module or a hardware module, or a combination of the two modules, and specifically includes: a training sample set obtaining module 1102, a coding layer training module 1104, a mapping layer training module 1106, and a target data classification model determining module 1108, wherein:
a training sample set obtaining module 1102, configured to obtain a training sample set; the training sample set comprises training samples corresponding to at least two data classification tasks;
the coding layer training module 1104 is used for acquiring a current training sample corresponding to the current data classification task, and training a coding layer of the initial data classification model based on the current training sample to obtain an intermediate data classification model; acquiring an updated training sample corresponding to the next data classification task, and training a coding layer of the intermediate data classification model based on the updated training sample to obtain an updated data classification model;
a mapping layer training module 1106, configured to input the forward trained sample of the updated training sample into the intermediate data classification model and the coding layer of the updated data classification model, respectively, so as to obtain a corresponding first feature and a corresponding second feature; training a mapping layer of the updated data classification model based on a first feature and a second feature corresponding to the same training sample to obtain an updated intermediate data classification model;
and the target data classification model determining module 1108 is configured to return to the step of obtaining the updated training sample corresponding to the next data classification task until the training is completed, and obtain the target data classification model based on the corresponding intermediate data classification model when the training is completed.
In one embodiment, the training sample set obtaining module is further configured to obtain a plurality of candidate samples; the candidate sample carries a candidate tag; clustering all candidate samples corresponding to the same candidate label to obtain an initial cluster corresponding to each candidate label; determining a processing priority corresponding to each candidate sample based on the candidate label; clustering the initial clustering cluster serving as a data classification subtask to obtain a target data classification task set corresponding to each processing priority; the target data classification task set comprises training samples corresponding to the same data classification task; and obtaining a training sample set based on each target data classification task set.
In one embodiment, the target data classification model is used for determining a target label corresponding to data to be classified from candidate labels corresponding to the trained samples.
In one embodiment, the training samples in the training sample set carry training labels, the data classification model to be trained is an initial data classification model or an intermediate data classification model, and the coding layer training module is further configured to perform data processing on the input training samples through the data classification model to be trained to obtain prediction labels corresponding to the input training samples; and adjusting the coding layer parameters of the data classification model to be trained based on the label difference between the training label corresponding to the input training sample and the prediction label until the convergence condition is met.
In one embodiment, the current data classification model is any one of an initial data classification model, an intermediate data classification model, an updated data classification model and a target data classification model, an encoding layer of the current data classification model is used for encoding an input training sample to obtain initial features, and a mapping layer of the current data classification model is used for mapping the initial features to obtain intermediate features.
In one embodiment, the input training samples include text data and entity data, the coding layer includes a coding sub-network corresponding to the text data and a coding sub-network corresponding to the entity data, as shown in fig. 12, the apparatus further includes:
the input training sample processing module 1110 is configured to perform coding processing on the text data through a coding subnetwork corresponding to the text data to obtain text features; coding the entity data through a coding sub-network corresponding to the entity data to obtain entity characteristics; initial features are derived based on the entity features and the text features.
In one embodiment, the current data classification model includes an entity extraction layer, the coding layer includes a coding sub-network corresponding to the entity extraction layer and a coding sub-network corresponding to the input training sample, and the input training sample processing module is further configured to perform entity extraction on the input training sample through the entity extraction layer to obtain entity data corresponding to the input training sample; coding the entity data through a coding sub-network corresponding to the entity extraction layer to obtain entity characteristics; coding the input training sample through a coding sub-network corresponding to the input training sample to obtain text characteristics; initial features are derived based on the entity features and the text features.
In one embodiment, the entity data includes at least one entity, the entity features include entity sub-features corresponding to the respective entities, and the input training sample processing module is further configured to perform attention allocation on the entity sub-features corresponding to the respective entities based on the text features, so as to obtain attention information corresponding to the respective entities; normalizing the attention information corresponding to each entity to obtain an attention factor corresponding to each entity; obtaining updated entity characteristics based on the entity sub-characteristics and the attention factors corresponding to the entities; an initial feature is derived based on the textual feature and the updated entity feature.
In one embodiment, the input training sample processing module is further configured to perform forward coding processing on current input data through a current coding subnetwork to obtain a forward feature; performing reverse coding processing on current input data through a current coding sub-network to obtain reverse characteristics; and obtaining the characteristics corresponding to the current input data based on the forward characteristics and the reverse characteristics.
In one embodiment, the encoding layer includes at least two encoding sub-networks, the mapping layer includes at least two mapping sub-networks, the encoding sub-networks and the mapping sub-networks are in one-to-one correspondence, the initial feature includes an initial sub-feature output by each encoding sub-network, and the input training sample processing module is further configured to input the initial sub-feature output by each encoding sub-network to the corresponding mapping sub-network, so as to obtain a middle sub-feature output by each mapping sub-network; intermediate features are derived based on the respective intermediate sub-features.
In one embodiment, the mapping layer training module is further configured to input a second feature corresponding to the forward trained sample into the mapping layer of the updated data classification model to obtain a corresponding predicted feature; calculating a target training loss value based on the first characteristic and the predicted characteristic corresponding to the same training sample; and adjusting the mapping layer parameters of the updated data classification model based on the target training loss value until the convergence condition is met, so as to obtain an updated intermediate data classification model.
In one embodiment, the coding layer includes at least two coding sub-networks, the mapping layer includes at least two mapping sub-networks, the coding sub-networks and the mapping sub-networks are in one-to-one correspondence, the first feature includes a first sub-feature output by each coding sub-network of the intermediate data classification model, the second feature includes a second sub-feature output by each coding sub-network of the updated data classification model, and the mapping layer training module is further configured to input each second sub-feature corresponding to the forward trained sample into a corresponding mapping sub-network in the updated data classification model, so as to obtain a corresponding each predicted sub-feature; acquiring a coding sub-network and a mapping sub-network which have a corresponding relation as a target coding sub-network and a target mapping sub-network; calculating training loss values based on the first sub-feature output by the target coding sub-network and the predicted sub-feature output by the target mapping sub-network corresponding to the same training sample to obtain training loss values corresponding to each mapping sub-network; and adjusting parameters of the corresponding mapping sub-networks in the updated data classification model based on each training loss value until a convergence condition is met, so as to obtain an updated intermediate data classification model.
In one embodiment, as shown in fig. 13, there is provided a data classification apparatus, which may be a part of a computer device using a software module or a hardware module, or a combination of the two, and specifically includes: a data acquisition module 1302 and a classification result determination module 1304, wherein:
a data obtaining module 1302, configured to obtain data to be classified;
a classification result determining module 1304, configured to input the data to be classified into the target data classification model, so as to obtain a target classification result corresponding to the data to be classified;
the target data classification model is a step of training a coding layer of an initial data classification model based on a current training sample corresponding to a current data classification task in a training sample set to obtain an intermediate data classification model, training the coding layer of the intermediate data classification model based on an updated training sample corresponding to a next data classification task in the training sample set to obtain an updated data classification model, inputting a forward trained sample of the updated training sample into the coding layers of the intermediate data classification model and the updated data classification model respectively to obtain a corresponding first characteristic and a second characteristic, training a mapping layer of the updated data classification model based on the first characteristic and the second characteristic corresponding to the same training sample to obtain the updated intermediate data classification model, and returning to the step of training the coding layer of the intermediate data classification model based on the updated training sample corresponding to the next data classification task, and obtaining the intermediate data classification model based on the corresponding intermediate data classification model when the training is finished.
For specific limitations of the data classification model training device and the data classification device, reference may be made to the above limitations of the data classification model training method and the data classification method, which are not described herein again. The modules in the data classification model training device and the data classification device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure thereof may be as shown in fig. 14. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer equipment is used for storing data such as training sample sets, target data classification models and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data classification model training method and a data classification method.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 15. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a data classification model training method and a data classification method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on a shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the configurations shown in fig. 14 and 15 are block diagrams of only some of the configurations relevant to the present application, and do not constitute a limitation on the computing devices to which the present application may be applied, and that a particular computing device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of the computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps of the above-described method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent application shall be subject to the appended claims.

Claims (15)

1. A method for training a data classification model, the method comprising:
acquiring a training sample set; the training sample set comprises training samples corresponding to at least two data classification tasks;
acquiring a current training sample corresponding to a current data classification task, and training a coding layer of an initial data classification model based on the current training sample to obtain an intermediate data classification model;
acquiring an updated training sample corresponding to the next data classification task, and training a coding layer of the intermediate data classification model based on the updated training sample to obtain an updated data classification model;
respectively inputting the forward trained samples of the updated training samples into an intermediate data classification model and a coding layer of the updated data classification model to obtain corresponding first features and second features;
training a mapping layer of the updated data classification model based on a first feature and a second feature corresponding to the same training sample to obtain an updated intermediate data classification model;
and returning to the step of obtaining the updated training sample corresponding to the next data classification task until the training is finished, and obtaining a target data classification model based on the corresponding intermediate data classification model when the training is finished.
2. The method of claim 1, wherein obtaining the set of training samples comprises:
obtaining a plurality of candidate samples; the candidate sample carries a candidate tag;
clustering all candidate samples corresponding to the same candidate label to obtain an initial cluster corresponding to each candidate label;
determining a processing priority corresponding to each candidate sample based on the candidate label;
clustering the initial clustering cluster serving as a data classification subtask to obtain a target data classification task set corresponding to each processing priority; the target data classification task set comprises training samples corresponding to the same data classification task;
and obtaining the training sample set based on each target data classification task set.
3. The method according to claim 1, wherein the training samples in the training sample set carry training labels, the data classification model to be trained is an initial data classification model or an intermediate data classification model, and training the coding layer of the data classification model to be trained comprises the following steps:
performing data processing on an input training sample through the to-be-trained data classification model to obtain a prediction label corresponding to the input training sample;
and adjusting the coding layer parameters of the data classification model to be trained based on the label difference between the training label corresponding to the input training sample and the prediction label until a convergence condition is met.
4. The method according to claim 1, wherein the current data classification model is any one of an initial data classification model, an intermediate data classification model, an updated data classification model and a target data classification model, an encoding layer of the current data classification model is used for encoding an input training sample to obtain initial features, and a mapping layer of the current data classification model is used for mapping the initial features to obtain intermediate features.
5. The method of claim 4, wherein the input training samples comprise text data and entity data, wherein the coding layer comprises a coding sub-network corresponding to the text data and a coding sub-network corresponding to the entity data, and wherein the determining of the initial features comprises:
coding the text data through a coding sub-network corresponding to the text data to obtain text characteristics;
coding the entity data through a coding sub-network corresponding to the entity data to obtain entity characteristics;
and obtaining the initial characteristic based on the entity characteristic and the text characteristic.
6. The method of claim 4, wherein the current data classification model comprises an entity extraction layer, wherein the coding layer comprises a coding sub-network corresponding to the entity extraction layer and a coding sub-network corresponding to the input training sample, and wherein the determining of the initial features comprises the following steps:
performing entity extraction on the input training sample through the entity extraction layer to obtain entity data corresponding to the input training sample;
coding the entity data through a coding sub-network corresponding to the entity extraction layer to obtain entity characteristics;
coding the input training sample through a coding sub-network corresponding to the input training sample to obtain text characteristics;
and obtaining the initial characteristic based on the entity characteristic and the text characteristic.
7. The method of claim 5 or 6, wherein the entity data comprises at least one entity, the entity features comprise entity sub-features corresponding to respective entities, and the deriving the initial features based on the entity features and the text features comprises:
respectively carrying out attention distribution on entity sub-features corresponding to the entities on the basis of the text features to obtain attention information corresponding to the entities;
normalizing the attention information corresponding to each entity to obtain an attention factor corresponding to each entity;
obtaining updated entity characteristics based on the entity sub-characteristics and the attention factors corresponding to the entities;
and obtaining the initial characteristic based on the text characteristic and the updated entity characteristic.
8. The method of claim 5 or 6, wherein encoding the current input data through the current encoding subnetwork to obtain the corresponding feature of the current input data comprises:
forward coding processing is carried out on the current input data through the current coding sub-network to obtain forward characteristics;
performing reverse coding processing on the current input data through the current coding sub-network to obtain reverse characteristics;
and obtaining the feature corresponding to the current input data based on the forward feature and the backward feature.
9. The method according to claim 4, wherein the coding layer comprises at least two coding subnetworks, the mapping layer comprises at least two mapping subnetworks, the coding subnetworks and the mapping subnetworks correspond one to one, the initial characteristic comprises an initial sub-characteristic output by each coding subnetwork, and the determining of the intermediate characteristic comprises the steps of:
inputting the initial sub-features output by each coding sub-network into the corresponding mapping sub-network to obtain the intermediate sub-features output by each mapping sub-network;
the intermediate features are derived based on the respective intermediate sub-features.
10. The method of claim 1, wherein the coding layer comprises at least two coding subnetworks, the mapping layer comprises at least two mapping subnetworks, the coding subnetworks and the mapping subnetworks are in one-to-one correspondence, the first feature comprises a first sub-feature output by each coding subnetwork of the intermediate data classification model, the second feature comprises a second sub-feature output by each coding subnetwork of the updated data classification model, and the training of the mapping layer of the updated data classification model based on the first feature and the second feature corresponding to the same training sample results in the updated intermediate data classification model, comprising:
inputting each second sub-feature corresponding to the forward trained sample into a corresponding mapping sub-network in the updated data classification model to obtain each corresponding predicted sub-feature;
acquiring a coding sub-network and a mapping sub-network which have a corresponding relation as a target coding sub-network and a target mapping sub-network;
calculating training loss values based on the first sub-feature output by the target coding sub-network and the predicted sub-feature output by the target mapping sub-network corresponding to the same training sample to obtain training loss values corresponding to each mapping sub-network;
and adjusting parameters of the corresponding mapping sub-networks in the updated data classification model based on each training loss value until a convergence condition is met, so as to obtain an updated intermediate data classification model.
11. A method of data classification, the method comprising:
acquiring data to be classified;
inputting the data to be classified into a target data classification model to obtain a target classification result corresponding to the data to be classified;
wherein the target data classification model is obtained by: training a coding layer of an initial data classification model based on a current training sample corresponding to a current data classification task in a training sample set to obtain an intermediate data classification model; training the coding layer of the intermediate data classification model based on an updated training sample corresponding to a next data classification task in the training sample set to obtain an updated data classification model; respectively inputting forward trained samples of the updated training sample into the coding layers of the intermediate data classification model and the updated data classification model to obtain a corresponding first feature and second feature; training a mapping layer of the updated data classification model based on the first feature and the second feature corresponding to the same training sample to obtain an updated intermediate data classification model; returning to the step of training the coding layer of the intermediate data classification model based on the updated training sample corresponding to the next data classification task until training ends; and obtaining the target data classification model based on the corresponding intermediate data classification model when training ends.
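A high-level sketch of the alternating schedule recited above; all function names and the cross-entropy objective are placeholders, not the patent's terminology:

```python
# Placeholder loop: the per-step details follow the sketches under claims 8-10.
import copy
import torch
import torch.nn.functional as F

def train_coding_layer(model, samples, labels, steps=100):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(samples), labels).backward()
        opt.step()

def continual_train(model, tasks):
    """tasks: list of (samples, labels), one entry per data classification task."""
    first_samples, first_labels = tasks[0]
    train_coding_layer(model, first_samples, first_labels)   # -> intermediate model
    for samples, labels in tasks[1:]:
        snapshot = copy.deepcopy(model)                      # frozen intermediate model
        train_coding_layer(model, samples, labels)           # -> updated model
        # the mapping-layer alignment against `snapshot` (see the claim 10 sketch)
        # would run here, yielding the updated intermediate model
    return model   # target data classification model at the end of training
```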
12. An apparatus for training a data classification model, the apparatus comprising:
the training sample set acquisition module is used for acquiring a training sample set; the training sample set comprises training samples corresponding to at least two data classification tasks;
the coding layer training module is used for acquiring a current training sample corresponding to a current data classification task, and training a coding layer of an initial data classification model based on the current training sample to obtain an intermediate data classification model; acquiring an updated training sample corresponding to the next data classification task, and training a coding layer of the intermediate data classification model based on the updated training sample to obtain an updated data classification model;
the mapping layer training module is used for respectively inputting the forward trained samples of the updated training samples into the coding layers of the intermediate data classification model and the updated data classification model to obtain corresponding first features and second features, and training a mapping layer of the updated data classification model based on a first feature and a second feature corresponding to the same training sample to obtain an updated intermediate data classification model;
and the target data classification model determining module is used for returning to the step of obtaining the updated training sample corresponding to the next data classification task until the training is finished, and obtaining the target data classification model based on the corresponding intermediate data classification model when the training is finished.
13. An apparatus for classifying data, the apparatus comprising:
the data acquisition module is used for acquiring data to be classified;
the classification result determining module is used for inputting the data to be classified into a target data classification model to obtain a target classification result corresponding to the data to be classified;
wherein the target data classification model is obtained by: training a coding layer of an initial data classification model based on a current training sample corresponding to a current data classification task in a training sample set to obtain an intermediate data classification model; training the coding layer of the intermediate data classification model based on an updated training sample corresponding to a next data classification task in the training sample set to obtain an updated data classification model; respectively inputting forward trained samples of the updated training sample into the coding layers of the intermediate data classification model and the updated data classification model to obtain a corresponding first feature and second feature; training a mapping layer of the updated data classification model based on the first feature and the second feature corresponding to the same training sample to obtain an updated intermediate data classification model; returning to the step of training the coding layer of the intermediate data classification model based on the updated training sample corresponding to the next data classification task until training ends; and obtaining the target data classification model based on the corresponding intermediate data classification model when training ends.
14. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 11 when executing the computer program.
15. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of any one of claims 1 to 11.
CN202110003555.1A 2021-01-04 2021-01-04 Data classification model training method, data classification method and device Pending CN114764865A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110003555.1A CN114764865A (en) 2021-01-04 2021-01-04 Data classification model training method, data classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110003555.1A CN114764865A (en) 2021-01-04 2021-01-04 Data classification model training method, data classification method and device

Publications (1)

Publication Number Publication Date
CN114764865A (en) 2022-07-19

Family

ID=82363522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110003555.1A Pending CN114764865A (en) 2021-01-04 2021-01-04 Data classification model training method, data classification method and device

Country Status (1)

Country Link
CN (1) CN114764865A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115309906A (en) * 2022-09-19 2022-11-08 北京三维天地科技股份有限公司 Intelligent data classification technology based on knowledge graph technology
CN115309906B (en) * 2022-09-19 2023-06-13 北京三维天地科技股份有限公司 Intelligent data classification method based on knowledge graph technology
WO2024179177A1 (en) * 2023-03-02 2024-09-06 腾讯科技(深圳)有限公司 Method and apparatus for detecting universality of continual learning model, and electronic device
CN117036869A (en) * 2023-10-08 2023-11-10 之江实验室 Model training method and device based on diversity and random strategy
CN117036869B (en) * 2023-10-08 2024-01-09 之江实验室 Model training method and device based on diversity and random strategy

Similar Documents

Publication Publication Date Title
CN112949786B (en) Data classification identification method, device, equipment and readable storage medium
WO2021164772A1 (en) Method for training cross-modal retrieval model, cross-modal retrieval method, and related device
CN110737801B (en) Content classification method, apparatus, computer device, and storage medium
Branson et al. The ignorant led by the blind: A hybrid human–machine vision system for fine-grained categorization
JP2022501740A (en) Point cloud segmentation methods, computer programs and computer equipment
CN112966127A (en) Cross-modal retrieval method based on multilayer semantic alignment
CN112131883B (en) Language model training method, device, computer equipment and storage medium
CN111242948B (en) Image processing method, image processing device, model training method, model training device, image processing equipment and storage medium
CN113821670B (en) Image retrieval method, device, equipment and computer readable storage medium
CN116129141B (en) Medical data processing method, apparatus, device, medium and computer program product
CN114298122B (en) Data classification method, apparatus, device, storage medium and computer program product
CN113609233B (en) Entity object coding method and device, electronic equipment and storage medium
CN114358109A (en) Feature extraction model training method, feature extraction model training device, sample retrieval method, sample retrieval device and computer equipment
CN117494051A (en) Classification processing method, model training method and related device
CN113408721A (en) Neural network structure searching method, apparatus, computer device and storage medium
Marchellus et al. Deep learning for 3d human motion prediction: State-of-the-art and future trends
CN116958624A (en) Method, device, equipment, medium and program product for identifying appointed material
Yao et al. Semantic segmentation based on stacked discriminative autoencoders and context-constrained weakly supervised learning
CN114329065A (en) Processing method of video label prediction model, video label prediction method and device
Nurhasanah et al. Fine-grained object recognition using a combination model of navigator–teacher–scrutinizer and spinal networks
CN111582404A (en) Content classification method and device and readable storage medium
Apicella et al. Sparse dictionaries for the explanation of classification systems
Xian Learning from Limited Labeled Data-Zero-Shot and Few-Shot Learning
CN117556275B (en) Correlation model data processing method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40072967; Country of ref document: HK)
SE01 Entry into force of request for substantive examination