CN112926737A - Model training method, data processing method and device and electronic equipment - Google Patents

Model training method, data processing method and device and electronic equipment Download PDF

Info

Publication number
CN112926737A
CN112926737A (application CN202110227463.1A)
Authority
CN
China
Prior art keywords
model
training
data
embedding
pooling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110227463.1A
Other languages
Chinese (zh)
Inventor
张发恩
刘雨微
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innovation Wisdom Shanghai Technology Co ltd
AInnovation Shanghai Technology Co Ltd
Original Assignee
Innovation Wisdom Shanghai Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innovation Wisdom Shanghai Technology Co ltd
Priority to CN202110227463.1A
Publication of CN112926737A
Legal status: Pending

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The application relates to a model training method, a data processing method and apparatus, and an electronic device, and belongs to the field of computer technology. The training method comprises the following steps: acquiring a data sample set and dividing the data sample set into a training set and a test set according to a preset ratio, wherein the data sample set comprises multiple positive comment texts and multiple negative comment texts related to a specified target in a webpage; and iteratively training a BERT model with the training set and the test set to obtain a trained emotion classification model, wherein, after the input layer of the BERT model passes through the embedding layer, average pooling and maximum pooling are respectively performed along the sentence sequence length direction and the two pooled outputs are concatenated. By improving the BERT model in this way, the average response and the maximum response of each sentence are taken into account, so that the trained word vectors contain not only semantic information but also the emotion polarity expression of the whole sentence, which improves the accuracy of emotion classification prediction.

Description

Model training method, data processing method and device and electronic equipment
Technical Field
The application belongs to the field of computer technology, and particularly relates to a model training method, a data processing method and apparatus, and an electronic device.
Background
With the development of the internet and the popularization of electronic commerce, online shopping has become a popular shopping mode, and shopping comments on e-commerce websites have grown steadily. This comment information expresses the subjective feelings of consumers about purchased commodities; it has great reference value for consumers selecting commodities that meet their needs, and it is also an important basis for merchants to improve marketing strategies. With the massive accumulation of comment information on e-commerce platforms, consumers pay more and more attention to the commodity aspects they care about, such as quality, packaging, or delivery speed. Therefore, obtaining aspect-level emotional information about commodities from e-commerce comments has become a popular research topic.
Currently, there are three main types of emotion analysis methods. The first is the emotion-dictionary-based analysis method, which relies mainly on an emotion dictionary and performs emotion classification through manually designed rules. The second is the traditional machine-learning-based analysis method, which needs to extract word and phrase features and then uses classification algorithms such as support vector machines, naive Bayes, and random forests to judge the emotional tendency of a text. The last category is the deep-learning-based analysis method, in which different neural network models map texts into a vector space to obtain numerical representations of words, and the vectors are then fed into a classifier.
The emotion-dictionary-based analysis method cannot handle today's massive text corpora; it is time-consuming and labor-intensive, and its accuracy is very low. The traditional machine learning analysis method depends heavily on manual feature extraction from the text, and human factors interfere throughout the process, so its robustness is poor. In current general deep neural networks, the sentences in a text are treated as a set of emotional words or phrases, the mutual combination relationships between words and phrases are not considered, and the trained word vectors contain only semantic information while ignoring the expression of emotional polarity, so the accuracy of the emotion analysis task is not high.
Disclosure of Invention
In view of this, an object of the present application is to provide a model training method, a data processing method, an apparatus and an electronic device, so as to solve the problem of inaccurate classification existing in the conventional emotion analysis method.
The embodiment of the application is realized as follows:
in a first aspect, an embodiment of the present application provides a model training method, including: acquiring a data sample set and dividing the data sample set into a training set and a test set according to a preset ratio, wherein the data sample set comprises multiple positive comment texts and multiple negative comment texts related to a specified target in a webpage; and iteratively training a BERT model with the training set and the test set to obtain a trained emotion classification model, wherein, after the input layer of the BERT model passes through the embedding layer, average pooling and maximum pooling are respectively performed along the sentence sequence length direction, the two pooled outputs are concatenated, and the data dimension changes from [batch_size, sequence_length, embedding_dimension] to [batch_size, embedding_dimension * 2] after the pooling operation. In the embodiment of the application, the BERT model is improved so that, after the input layer passes through the embedding layer, average pooling and maximum pooling are performed along the sentence sequence length direction and concatenated, changing the data dimension from [batch_size, sequence_length, embedding_dimension] to [batch_size, embedding_dimension * 2]. The average response and the maximum response of each sentence are thereby taken into account, so that the trained word vectors contain not only semantic information but also the emotion polarity expression of the whole sentence, which improves prediction accuracy.
With reference to a possible implementation manner of the embodiment of the first aspect, iteratively training the BERT model with the training set and the test set includes: each time training of the BERT model with the training set is completed, evaluating the model after the current iteration of training with the test set; if the model evaluation index of the model after the current iteration of training is smaller than that of the model after the previous iteration of training, reducing the learning rate of the BERT model and continuing the iterative training; and if the model evaluation index fails to exceed that of the previous iteration for a preset number of consecutive times, terminating training even though the preset number of iterations has not been reached. In the embodiment of the application, the model is trained with a dynamic learning rate and early termination, which accelerates the convergence of the model while preserving prediction accuracy.
With reference to a possible implementation manner of the embodiment of the first aspect, iteratively training the BERT model with the training set and the test set includes: each time the BERT model after the current iteration of training is evaluated with the test set, computing the F1 score of the test set for each of a plurality of preset thresholds; and when iteration ends, selecting the threshold corresponding to the maximum F1 score as the final prediction threshold of the model. In the embodiment of the application, a dynamic threshold strategy is adopted to select the optimal threshold so as to improve the prediction accuracy of the model.
With reference to a possible implementation manner of the embodiment of the first aspect, the iteratively training the BERT model by using the training set and the test set includes: and performing iterative training on the BERT model by utilizing the training set and the testing set based on a weight attenuation mechanism and a dropout mechanism. In the embodiment of the application, when the model is trained, the problem of overfitting of the model is reduced by adding a weight decay (weight decay) mechanism and a dropout mechanism, so that the accuracy of the model is improved.
With reference to one possible implementation manner of the embodiment of the first aspect, acquiring a data sample set includes: acquiring multiple pieces of comment text data related to a specified target in a webpage; and de-duplicating the acquired comment text data and labeling each de-duplicated piece of text data to obtain the data sample set, wherein a label marks the text data as a positive comment text or a negative comment text. In the embodiment of the application, de-duplicating the samples lets the model learn the characteristics of more distinct samples, which improves the generalization ability of the model.
In a second aspect, an embodiment of the present application further provides a data processing method, where the method includes: obtaining comment data in a webpage; and performing emotion classification on the comment data by using the emotion classification model trained by the model training method provided in the embodiment of the first aspect and/or in combination with any one of the possible implementation manners of the embodiment of the first aspect to obtain a classification result.
In a third aspect, an embodiment of the present application further provides a model training apparatus, including: an acquisition module and a processing module. The acquisition module is used for acquiring a data sample set and dividing the data sample set into a training set and a test set according to a preset ratio, wherein the data sample set comprises multiple positive comment texts and multiple negative comment texts related to a specified target in a webpage. The processing module is used for iteratively training a BERT model with the training set and the test set to obtain a trained emotion classification model, wherein, after the input layer of the BERT model passes through the embedding layer, average pooling and maximum pooling are respectively performed along the sentence sequence length direction, the two pooled outputs are concatenated, and the data dimension changes from [batch_size, sequence_length, embedding_dimension] to [batch_size, embedding_dimension * 2] after the pooling operation.
In a fourth aspect, an embodiment of the present application further provides an electronic device, including: a memory and a processor, the processor coupled to the memory; the memory is used for storing programs; the processor is configured to invoke a program stored in the memory to perform the method according to the first aspect and/or any possible implementation manner of the first aspect, or to perform the method according to the second aspect.
In a fifth aspect, an embodiment of the present application further provides a BERT model, including: an input layer and an embedding layer. After the data of the input layer passes through the embedding layer, average pooling and maximum pooling are respectively performed along the sequence length direction, the two pooled outputs are concatenated, and the data dimension changes from [batch_size, sequence_length, embedding_dimension] to [batch_size, embedding_dimension * 2] after the pooling operation.
In a sixth aspect, embodiments of the present application further provide a storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the method provided in the foregoing first aspect and/or any one of the possible implementation manners of the first aspect, or to perform the method provided in the foregoing second aspect.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts. The foregoing and other objects, features and advantages of the application will be apparent from the accompanying drawings. Like reference numerals refer to like parts throughout the drawings. The drawings are not intended to be to scale as practical, emphasis instead being placed upon illustrating the subject matter of the present application.
Fig. 1 shows a schematic flowchart of a model training method provided in an embodiment of the present application.
Fig. 2 shows a schematic diagram of a principle of improving a BERT module according to an embodiment of the present application.
Fig. 3 shows a flowchart of a data processing method provided in an embodiment of the present application.
Fig. 4 shows a block diagram of a model training apparatus according to an embodiment of the present application.
Fig. 5 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, relational terms such as "first," "second," and the like may be used solely in the description herein to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Further, the term "and/or" in the present application merely describes an association relationship between associated objects and indicates that three kinds of relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone.
In view of the defects of the existing emotion analysis methods, the embodiment of the application provides a data processing method. A BERT (Bidirectional Encoder Representations from Transformers) model is selected and improved so that, after the input layer of the improved BERT model passes through the embedding layer (embedding), average pooling (mean pooling) and maximum pooling (max pooling) are respectively performed along the sentence sequence length (sequence_length) direction and the two pooled outputs are concatenated; the data dimension changes from [batch_size, sequence_length, embedding_dimension] to [batch_size, embedding_dimension * 2] after the pooling operation. The model thus takes into account the mutual combination relationships among the words in each sentence, and the trained word vectors contain not only semantic information but also the emotion polarity expression of the whole sentence, which solves the problem of low classification accuracy in conventional emotion analysis methods. Here, batch_size denotes the batch size, that is, the number of samples input at one time; sequence_length denotes the sentence sequence length, that is, the number of words in an input sentence; and embedding_dimension denotes the embedding dimension.
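The dimension change described above can be illustrated with a short sketch (a non-limiting illustration; the PyTorch framework and the concrete shape values are assumptions, since the embodiment does not prescribe a framework):

```python
import torch

# Illustrative shapes only: batch_size = 24 as used later in the description,
# while sequence_length = 128 and the 768-dimensional embedding are assumed values.
batch_size, sequence_length, embedding_dimension = 24, 128, 768
embedded = torch.randn(batch_size, sequence_length, embedding_dimension)

mean_pooled = embedded.mean(dim=1)                           # [batch_size, embedding_dimension]
max_pooled, _ = embedded.max(dim=1)                          # [batch_size, embedding_dimension]
mean_max_pool = torch.cat([mean_pooled, max_pooled], dim=1)  # concatenate the two pooled outputs
print(mean_max_pool.shape)                                   # torch.Size([24, 1536]) = [batch_size, embedding_dimension * 2]
```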
For ease of understanding, the model training method provided in the embodiments of the present application will be described below, as shown in fig. 1. The model training method comprises the following steps:
step S101: and acquiring a data sample set, and dividing the data sample set into a training set and a testing set according to a preset proportion.
When the model needs to be trained, a data sample set is obtained and is divided into a training set and a testing set according to a preset proportion (for example, 7: 3). Wherein the set of data samples comprises: a plurality of positive comment texts and a plurality of negative comment texts related to the specified target in the webpage.
The acquired data sample set may be prepared in advance, for example, stored in a database or a disk, and may be directly acquired when needed. Of course, it may also be acquired in real time.
In one embodiment, the process of obtaining the data sample set may be: obtaining multiple pieces of comment text data related to a specified target in a webpage, de-duplicating the obtained comment text data, and labeling each piece of de-duplicated text data to obtain the data sample set, wherein a label marks the text data as a positive comment text or a negative comment text. For example, hotel comment text data is used as the corpus, and a plurality of pieces of comment text data are crawled with a crawler, for example ten thousand pieces of comment data comprising 5000 positive comments and 5000 negative comments; the data is cleaned and de-duplicated, and each piece of de-duplicated text data is labeled, for example with the label 1 representing positive comment data and the label 0 representing negative comment data.
It should be noted that the above only takes hotels as an example of the specified target; the specified target may be any target set by the user as needed, such as clothes, shoes, home appliances, and the like. Therefore, the hotel in the above example is not to be construed as limiting the specified target. A minimal sketch of this data preparation step is given below.
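In the sketch, the toy comments, column names, and the use of pandas and scikit-learn are illustrative assumptions rather than part of the claimed method:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for crawled hotel comments; real data would come from a crawler.
df = pd.DataFrame({
    "text": ["房间干净，服务很好", "房间干净，服务很好", "隔音太差，不推荐"],
    "label": [1, 1, 0],  # label 1 = positive comment, label 0 = negative comment
})

df = df.drop_duplicates(subset="text").reset_index(drop=True)  # remove duplicate comments

# Divide into a training set and a test set at the preset 7:3 ratio from step S101.
train_df, test_df = train_test_split(df, test_size=0.3, random_state=42)
```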
Step S102: and performing iterative training on the BERT model by using the training set and the testing set to obtain a trained emotion classification model.
After the data sample set is divided into a training set and a test set according to a preset ratio, the training set and the test set can be used to iteratively train the BERT model, yielding a trained emotion classification model. During training, the BERT model is trained with the training set and the trained model is evaluated with the test set; if the model does not pass the evaluation, the model parameters are adjusted and training continues with the training set until the model passes.
The trained BERT model is an improved model. The initial BERT model has too many parameters and is too complex, and its structure needs to be improved in order to mitigate overfitting. In the embodiment of the application, a BERT model comprising 3 Transformer blocks is adopted to reduce the number of parameters, and this BERT model comprising 3 Transformer blocks is further improved. As shown in fig. 2, in the modified BERT model, average pooling (mean pooling) and maximum pooling (max pooling) are respectively performed along the sentence sequence length (sequence_length) direction and the two pooled outputs are concatenated, so that the data dimension changes from [batch_size, sequence_length, embedding_dimension] to [batch_size, embedding_dimension * 2] after pooling. The concatenated vector is then mapped to a single value and activated: Y = sigmoid(Linear(mean_max_pool)), Y ∈ (0, 1).
The essence of the operation of improving the BERT model can be understood as finding the average response and the maximum response of each sentence, and then identifying the responses by linear mapping, so as to obtain the inference result of the model, thereby enabling the trained emotion classification model to have higher accuracy.
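A hedged sketch of this pooling-and-mapping head is shown below; the class name MeanMaxHead, the default embedding dimension, and the PyTorch framework are assumptions, while the dropout value 0.4 follows the description later in this section:

```python
import torch
import torch.nn as nn

class MeanMaxHead(nn.Module):
    """Mean/max pooling along the sentence sequence, concatenation, linear map, sigmoid."""

    def __init__(self, embedding_dimension: int = 768, dropout: float = 0.4):
        super().__init__()
        self.dropout = nn.Dropout(dropout)                   # dropout = 0.4 as in the description
        self.linear = nn.Linear(embedding_dimension * 2, 1)  # map the concatenated responses to one value

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: [batch_size, sequence_length, embedding_dimension] from the embedding/Transformer stack
        mean_pooled = hidden_states.mean(dim=1)              # average response of each sentence
        max_pooled, _ = hidden_states.max(dim=1)             # maximum response of each sentence
        mean_max_pool = torch.cat([mean_pooled, max_pooled], dim=1)
        return torch.sigmoid(self.linear(self.dropout(mean_max_pool))).squeeze(-1)  # Y ∈ (0, 1)
```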
Optionally, when the BERT model is iteratively trained with the training set and the test set, a weight decay mechanism and a dropout mechanism may also be added to improve the accuracy of the model; that is, during training, the BERT model is iteratively trained with the training set and the test set based on a weight decay mechanism and a dropout mechanism. The weight decay mechanism, i.e., adding L2 regularization, keeps the parameter values from growing too large, which can reduce model overfitting to a certain extent. Meanwhile, a dropout mechanism can be added, with dropout set to 0.4, to further reduce overfitting. The principles of the weight decay mechanism and the dropout mechanism are well known to those skilled in the art and are not described here.
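As a non-limiting illustration, weight decay can be attached to the optimizer while dropout lives inside the model; the optimizer choice (AdamW), the learning rate, and the weight_decay value are assumptions not stated in the embodiment:

```python
import torch
import torch.nn as nn

# Stand-in for the modified BERT model described above.
model = nn.Sequential(nn.Dropout(p=0.4), nn.Linear(768 * 2, 1), nn.Sigmoid())
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)  # L2-style weight decay
```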
Optionally, when iteratively training the BERT model with the training set and the test set, the number of iterations (epoch) may be set to 100 and the batch_size to 24, and training may be performed with a dynamic learning rate and early termination, as follows: each time training of the BERT model with the training set is completed, the model after the current iteration of training is evaluated with the test set; if the model evaluation index (such as the AUC, Area Under Curve) of the model after the current iteration of training is smaller than that of the model after the previous iteration of training, the learning rate of the BERT model is reduced and iterative training continues; and if the model evaluation index fails to exceed that of the previous iteration for a preset number of consecutive times, training is terminated even though the preset number of iterations has not been reached. For example, after the current epoch of training is completed, the current training result is measured with the test set and the AUC of the current epoch is recorded; if the current AUC does not increase compared with the AUC of the previous epoch, the learning rate is decreased, for example to 1/5 of the current learning rate, and iterative training continues; if the AUC fails to increase over many consecutive epochs, for example the test-set AUC does not increase for 10 epochs, training is terminated early. The model evaluation index can be measured by the AUC, which is the area enclosed by the ROC (Receiver Operating Characteristic) curve and the coordinate axis.
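The dynamic-learning-rate and early-termination loop can be sketched as follows; train_one_epoch, evaluate_auc, train_loader, and test_loader are hypothetical helpers, model and optimizer are assumed to be set up as in the sketch above, and the 1/5 reduction and 10-epoch patience follow one reading of the description:

```python
best_auc, stale_epochs = 0.0, 0
for epoch in range(100):                          # epoch = 100, batch_size = 24 as stated above
    train_one_epoch(model, train_loader, optimizer)
    auc = evaluate_auc(model, test_loader)        # test-set AUC of the current epoch
    if auc > best_auc:
        best_auc, stale_epochs = auc, 0
    else:
        stale_epochs += 1
        for group in optimizer.param_groups:      # reduce the learning rate to 1/5
            group["lr"] *= 0.2
        if stale_epochs >= 10:                    # early termination before 100 epochs
            break
```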
The output value of the model lies between 0 and 1. Normally, 0.5 is chosen as the boundary value: an output above 0.5 is predicted as a positive sample and an output below 0.5 as a negative sample. However, 0.5 is usually not the optimal classification boundary. Therefore, to improve the prediction accuracy of the model, the embodiment of the application adopts a dynamic threshold strategy to select the optimal threshold. The process of iteratively training the BERT model with the training set and the test set may be: each time the BERT model after the current iteration of training is evaluated with the test set, the F1 score of the test set is computed for each of a plurality of preset thresholds, and when iteration ends, the threshold corresponding to the maximum F1 score is selected as the final prediction threshold of the model. For example, 99 thresholds are defined from 0.01 to 0.99; samples above a threshold are counted as positive and samples below it as negative. Each time the BERT model after the current iteration of training is evaluated with the test set, the F1 score of the test set is computed for each of the 99 preset thresholds, and at the end of iteration the threshold that yields the highest F1 score is selected as the final prediction threshold of the model.
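A minimal sketch of the dynamic threshold selection is given below; y_true and y_prob are illustrative test labels and model outputs, and scikit-learn's f1_score is an assumed convenience:

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([1, 0, 1, 1, 0])                 # illustrative test-set labels
y_prob = np.array([0.82, 0.35, 0.64, 0.71, 0.48])  # illustrative model outputs in (0, 1)

thresholds = np.linspace(0.01, 0.99, 99)           # 99 candidate thresholds from 0.01 to 0.99
f1_scores = [f1_score(y_true, (y_prob >= t).astype(int)) for t in thresholds]
best_threshold = thresholds[int(np.argmax(f1_scores))]  # final prediction threshold of the model
```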
After the model training is finished, the model is stored for subsequent use, for example, comment data in a webpage are obtained subsequently, and the comment data are subjected to emotion classification by using the emotion classification model trained by the model training method, so that a classification result can be obtained. The data processing method provided by the embodiment of the present application will be described below with reference to fig. 3.
Step S201: and obtaining comment data in the webpage.
Step S202: and carrying out emotion classification on the comment data by using an emotion classification model trained in advance to obtain a classification result.
The emotion classification model trained in advance is the emotion classification model trained by the model training method shown in fig. 1.
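A hedged usage sketch of steps S201 to S202 follows; emotion_model and best_threshold stand in for the trained emotion classification model and the dynamically selected threshold produced by the training method above, and classify_comment is a hypothetical wrapper rather than an interface defined in this application:

```python
def classify_comment(comment: str, emotion_model, best_threshold: float) -> str:
    prob = float(emotion_model(comment))  # model output Y in (0, 1)
    return "positive comment" if prob >= best_threshold else "negative comment"
```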
The embodiment of the application also provides a BERT model comprising 3 Transformer blocks, with an input layer and an embedding layer. The embedding layer is improved so that, after the data of the input layer passes through the embedding layer, average pooling (mean pooling) and maximum pooling (max pooling) are respectively performed along the sequence length direction and the two pooled outputs are concatenated, changing the data dimension from [batch_size, sequence_length, embedding_dimension] to [batch_size, embedding_dimension * 2] after the pooling operation. The concatenated vector is then mapped to a single value and activated: Y = sigmoid(Linear(mean_max_pool)), Y ∈ (0, 1). The essence of this improvement to the BERT model is to obtain the average response and the maximum response of each sentence and then identify these responses through a linear mapping to obtain the inference result of the model, so that the trained emotion classification model is more accurate.
The embodiment of the present application further provides a model training apparatus 100, as shown in fig. 4. The model training apparatus 100 includes: an acquisition module 110 and a processing module 120.
An obtaining module 110, configured to obtain a data sample set, and divide the data sample set into a training set and a testing set according to a preset ratio, where the data sample set includes: a plurality of positive comment texts and a plurality of negative comment texts related to the specified target in the webpage.
Optionally, the obtaining module 110 is specifically configured to obtain multiple pieces of comment text data related to a specified target in a webpage; and to de-duplicate the acquired comment text data and label each de-duplicated piece of text data to obtain the data sample set, wherein a label marks the text data as a positive comment text or a negative comment text.
And the processing module 120 is configured to iteratively train the BERT model with the training set and the test set to obtain a trained emotion classification model, wherein, after the input layer of the BERT model passes through the embedding layer, average pooling and maximum pooling are respectively performed along the sentence sequence length direction, the two pooled outputs are concatenated, and the data dimension changes from [batch_size, sequence_length, embedding_dimension] to [batch_size, embedding_dimension * 2] after the pooling operation.
Optionally, the processing module 120 is configured to evaluate the model after the current iteration of training with the test set each time training of the BERT model with the training set is completed; if the model evaluation index of the model after the current iteration of training is smaller than that of the model after the previous iteration of training, to reduce the learning rate of the BERT model and continue the iterative training; and if the model evaluation index fails to exceed that of the previous iteration for a preset number of consecutive times, to terminate training even though the preset number of iterations has not been reached.
Optionally, the processing module 120 is configured to calculate an F1 score of the test set relative to each threshold in a preset plurality of thresholds each time the test set is used to perform a qualification test on the currently iteratively trained BERT model; and when the iteration is finished, selecting a threshold corresponding to the maximum F1 score as a final prediction threshold of the model.
Optionally, the processing module 120 is configured to perform iterative training on the BERT model by using the training set and the test set based on a weight attenuation mechanism and a dropout mechanism.
The model training apparatus 100 provided in the embodiment of the present application has the same implementation principle and the same technical effect as those of the foregoing method embodiments, and for the sake of brief description, reference may be made to the corresponding contents in the foregoing method embodiments for the parts of the embodiment that are not mentioned in the description of the present application.
As shown in fig. 5, fig. 5 is a block diagram illustrating a structure of an electronic device 200 according to an embodiment of the present disclosure. The electronic device 200 includes: a transceiver 210, a memory 220, a communication bus 230, and a processor 240.
The elements of the transceiver 210, the memory 220, and the processor 240 are electrically connected to each other directly or indirectly to achieve data transmission or interaction. For example, the components may be electrically coupled to each other via one or more communication buses 230 or signal lines. The transceiver 210 is used for transceiving data. The memory 220 is used for storing a computer program, such as the software functional module shown in fig. 4, i.e., the model training apparatus 100. The model training apparatus 100 includes at least one software function module, which may be stored in the memory 220 in the form of software or firmware or solidified in an operating system (OS) of the electronic device 200. The processor 240 is configured to execute an executable module stored in the memory 220, such as a software functional module or a computer program included in the model training apparatus 100. For example, the processor 240 is configured to obtain a data sample set and divide the data sample set into a training set and a test set according to a preset ratio, wherein the data sample set comprises multiple positive comment texts and multiple negative comment texts related to a specified target in a webpage; and is also configured to iteratively train the BERT model with the training set and the test set to obtain a trained emotion classification model, wherein, after the input layer of the BERT model passes through the embedding layer, average pooling and maximum pooling are respectively performed along the sentence sequence length direction, the two pooled outputs are concatenated, and the data dimension changes from [batch_size, sequence_length, embedding_dimension] to [batch_size, embedding_dimension * 2] after the pooling operation.
The Memory 220 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 240 may be an integrated circuit chip having signal processing capabilities. The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor 240 may be any conventional processor or the like.
The electronic device 200 includes, but is not limited to, a computer, a server, and the like.
The embodiment of the present application further provides a non-volatile computer-readable storage medium (hereinafter, referred to as a storage medium), where a computer program is stored on the storage medium, and when the computer program is run by the electronic device 200 as described above, the computer program performs the above-described model training method and the data processing method.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a notebook computer, a server, or an electronic device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of model training, comprising:
acquiring a data sample set, and dividing the data sample set into a training set and a testing set according to a preset proportion, wherein the data sample set comprises: multiple pieces of positive comment texts and multiple pieces of negative comment texts related to the specified target in the webpage;
and performing iterative training on the BERT model by using the training set and the testing set to obtain a trained emotion classification model, wherein after an input layer of the BERT model passes through an embedding layer, average pooling and maximum pooling are respectively performed along the sentence sequence length direction, the two pooled outputs are concatenated, and the data dimension changes from [batch_size, sequence_length, embedding_dimension] to [batch_size, embedding_dimension * 2] after the pooling operation.
2. The method of claim 1, wherein iteratively training a BERT model using the training set and the test set comprises:
when training of the BERT model is completed by using the training set each time, performing qualification test on the model after current iterative training by using the test set;
if the model evaluation index of the model after the current iterative training is smaller than the model evaluation index of the model after the previous iterative training, reducing the learning rate of the BERT model and continuing the iterative training;
and if the model evaluation indexes of the preset times are smaller than the model evaluation indexes of the model after the previous iterative training, terminating the training without reaching the preset iterative times.
3. The method of claim 1, wherein iteratively training a BERT model using the training set and the test set comprises:
calculating F1 scores of the test set relative to each threshold value in a preset plurality of threshold values when the current iteration-trained BERT model is subjected to qualification test by using the test set each time;
and when the iteration is finished, selecting a threshold corresponding to the maximum F1 score as a final prediction threshold of the model.
4. The method of claim 1, wherein iteratively training a BERT model using the training set and the test set comprises:
and performing iterative training on the BERT model by utilizing the training set and the testing set based on a weight attenuation mechanism and a dropout mechanism.
5. The method of claim 1, wherein obtaining a set of data samples comprises:
acquiring a plurality of pieces of comment text data related to a specified target in a webpage;
and de-duplicating the acquired comment text data, and labeling each piece of de-duplicated text data to obtain the data sample set, wherein the label is used for labeling the text data as a positive comment text or a negative comment text.
6. A method of data processing, the method comprising:
obtaining comment data in a webpage;
carrying out emotion classification on the comment data by using the emotion classification model trained by the model training method of any one of claims 1-5 to obtain a classification result.
7. A model training apparatus, comprising:
the acquisition module is used for acquiring a data sample set and dividing the data sample set into a training set and a testing set according to a preset proportion, wherein the data sample set comprises: multiple pieces of positive comment texts and multiple pieces of negative comment texts related to the specified target in the webpage;
and the processing module is used for carrying out iterative training on the BERT model by utilizing the training set and the testing set to obtain a trained emotion classification model, wherein after an input layer of the BERT model passes through an embedding layer, average pooling and maximum pooling are respectively carried out along the sentence sequence length direction, the two pooled outputs are concatenated, and the data dimension changes from [batch_size, sequence_length, embedding_dimension] to [batch_size, embedding_dimension * 2] after the pooling operation.
8. An electronic device, comprising:
a memory and a processor, the processor coupled to the memory;
the memory is used for storing programs;
the processor for invoking a program stored in the memory to perform the method of any one of claims 1-5 or to perform the method of claim 6.
9. A BERT model, comprising:
an input layer and an embedding layer; after the data of the input layer passes through the embedding layer, average pooling and maximum pooling are respectively carried out along the sequence length direction, the two pooled outputs are concatenated, and the data dimension changes from [batch_size, sequence_length, embedding_dimension] to [batch_size, embedding_dimension * 2] after the pooling operation.
10. A storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any one of claims 1-5 or performs the method of claim 6.
CN202110227463.1A 2021-03-01 2021-03-01 Model training method, data processing method and device and electronic equipment Pending CN112926737A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110227463.1A CN112926737A (en) 2021-03-01 2021-03-01 Model training method, data processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110227463.1A CN112926737A (en) 2021-03-01 2021-03-01 Model training method, data processing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112926737A true CN112926737A (en) 2021-06-08

Family

ID=76172920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110227463.1A Pending CN112926737A (en) 2021-03-01 2021-03-01 Model training method, data processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112926737A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230195828A1 (en) * 2021-12-17 2023-06-22 Mcafee, Llc Methods and apparatus to classify web content

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108984523A (en) * 2018-06-29 2018-12-11 重庆邮电大学 A kind of comment on commodity sentiment analysis method based on deep learning model
CN110704622A (en) * 2019-09-27 2020-01-17 北京明略软件系统有限公司 Text emotion classification method and device and electronic equipment
CN111310474A (en) * 2020-01-20 2020-06-19 桂林电子科技大学 Online course comment sentiment analysis method based on activation-pooling enhanced BERT model
WO2020125445A1 (en) * 2018-12-18 2020-06-25 腾讯科技(深圳)有限公司 Classification model training method, classification method, device and medium
CN111753092A (en) * 2020-06-30 2020-10-09 深圳创新奇智科技有限公司 Data processing method, model training device and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108984523A (en) * 2018-06-29 2018-12-11 重庆邮电大学 A kind of comment on commodity sentiment analysis method based on deep learning model
WO2020125445A1 (en) * 2018-12-18 2020-06-25 腾讯科技(深圳)有限公司 Classification model training method, classification method, device and medium
CN110704622A (en) * 2019-09-27 2020-01-17 北京明略软件系统有限公司 Text emotion classification method and device and electronic equipment
CN111310474A (en) * 2020-01-20 2020-06-19 桂林电子科技大学 Online course comment sentiment analysis method based on activation-pooling enhanced BERT model
CN111753092A (en) * 2020-06-30 2020-10-09 深圳创新奇智科技有限公司 Data processing method, model training device and electronic equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Liu Yuliang et al.: "Deep Learning" (《深度学习》), Xidian University Press *
Yang Chen et al.: "SentiBERT: A Pre-trained Language Model Combining Sentiment Information" (SentiBERT:结合情感信息的预训练语言模型), Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), no. 09, 30 December 2019 (2019-12-30) *
Lu Quan et al., Wuhan University Press *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230195828A1 (en) * 2021-12-17 2023-06-22 Mcafee, Llc Methods and apparatus to classify web content

Similar Documents

Publication Publication Date Title
WO2019218508A1 (en) Topic sentiment joint probability-based electronic commerce false comment recognition method
CN110209805B (en) Text classification method, apparatus, storage medium and computer device
CN110619044B (en) Emotion analysis method, system, storage medium and equipment
CN106095845B (en) Text classification method and device
Chang et al. Research on detection methods based on Doc2vec abnormal comments
CN111753092A (en) Data processing method, model training device and electronic equipment
CN110879938A (en) Text emotion classification method, device, equipment and storage medium
AU2020381439B2 (en) Enhanced intent matching using keyword-based word mover’s distance
US11599927B1 (en) Artificial intelligence system using deep neural networks for pairwise character-level text analysis and recommendations
CN111160000B (en) Composition automatic scoring method, device terminal equipment and storage medium
Shen et al. A voice of the customer real-time strategy: An integrated quality function deployment approach
CN113204624B (en) Multi-feature fusion text emotion analysis model and device
Aktaş et al. Turkish sentiment analysis using machine learning methods: application on online food order site reviews
CN112926737A (en) Model training method, data processing method and device and electronic equipment
Ma et al. Identifying purchase intention through deep learning: analyzing the Q &D text of an E-Commerce platform
CN115878761A (en) Event context generation method, apparatus, and medium
Carvalho et al. The importance of context for sentiment analysis in dialogues
KR20220151453A (en) Method for Predicting Price of Product
CN113722487A (en) User emotion analysis method, device and equipment and storage medium
Ardiyanto et al. Application Of Artificial Intelligence (AI) For Hijab Business In Mojokerto Using Sentiment Analysis Method Natural Language Processing (NLP)
CN113988085B (en) Text semantic similarity matching method and device, electronic equipment and storage medium
CN116911280B (en) Comment analysis report generation method based on natural language processing
Zhang et al. Fusing LDA Topic Features for BERT-based Text Classification
Nayak et al. Analysis of Sentiment in Text Utilizing the Twitter Dataset
Preethi et al. Recommending the Best Product Based on User Requirements Using Opinion Mining

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210608

RJ01 Rejection of invention patent application after publication