WO2022095354A1 - 基于bert的文本分类方法、装置、计算机设备及存储介质 - Google Patents

基于bert的文本分类方法、装置、计算机设备及存储介质 Download PDF

Info

Publication number
WO2022095354A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
similarity
bert
feature vector
positive sample
Prior art date
Application number
PCT/CN2021/090505
Other languages
English (en)
French (fr)
Inventor
王晶
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2022095354A1 publication Critical patent/WO2022095354A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles

Definitions

  • the present application relates to the technical field of natural language processing, and in particular, to a BERT-based text classification method, apparatus, computer equipment and storage medium.
  • Automatic classification of text data is an important application field of artificial intelligence technologies such as text data mining and natural language processing. Its main function is to automatically classify unstructured text data stored in digital form into pre-organized categories related to specific businesses. With the continuous development of technology in the information age, automatic classification of text data is an important technical measure for improving the production efficiency and competitive advantage of enterprises.
  • Most traditional text classification methods are based on deep learning and mostly adopt CNN (Convolutional Neural Network) or RNN (Recurrent Neural Network) models to solve the text classification problem; however, these existing general models are easily affected by noise labels.
  • the purpose of the embodiments of the present application is to propose a BERT-based text classification method, apparatus, computer equipment and storage medium, so as to solve the problem that existing general models are easily affected by noise labels.
  • the embodiments of the present application provide a BERT-based text classification method, which adopts the following technical solutions:
  • the consultation data is input into the BERT network trained based on the triplet loss function to perform a feature transformation operation to obtain a session feature vector;
  • the embodiments of the present application also provide a BERT-based text classification device, which adopts the following technical solutions:
  • a request receiving module, configured to receive a session request carrying consultation data sent by a user through a requesting terminal;
  • a feature transformation module, configured to, in response to the session request, input the consultation data into the BERT network trained based on the triplet loss function to perform a feature transformation operation to obtain a session feature vector;
  • a category prediction module, configured to input the session feature vector into the Dense classification layer to perform a category prediction operation to obtain a predicted classification result;
  • a speech acquisition module, configured to read a speech database, and obtain speech reply information corresponding to the predicted classification result from the speech database;
  • a session reply module configured to send the speech reply information to the requesting terminal, so as to complete the consultation session reply.
  • the embodiment of the present application also provides a kind of computer equipment, adopts the following technical scheme:
  • It comprises a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the processor executes the computer-readable instructions, the following steps of the BERT-based text classification method are implemented:
  • the consultation data is input into the BERT network trained based on the triplet loss function to perform a feature transformation operation to obtain a session feature vector;
  • the embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solutions:
  • Computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by the processor, the steps of the BERT-based text classification method described below are implemented:
  • the consultation data is input into the BERT network trained based on the triplet loss function to perform a feature transformation operation to obtain a session feature vector;
  • the BERT-based text classification method, device, computer equipment and storage medium mainly have the following beneficial effects:
  • the BERT-based text classification method, apparatus, computer device, and storage medium provided by this application receive a session request carrying consultation data sent by a user through a requesting terminal; in response to the session request, input the consultation data into the BERT network trained based on the triplet loss function to perform a feature transformation operation to obtain a session feature vector; input the session feature vector into the Dense classification layer to perform a category prediction operation to obtain a predicted classification result; obtain the speech reply information corresponding to the predicted classification result from the speech database; and send the speech reply information to the requesting terminal to complete the consultation session reply.
  • Training the BERT network with the Triplet loss function can greatly reduce the influence of noise labels, and effectively solves the problem that the basic classification structure of traditional BERT-style encoder + Dense layer + cross-entropy loss is easily affected by noise labels.
  • FIG. 1 is an implementation flowchart of the BERT-based text classification method provided in Embodiment 1 of the present application;
  • FIG. 2 is an implementation flowchart of the BERT network training method provided in Embodiment 1 of the present application;
  • FIG. 3 is a schematic diagram of the vector similarity calculation operation provided in Embodiment 1 of the present application;
  • FIG. 4 is a flowchart of an implementation manner of step S205 in FIG. 2;
  • FIG. 5 is a flowchart of another implementation manner of step S205 in FIG. 2;
  • FIG. 6 is a schematic structural diagram of the BERT-based text classification apparatus provided in Embodiment 2 of the present application;
  • FIG. 7 is a schematic structural diagram of the BERT network training apparatus provided in Embodiment 2 of the present application;
  • FIG. 8 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • Referring to FIG. 1, a flowchart of the implementation of the BERT-based text classification method provided in Embodiment 1 of the present application is shown. For the convenience of description, only the parts related to the present application are shown.
  • In step S101, a session request carrying consultation data sent by a user through a requesting terminal is received.
  • In the embodiments of the present application, the requesting terminal is mainly used to collect content information carrying the user's consultation semantics, and the content information may be text information, voice information, video information, and so on; correspondingly, the requesting terminal includes at least one or a combination of a text collection module, a voice collection module, and a video collection module.
  • In the embodiments of the present application, the requesting terminal may be a mobile terminal such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), or a navigation device, as well as a fixed terminal such as a digital TV or a desktop computer. It should be understood that the examples of the requesting terminal here are only for convenience of understanding and are not used to limit the present application.
  • In the embodiments of the present application, consultation data refers to content information carrying the user's consultation semantics, such as voice data and text data, sent through the requesting terminal during the consultation session between the user and the system.
  • In step S102, in response to the session request, the consultation data is input into the BERT network trained based on the triplet loss function to perform a feature transformation operation, to obtain a session feature vector.
  • In the embodiments of the present application, the BERT (Bidirectional Encoder Representations from Transformers, a pre-trained language representation model) network trained based on the triplet loss function refers to a BERT network trained with the Triplet loss function, so that when expressing features, the BERT network reduces the feature transformation distance between data of the same class as much as possible while enlarging the feature transformation distance between data of different classes as much as possible, thereby effectively solving the problem that the basic classification structure of the traditional BERT network is easily affected by noise labels.
  • In step S103, the session feature vector is input into the Dense classification layer to perform a category prediction operation, to obtain a predicted classification result.
  • In the embodiments of the present application, the Dense classification layer is mainly used to complete the multi-classification task, and the Dense classification layer uses Focal loss as its loss. Focal loss reduces the weight of a large number of simple negative samples in training. In the multi-class case, the Focal loss of each sample is: FL(p_t) = -α·(1-p_t)^γ·y_true·log(p_t + ε), where p_t of each sample is the predicted value at the index where the one-hot label is 1, i.e. max(y_pred·y_true, axis=-1). Unlike binary classification, in multi-classification α only adjusts the magnitude of the total loss; γ adjusts the contribution of each sample to the model, and when γ takes different values within a certain range (for example, 0-5) it can be found that the larger γ is, the smaller the contribution of simple, easy-to-classify samples to the total loss, which is more conducive to the classification of hard samples; ε is a very small value whose function is to prevent p_t in log(p_t) from being 0; y_true·log(p_t + ε) is used so that the loss produced at positions where the one-hot label is 0 is 0.
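  • As an illustration of the Focal loss formula above, the following is a minimal sketch in PyTorch (an assumed implementation for clarity, not code from the application itself; the tensor names y_pred and y_true and the hyper-parameter values are examples only):

```python
# Minimal sketch of the multi-class Focal loss described above (assumed
# implementation, not the application's own code). y_true is one-hot,
# y_pred holds softmax probabilities.
import torch

def focal_loss(y_pred, y_true, alpha=0.25, gamma=2.0, eps=1e-8):
    # p_t: predicted probability at the index where the one-hot label is 1,
    # i.e. max(y_pred * y_true, axis=-1)
    p_t = (y_pred * y_true).max(dim=-1).values
    # positions where the one-hot label is 0 contribute zero loss because
    # only the true-class probability is selected above
    loss = -alpha * (1.0 - p_t) ** gamma * torch.log(p_t + eps)
    return loss.mean()

probs = torch.tensor([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
labels = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(focal_loss(probs, labels))
```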
  • In step S104, the speech database is read, and speech reply information corresponding to the predicted classification result is obtained from the speech database.
  • In the embodiments of the present application, the speech database is mainly used to store reply information data corresponding to predicted classification results. After the session feature vector is input into the Dense classification layer for the category prediction operation and the predicted classification result is obtained, the speech reply information corresponding to the predicted classification result can be obtained from the speech database by means of an index or the like, so as to complete the consultation session reply.
  • In step S105, the speech reply information is sent to the requesting terminal, so as to complete the consultation session reply.
  • The embodiments of the present application provide a BERT-based text classification method, which receives a session request carrying consultation data sent by a user through a requesting terminal; in response to the session request, inputs the consultation data into the BERT network trained based on the triplet loss function to perform a feature transformation operation to obtain a session feature vector; inputs the session feature vector into the Dense classification layer to perform a category prediction operation to obtain a predicted classification result; reads the speech database and obtains the speech reply information corresponding to the predicted classification result from the speech database; and sends the speech reply information to the requesting terminal, so as to complete the consultation session reply.
  • Training the BERT network with the Triplet loss function can greatly reduce the influence of noise labels, and effectively solves the problem that the basic classification structure of traditional BERT-style encoder + Dense layer + cross-entropy loss is easily affected by noise labels.
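  • To make the flow of steps S101-S105 concrete, the following is a hedged end-to-end sketch of the inference pipeline. The names encode, dense_layer, and speech_db are illustrative stand-ins (not defined by the application), and the toy placeholders only show how the pieces fit together:

```python
# Illustrative sketch of steps S101-S105 (assumed names, not the
# application's own code): encode() stands for the triplet-loss-trained
# BERT encoder, dense_layer() for the trained Dense classification layer,
# and speech_db for a mapping from predicted class id to reply text.
import torch

def handle_session_request(consultation_text, encode, dense_layer, speech_db):
    # S102: feature transformation -> session feature vector
    session_vector = encode(consultation_text)
    # S103: category prediction with the Dense classification layer
    logits = dense_layer(session_vector)
    predicted_class = int(torch.argmax(logits, dim=-1))
    # S104: look up the reply corresponding to the predicted class
    reply = speech_db[predicted_class]
    # S105: return the reply to the requesting terminal
    return reply

# toy stand-ins so the sketch runs end to end
encode = lambda text: torch.randn(1, 768)      # placeholder encoder
dense_layer = torch.nn.Linear(768, 3)          # placeholder classifier
speech_db = {0: "reply A", 1: "reply B", 2: "reply C"}
print(handle_session_request("如何办理保险理赔?", encode, dense_layer, speech_db))
```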
  • Referring to FIG. 2, a flowchart of the implementation of the BERT network training method provided in Embodiment 1 of the present application is shown. For the convenience of description, only the parts related to the present application are shown.
  • In some optional implementations of Embodiment 1 of the present application, the BERT-based text classification method provided by the present application further includes: step S201, step S202, step S203, step S204, and step S205.
  • In step S201, the training database is read, and a training text data set is obtained from the training database. The training text data set includes at least a first positive sample, a second positive sample of the same category as the first positive sample, and a random sample of a category different from the first positive sample.
  • In the embodiments of the present application, the training text data set is a triplet data set of (sentence 1, sentence 2, sentence 3), and the categories of sentence 1, sentence 2, and sentence 3 are A, A, and B, respectively. The first two elements are any two positive samples of the same category, and the last element can be randomly drawn from a different category, drawn from a category that is hard to distinguish from category A, or drawn by a combination of the two preceding manners, as sketched below.
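  • The following is an assumed sketch of building (anchor, positive bag, negative bag) training triples from labelled sentences in the spirit of the description above; it is not the application's own data pipeline, and negatives are simply drawn at random from other categories:

```python
# Assumed sketch of constructing multi-instance training triples
# (anchor, positive bag, negative bag) from labelled sentences.
import random
from collections import defaultdict

def build_triples(samples, bag_size=3):
    """samples: list of (sentence, label) pairs."""
    by_label = defaultdict(list)
    for sent, label in samples:
        by_label[label].append(sent)
    triples = []
    for label, sents in by_label.items():
        negatives = [s for l, ss in by_label.items() if l != label for s in ss]
        for anchor in sents:
            positives = [s for s in sents if s != anchor]
            if len(positives) < bag_size or len(negatives) < bag_size:
                continue
            # positive bag: same class as the anchor; negative bag: other classes
            triples.append((anchor,
                            random.sample(positives, bag_size),
                            random.sample(negatives, bag_size)))
    return triples
```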
  • In step S202, the first positive sample, the second positive sample, and the random sample are respectively input into the original BERT network to perform the feature transformation operation, to obtain a first feature vector, a second feature vector, and a random feature vector.
  • the original BERT network refers to the original feature vector transformation model that has not undergone any training.
  • the input triples pass through the same BERT layer.
  • BERT plays the role of encoder (encoding), and its purpose is to output sentence vectors that represent semantics.
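  • As a hedged illustration of using BERT as an encoder that outputs a semantic sentence vector, the following sketch assumes the Hugging Face `transformers` library and the `bert-base-chinese` checkpoint; the application itself does not prescribe a specific implementation or pooling strategy:

```python
# Hedged sketch (assumed tooling, not the application's own code):
# obtain a sentence vector from BERT acting as an encoder.
from transformers import BertTokenizer, BertModel
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

inputs = tokenizer("我想咨询保险理赔流程", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# use the [CLS] hidden state as the sentence vector (one common choice)
sentence_vector = outputs.last_hidden_state[:, 0, :]   # shape (1, 768)
```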
  • In step S203, a vector similarity calculation operation is performed on the first feature vector and the second feature vector, to obtain a homogeneous vector similarity.
  • In step S204, the vector similarity calculation operation is performed on the first feature vector and the random feature vector, to obtain a non-homogeneous vector similarity.
  • In the figure, sent1, all of sent2, and sent3 are sentence vectors output by the shared BERT layer. Since noise samples may exist in each group of samples, the Triplet loss may be calculated by taking the mean of the similarities between all sample vectors in each group and sent1, or by taking the maximum similarity between all sample vectors in the positive-example bag and sent1 (representing the sample most similar to sentence 1) and the minimum similarity between all sample vectors in the negative-example bag and sent1 (representing the sample least similar to sentence 1).
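  • The two bag-aggregation strategies just described can be sketched as follows (an assumed example using cosine similarity as the similarity measure; the application does not fix a particular similarity function):

```python
# Sketch of the two bag-aggregation strategies: mean over a bag, or
# max over the positive bag and min over the negative bag.
import torch
import torch.nn.functional as F

def bag_similarities(anchor_vec, bag_vecs):
    # cosine similarity between the anchor (sent1) and every vector in a bag
    return F.cosine_similarity(anchor_vec.unsqueeze(0), bag_vecs, dim=-1)

anchor = torch.randn(768)
pos_bag = torch.randn(3, 768)   # sent2[0..2]
neg_bag = torch.randn(3, 768)   # sent3[0..2]

pos_sims = bag_similarities(anchor, pos_bag)
neg_sims = bag_similarities(anchor, neg_bag)

mean_pos, mean_neg = pos_sims.mean(), neg_sims.mean()   # mean variant
max_pos, min_neg = pos_sims.max(), neg_sims.min()       # max/min variant
```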
  • In step S205, a training operation is performed on the BERT network based on the homogeneous vector similarity, the non-homogeneous vector similarity, and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
  • In the embodiments of the present application, it is assumed that each BAG in multi-instance learning contains three sentences, so the (sentence 1, sentence 2, sentence 3) triplet becomes (sentence 1, [sentence 2[0], sentence 2[1], sentence 2[2]], [sentence 3[0], sentence 3[1], sentence 3[2]]), that is, one positive sample, a group of positive samples of the same class, and a group of negative samples of different classes.
  • Training step 1: These 7 sentences pass through the same BERT model with shared weights at the same time, outputting 7 vectors.
  • Training step 2: The similarity between sentence 1 and each of the other 6 sentences is calculated.
  • Training step 3: From the positive-example BAG, take the one most similar to sentence 1: Max(Sim(sentence 1, sentence 2[0]), Sim(sentence 1, sentence 2[1]), Sim(sentence 1, sentence 2[2])) (the highest of the three similarities may be that between sentence 1 and sentence 2[1]; since sentence 2[0] is actually mislabeled noise data, this calculation can weaken its influence on the model).
  • Training step 4: From the negative-example BAG, take the one least similar to sentence 1: Min(Sim(sentence 1, sentence 3[0]), Sim(sentence 1, sentence 3[1]), Sim(sentence 1, sentence 3[2])) (assume the least similar of the three is sentence 3[1]).
  • Training step 5: Triplet loss: make the gap between the similarities obtained in training step 3 and training step 4 as large as possible.
  • Training step 6: Perform another task at the same time: in addition to the above Triplet loss task of widening the gap between classes, sentence 1 finally goes through a multi-classification task.
  • Training step 7: After the model is trained, the classification of sentence 1 can be obtained, and in the process the classes become more distinguishable under the effort of weakening label noise. At this point, only the part of the model from sentence 1 to the category prediction of sentence 1 needs to be retained, and it can be used to predict the classification of any input sentence. A sketch of one such joint training step is given after this list.
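  • The following is an assumed sketch of one joint training step combining the multi-instance triplet objective (training steps 1-5) with the multi-class task on sentence 1 (training step 6). The names model, classifier, and the focal_loss / bag_similarities helpers refer to the illustrative code sketched earlier, not to any API defined by the application:

```python
# Assumed sketch of one joint training step (not the application's own code).
# model() returns the 7 sentence vectors; classifier() is the Dense layer.
import torch

def train_step(model, classifier, optimizer, anchor, pos_bag, neg_bag,
               anchor_label, margin=0.2):
    vecs = model(anchor, pos_bag, neg_bag)          # 7 sentence vectors
    anchor_vec, pos_vecs, neg_vecs = vecs[0], vecs[1:4], vecs[4:7]
    pos_sims = bag_similarities(anchor_vec, pos_vecs)
    neg_sims = bag_similarities(anchor_vec, neg_vecs)
    # widen the gap between the most-similar positive and least-similar negative
    triplet = torch.relu(neg_sims.min() - pos_sims.max() + margin)
    # classification task on sentence 1 only (Focal loss on the Dense layer)
    cls_loss = focal_loss(torch.softmax(classifier(anchor_vec), dim=-1),
                          anchor_label)
    loss = triplet + cls_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```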
  • Referring to FIG. 4, a flowchart of an implementation manner of step S205 in FIG. 2 is shown. For the convenience of description, only the parts related to the present application are shown.
  • In some optional implementations of Embodiment 1 of the present application, the above step S205 specifically includes: step S401, step S402, and step S403.
  • In step S401, the average of the homogeneous similarities is calculated to obtain an average homogeneous vector.
  • In practical applications, if the homogeneous similarity between sentence 1 and sentence 2[0] is 60, the homogeneous similarity between sentence 1 and sentence 2[1] is 70, and the homogeneous similarity between sentence 1 and sentence 2[2] is 80, then the average homogeneous similarity is 70, and the average homogeneous vector is therefore equal to Sim(sentence 1, sentence 2[1]).
  • In step S402, the average of the non-homogeneous similarities is calculated to obtain an average non-homogeneous vector.
  • In the embodiments of the present application, the calculation of the average of the non-homogeneous similarities is implemented in the same manner as the above calculation of the average of the homogeneous similarities.
  • In step S403, a reverse update operation is performed on the BERT network based on the first feature vector, the average homogeneous vector, the average non-homogeneous vector, and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
  • In the embodiments of the present application, the reverse update operation is mainly used to dynamically update the representation parameters of the BERT network according to changes in the average homogeneous vector and the average non-homogeneous vector.
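  • Read in common deep-learning terms, this "reverse update operation" corresponds to backpropagating the triplet loss through the encoder and letting an optimizer update its representation parameters. The sketch below is an assumption for illustration only; every name in it is a stand-in, and a small linear layer substitutes for the BERT encoder so the snippet runs:

```python
# Assumed sketch of the reverse update as ordinary backpropagation; all
# names are stand-ins, not the application's API.
import torch

encoder = torch.nn.Linear(32, 16)                 # stand-in for the BERT encoder
optimizer = torch.optim.Adam(encoder.parameters(), lr=2e-5)
triplet = torch.nn.TripletMarginLoss(margin=0.2)

anchor = encoder(torch.randn(1, 32))                              # first feature vector
avg_pos = encoder(torch.randn(3, 32)).mean(0, keepdim=True)       # average homogeneous vector
avg_neg = encoder(torch.randn(3, 32)).mean(0, keepdim=True)       # average non-homogeneous vector

loss = triplet(anchor, avg_pos, avg_neg)
optimizer.zero_grad()
loss.backward()      # gradients flow back through the encoder
optimizer.step()     # representation parameters are dynamically updated
```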
  • Referring to FIG. 5, a flowchart of another implementation manner of step S205 in FIG. 2 is shown. For the convenience of description, only the parts related to the present application are shown.
  • In some optional implementations of Embodiment 1 of the present application, the above step S205 specifically includes: step S501, step S502, and step S503.
  • In step S501, the maximum homogeneous vector with the largest similarity is obtained from the second feature vectors based on the homogeneous similarities.
  • In step S502, the minimum random vector with the smallest similarity is obtained from the random feature vectors based on the non-homogeneous similarities.
  • In the embodiments of the present application, obtaining the minimum random vector with the smallest similarity is implemented in the same manner as obtaining the maximum homogeneous vector with the largest similarity described above.
  • In step S503, a reverse update operation is performed on the BERT network based on the first feature vector, the maximum homogeneous vector, the minimum random vector, and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
  • the reverse update operation is mainly used to dynamically update the representation parameters of the BERT network according to the changes of the largest homogeneous vector and the smallest random vector.
  • In some optional implementations, the above triplet loss function is expressed as: L = Σ_{i=1}^{N} [ ‖f(x_i^a) − f(x_i^p)‖² − ‖f(x_i^a) − f(x_i^n)‖² + α ]_+, where N represents the total number of triplets in the whole training set; x_i^a represents the first positive sample and f(x_i^a) the first feature vector; x_i^p represents the second positive sample and f(x_i^p) the second feature vector; x_i^n represents the random sample and f(x_i^n) the random feature vector; and α represents the minimum margin between the distance from the first positive sample to the second positive sample and the distance from the first positive sample to the random sample.
  • The subscript + indicates that when the value inside [ ] is greater than zero, that value is taken as the loss, and when it is less than zero, the loss is zero.
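  • A direct transcription of this loss into code is sketched below (an assumed example, not the application's implementation): squared Euclidean distances with the hinge [ · ]_+ realized by clamping at zero.

```python
# Assumed sketch of the triplet loss formula above.
import torch

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """f_a, f_p, f_n: (N, D) feature vectors of anchor, positive and random samples."""
    d_ap = ((f_a - f_p) ** 2).sum(dim=-1)    # ||f(x_a) - f(x_p)||^2
    d_an = ((f_a - f_n) ** 2).sum(dim=-1)    # ||f(x_a) - f(x_n)||^2
    # a loss is produced only when d_ap - d_an + alpha > 0
    return torch.clamp(d_ap - d_an + alpha, min=0.0).sum()

f_a, f_p, f_n = torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128)
print(triplet_loss(f_a, f_p, f_n))
```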
  • the above-mentioned consultation data can also be stored in a node of a blockchain.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain is essentially a decentralized database; it is a chain of data blocks generated in association with one another using cryptographic methods, and each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • The present application may be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and the like.
  • the application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
  • This application provides a BERT-based text classification method.
  • The BERT network is trained with a multi-instance triplet loss function, which can greatly reduce the impact of noise labels and effectively solves the problem that the basic classification structure of traditional BERT-style encoder + Dense layer + cross-entropy loss is easily affected by noise labels.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM) or the like.
  • the present application provides a BERT-based text classification device.
  • The apparatus embodiment corresponds to the method embodiment shown in FIG. 1, and the apparatus can be specifically applied to various electronic devices.
  • The BERT-based text classification apparatus 100 includes: a request receiving module 110, a feature transformation module 120, a category prediction module 130, a speech acquisition module 140, and a session reply module 150, wherein:
  • the request receiving module 110 is configured to receive a session request carrying consultation data sent by a user through a requesting terminal;
  • the feature transformation module 120 is configured to, in response to the session request, input the consultation data into the BERT network trained based on the triplet loss function to perform a feature transformation operation, to obtain a session feature vector;
  • the category prediction module 130 is configured to input the session feature vector into the Dense classification layer to perform a category prediction operation, to obtain a predicted classification result;
  • the speech acquisition module 140 is configured to read the speech database, and obtain speech reply information corresponding to the predicted classification result from the speech database;
  • the session reply module 150 is configured to send the speech reply information to the requesting terminal, so as to complete the consultation session reply.
  • In the embodiments of the present application, the requesting terminal is mainly used to collect content information carrying the user's consultation semantics, and the content information may be text information, voice information, video information, and so on; correspondingly, the requesting terminal includes at least one or a combination of a text collection module, a voice collection module, and a video collection module.
  • In the embodiments of the present application, the requesting terminal may be a mobile terminal such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), or a navigation device, as well as a fixed terminal such as a digital TV or a desktop computer. It should be understood that the examples of the requesting terminal here are only for convenience of understanding and are not used to limit the present application.
  • In the embodiments of the present application, consultation data refers to content information carrying the user's consultation semantics, such as voice data and text data, sent through the requesting terminal during the consultation session between the user and the system.
  • In the embodiments of the present application, the BERT network trained based on the triplet loss function refers to a BERT network trained with the Triplet loss function, so that when expressing features, the BERT network reduces the feature transformation distance between data of the same class as much as possible while enlarging the feature transformation distance between data of different classes as much as possible, thereby effectively solving the problem that the basic classification structure of the traditional BERT network is easily affected by noise labels.
  • the Dense classification layer is mainly used to complete the multi-classification task, and the Dense classification layer uses Focal loss as the loss.
  • Focal loss reduces the weight of a large number of simple negative samples in training.
  • In the multi-class case, the Focal loss of each sample is: FL(p_t) = -α·(1-p_t)^γ·y_true·log(p_t + ε), where p_t of each sample is the predicted value at the index where the one-hot label is 1, i.e. max(y_pred·y_true, axis=-1). Unlike binary classification, in multi-classification α only adjusts the magnitude of the total loss; γ adjusts the contribution of each sample to the model, and when γ takes different values within a certain range (for example, 0-5) it can be found that the larger γ is, the smaller the contribution of simple, easy-to-classify samples to the total loss, which is more conducive to the classification of hard samples; ε is a very small value whose function is to prevent p_t in log(p_t) from being 0; y_true·log(p_t + ε) is used so that the loss produced at positions where the one-hot label is 0 is 0.
  • In the embodiments of the present application, the speech database is mainly used to store reply information data corresponding to predicted classification results. After the session feature vector is input into the Dense classification layer for the category prediction operation and the predicted classification result is obtained, the speech reply information corresponding to the predicted classification result can be obtained from the speech database by means of an index or the like, so as to complete the consultation session reply.
  • The embodiments of the present application provide a BERT-based text classification apparatus, including: a request receiving module, configured to receive a session request carrying consultation data sent by a user through a requesting terminal; a feature transformation module, configured to, in response to the session request, input the consultation data into the BERT network trained based on the triplet loss function to perform a feature transformation operation to obtain a session feature vector; a category prediction module, configured to input the session feature vector into the Dense classification layer to perform a category prediction operation to obtain a predicted classification result; a speech acquisition module, configured to read the speech database and obtain speech reply information corresponding to the predicted classification result from the speech database; and a session reply module, configured to send the speech reply information to the requesting terminal to complete the consultation session reply.
  • Training the BERT network with the multi-instance triplet loss function can greatly reduce the impact of noise labels, effectively solving the problem that the basic classification structure of traditional BERT-style encoder + Dense layer + cross-entropy loss is easily affected by noise labels.
  • Referring to FIG. 7, a schematic structural diagram of the BERT network training apparatus provided in Embodiment 2 of the present application is shown. For the convenience of description, only the parts related to the present application are shown.
  • In some optional implementations of Embodiment 2 of the present application, the above BERT-based text classification apparatus 100 further includes: a training text acquisition module 160, a feature transformation training module 170, a similarity calculation module 180, and a network training module 190, wherein:
  • the training text acquisition module 160 is configured to read the training database, and obtain a training text data set from the training database, the training text data set including at least a first positive sample, a second positive sample of the same category as the first positive sample, and a random sample of a category different from the first positive sample;
  • the feature transformation training module 170 is configured to input the first positive sample, the second positive sample, and the random sample respectively into the original BERT network to perform the feature transformation operation, to obtain a first feature vector, a second feature vector, and a random feature vector;
  • the similarity calculation module 180 is configured to perform the vector similarity calculation operation on the first feature vector and the second feature vector, and on the first feature vector and the random feature vector, respectively, to obtain a homogeneous vector similarity and a non-homogeneous vector similarity;
  • the network training module 190 is configured to train the BERT network based on the homogeneous vector similarity, the non-homogeneous vector similarity, and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
  • In the embodiments of the present application, the training text data set is a triplet data set of (sentence 1, sentence 2, sentence 3), and the categories of sentences 1, 2, and 3 are A, A, and B, respectively. The first two elements are any two positive samples of the same category, and the last element can be randomly drawn from a different category, drawn from a category that is hard to distinguish from category A, or drawn by a combination of the two preceding manners.
  • the original BERT network refers to the original feature vector transformation model that has not undergone any training.
  • the input triples pass through the same BERT layer.
  • BERT acts as an encoder, with the purpose of outputting sentence vectors representing semantics.
  • In the figure, sent1, all of sent2, and sent3 are sentence vectors output by the shared BERT layer. Since noise samples may exist in each group of samples, the Triplet loss may be calculated by taking the mean of the similarities between all sample vectors in each group and sent1, or by taking the maximum similarity between all sample vectors in the positive-example bag and sent1 (representing the sample most similar to sentence 1) and the minimum similarity between all sample vectors in the negative-example bag and sent1 (representing the sample least similar to sentence 1).
  • Training step 1 These 7 sentences pass through the same model BERT with shared weights at the same time, and output 7 vectors.
  • Training step 2 Calculate the similarity between sentence 1 and the other 6 sentences respectively.
  • Training step 3: From the positive-example BAG, take the one most similar to sentence 1: Max(Sim(sentence 1, sentence 2[0]), Sim(sentence 1, sentence 2[1]), Sim(sentence 1, sentence 2[2])) (the highest of the three similarities may be that between sentence 1 and sentence 2[1]; since sentence 2[0] is actually mislabeled noise data, this calculation can weaken its influence on the model).
  • Training step 4: From the negative-example BAG, take the one least similar to sentence 1: Min(Sim(sentence 1, sentence 3[0]), Sim(sentence 1, sentence 3[1]), Sim(sentence 1, sentence 3[2])) (assume the least similar of the three is sentence 3[1]).
  • Training step 5 Triplet loss: Make the similarity difference between training step 3 and training step 4 as large as possible.
  • Training step 6 Do another task at the same time: In addition to the triplet loss task of opening the gap between classes, sentence 1 finally does a multi-classification.
  • Training step 7: After the model is trained, the classification of sentence 1 can be obtained, and in the process the classes become more distinguishable under the effort of weakening label noise. At this point, only the part of the model from sentence 1 to the category prediction of sentence 1 needs to be retained, and it can be used to predict the classification of any input sentence.
  • In some optional implementations of Embodiment 2 of the present application, the network training module 190 includes: a homogeneous average calculation submodule, a non-homogeneous average calculation submodule, and a first reverse update submodule, wherein:
  • the homogeneous average calculation submodule is configured to calculate the average of the homogeneous similarities, to obtain an average homogeneous vector;
  • the non-homogeneous average calculation submodule is configured to calculate the average of the non-homogeneous similarities, to obtain an average non-homogeneous vector;
  • the first reverse update submodule is configured to perform a reverse update operation on the BERT network based on the first feature vector, the average homogeneous vector, the average non-homogeneous vector, and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
  • In some optional implementations of Embodiment 2 of the present application, the network training module 190 further includes: a maximum value acquisition submodule, a minimum value acquisition submodule, and a second reverse update submodule, wherein:
  • the maximum value acquisition submodule is configured to obtain, based on the homogeneous similarities, the maximum homogeneous vector with the largest similarity from the second feature vectors;
  • the minimum value acquisition submodule is configured to obtain, based on the non-homogeneous similarities, the minimum random vector with the smallest similarity from the random feature vectors;
  • the second reverse update submodule is configured to perform a reverse update operation on the BERT network based on the first feature vector, the maximum homogeneous vector, the minimum random vector, and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
  • In some optional implementations of Embodiment 2 of the present application, the above triplet loss function is expressed as: L = Σ_{i=1}^{N} [ ‖f(x_i^a) − f(x_i^p)‖² − ‖f(x_i^a) − f(x_i^n)‖² + α ]_+, where N represents the total number of triplets in the whole training set; x_i^a represents the first positive sample and f(x_i^a) the first feature vector; x_i^p represents the second positive sample and f(x_i^p) the second feature vector; x_i^n represents the random sample and f(x_i^n) the random feature vector; and α represents the minimum margin between the distance from the first positive sample to the second positive sample and the distance from the first positive sample to the random sample. Here a refers to anchor and denotes the first positive sample tuple, p refers to positive and denotes the second positive sample tuple, and n refers to negative and denotes the random sample tuple.
  • FIG. 8 is a block diagram of a basic structure of a computer device according to this embodiment.
  • The computer device 200 includes a memory 210, a processor 220, and a network interface 230 that communicate with each other through a system bus. It should be noted that only the computer device 200 with components 210-230 is shown in the figure, but it should be understood that not all of the shown components are required to be implemented, and more or fewer components may be implemented instead.
  • The computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
  • the computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment.
  • the computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.
  • The memory 210 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, and the like; the computer-readable storage medium may be non-volatile or volatile.
  • the memory 210 may be an internal storage unit of the computer device 200 , such as a hard disk or a memory of the computer device 200 .
  • the memory 210 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
  • the memory 210 may also include both the internal storage unit of the computer device 200 and its external storage device.
  • the memory 210 is generally used to store the operating system and various application software installed on the computer device 200 , such as computer-readable instructions of the BERT-based text classification method.
  • the memory 210 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 220 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
  • the processor 220 is typically used to control the overall operation of the computer device 200 .
  • the processor 220 is configured to execute computer-readable instructions stored in the memory 210 or process data, for example, computer-readable instructions for executing the BERT-based text classification method.
  • the network interface 230 may include a wireless network interface or a wired network interface, and the network interface 230 is generally used to establish a communication connection between the computer device 200 and other electronic devices.
  • The BERT-based text classification method trains the BERT network with a multi-instance triplet loss function, which can greatly reduce the impact of noise labels and effectively solves the problem that the basic classification structure of traditional BERT-style encoder + Dense layer + cross-entropy loss is easily affected by noise labels.
  • the present application also provides another embodiment, that is, to provide a computer-readable storage medium, where the computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions can be executed by at least one processor to The at least one processor is caused to perform the steps of the BERT-based text classification method as described above.
  • The BERT-based text classification method trains the BERT network with a multi-instance triplet loss function, which can greatly reduce the impact of noise labels and effectively solves the problem that the basic classification structure of traditional BERT-style encoder + Dense layer + cross-entropy loss is easily affected by noise labels.
  • The methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application can be embodied in the form of a software product in essence or in a part that contributes to the prior art, and the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, CD), including several instructions to make a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in the various embodiments of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A BERT-based text classification method and apparatus, a computer device, and a storage medium. The method includes: receiving a session request carrying consultation data sent by a user through a requesting terminal (S101); in response to the session request, inputting the consultation data into a BERT network trained based on a triplet loss function to perform a feature transformation operation, to obtain a session feature vector (S102); inputting the session feature vector into a Dense classification layer to perform a category prediction operation, to obtain a predicted classification result (S103); reading a speech database, and obtaining, from the speech database, speech reply information corresponding to the predicted classification result (S104); and sending the speech reply information to the requesting terminal to complete the consultation session reply (S105). In addition, the method relates to blockchain technology, and the user's consultation data may be stored in a blockchain. The method can greatly reduce the influence of noise labels and effectively solves the problem that the basic classification structure of traditional BERT is easily affected by noise labels.

Description

BERT-based text classification method and apparatus, computer device, and storage medium
This application is based on, and claims priority to, the Chinese invention patent application No. 202011212539.5, filed on November 3, 2020 and entitled "BERT-based text classification method and apparatus, computer device, and storage medium".
TECHNICAL FIELD
The present application relates to the technical field of natural language processing, and in particular, to a BERT-based text classification method and apparatus, a computer device, and a storage medium.
BACKGROUND
In recent years, with the rapid development of network technology and the sharp increase of online text information on the Internet, text classification plays a crucial role in information processing; it is a key technology for handling large-scale text information and has pushed information processing toward automation.
Automatic classification of text data is an important application field of artificial intelligence technologies such as text data mining and natural language processing. Its main function is to automatically classify unstructured text data stored in digital form into pre-organized, business-specific categories by means of natural language processing and text data mining technologies. With the continuous development of technology in the information age, automatic classification of text data is an important technical measure for improving the production efficiency and competitive advantage of enterprises.
However, the applicant has realized that most traditional text classification methods are based on deep learning and mostly adopt CNN (Convolutional Neural Networks) or RNN (Recurrent Neural Network) models to solve the text classification problem; these existing general models, however, are easily affected by noise labels.
SUMMARY
The purpose of the embodiments of the present application is to propose a BERT-based text classification method and apparatus, a computer device, and a storage medium, so as to solve the problem that existing general models are easily affected by noise labels.
In order to solve the above technical problem, the embodiments of the present application provide a BERT-based text classification method, which adopts the following technical solution:
receiving a session request carrying consultation data sent by a user through a requesting terminal;
in response to the session request, inputting the consultation data into a BERT network trained based on a triplet loss function to perform a feature transformation operation, to obtain a session feature vector;
inputting the session feature vector into a Dense classification layer to perform a category prediction operation, to obtain a predicted classification result;
reading a speech database, and obtaining, from the speech database, speech reply information corresponding to the predicted classification result;
sending the speech reply information to the requesting terminal, so as to complete the consultation session reply.
In order to solve the above technical problem, the embodiments of the present application further provide a BERT-based text classification apparatus, which adopts the following technical solution:
a request receiving module, configured to receive a session request carrying consultation data sent by a user through a requesting terminal;
a feature transformation module, configured to, in response to the session request, input the consultation data into a BERT network trained based on a triplet loss function to perform a feature transformation operation, to obtain a session feature vector;
a category prediction module, configured to input the session feature vector into a Dense classification layer to perform a category prediction operation, to obtain a predicted classification result;
a speech acquisition module, configured to read a speech database, and obtain, from the speech database, speech reply information corresponding to the predicted classification result;
a session reply module, configured to send the speech reply information to the requesting terminal, so as to complete the consultation session reply.
In order to solve the above technical problem, the embodiments of the present application further provide a computer device, which adopts the following technical solution:
comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the processor executes the computer-readable instructions, the following steps of the BERT-based text classification method are implemented:
receiving a session request carrying consultation data sent by a user through a requesting terminal;
in response to the session request, inputting the consultation data into a BERT network trained based on a triplet loss function to perform a feature transformation operation, to obtain a session feature vector;
inputting the session feature vector into a Dense classification layer to perform a category prediction operation, to obtain a predicted classification result;
reading a speech database, and obtaining, from the speech database, speech reply information corresponding to the predicted classification result;
sending the speech reply information to the requesting terminal, so as to complete the consultation session reply.
In order to solve the above technical problem, the embodiments of the present application further provide a computer-readable storage medium, which adopts the following technical solution:
computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the following steps of the BERT-based text classification method are implemented:
receiving a session request carrying consultation data sent by a user through a requesting terminal;
in response to the session request, inputting the consultation data into a BERT network trained based on a triplet loss function to perform a feature transformation operation, to obtain a session feature vector;
inputting the session feature vector into a Dense classification layer to perform a category prediction operation, to obtain a predicted classification result;
reading a speech database, and obtaining, from the speech database, speech reply information corresponding to the predicted classification result;
sending the speech reply information to the requesting terminal, so as to complete the consultation session reply.
Compared with the prior art, the BERT-based text classification method and apparatus, computer device, and storage medium provided by the embodiments of the present application mainly have the following beneficial effects:
The BERT-based text classification method and apparatus, computer device, and storage medium provided by the present application receive a session request carrying consultation data sent by a user through a requesting terminal; in response to the session request, input the consultation data into a BERT network trained based on a triplet loss function to perform a feature transformation operation, to obtain a session feature vector; input the session feature vector into a Dense classification layer to perform a category prediction operation, to obtain a predicted classification result; read a speech database and obtain, from the speech database, speech reply information corresponding to the predicted classification result; and send the speech reply information to the requesting terminal, so as to complete the consultation session reply. Training the BERT network with the Triplet loss function can greatly reduce the influence of noise labels, and effectively solves the problem that the basic classification structure of traditional BERT-style encoder + Dense layer + cross-entropy loss is easily affected by noise labels.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to explain the solutions in the present application more clearly, the drawings required for describing the embodiments of the present application are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative work.
FIG. 1 is an implementation flowchart of the BERT-based text classification method provided in Embodiment 1 of the present application;
FIG. 2 is an implementation flowchart of the BERT network training method provided in Embodiment 1 of the present application;
FIG. 3 is a schematic diagram of the vector similarity calculation operation provided in Embodiment 1 of the present application;
FIG. 4 is a flowchart of an implementation manner of step S205 in FIG. 2;
FIG. 5 is a flowchart of another implementation manner of step S205 in FIG. 2;
FIG. 6 is a schematic structural diagram of the BERT-based text classification apparatus provided in Embodiment 2 of the present application;
FIG. 7 is a schematic structural diagram of the BERT network training apparatus provided in Embodiment 2 of the present application;
FIG. 8 is a schematic structural diagram of an embodiment of a computer device according to the present application.
DETAILED DESCRIPTION
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present application; the terms used in the specification of the application herein are only for the purpose of describing specific embodiments and are not intended to limit the present application; the terms "including" and "having" and any variations thereof in the specification, claims, and the above description of the drawings of the present application are intended to cover non-exclusive inclusion. The terms "first", "second", and the like in the specification and claims of the present application or in the above drawings are used to distinguish different objects, not to describe a specific order.
Reference to "an embodiment" herein means that a specific feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings.
Embodiment 1
Referring to FIG. 1, an implementation flowchart of the BERT-based text classification method provided in Embodiment 1 of the present application is shown. For the convenience of description, only the parts related to the present application are shown.
In step S101, a session request carrying consultation data sent by a user through a requesting terminal is received.
In the embodiments of the present application, the requesting terminal is mainly used to collect content information carrying the user's consultation semantics, and the content information may be text information, voice information, video information, and so on; correspondingly, the requesting terminal includes at least one or a combination of a text collection module, a voice collection module, and a video collection module.
In the embodiments of the present application, the requesting terminal may be a mobile terminal such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), or a navigation device, as well as a fixed terminal such as a digital TV or a desktop computer. It should be understood that the examples of the requesting terminal here are only for convenience of understanding and are not used to limit the present application.
In the embodiments of the present application, consultation data refers to content information carrying the user's consultation semantics, such as voice data and text data, sent through the requesting terminal during the consultation session between the user and the system; this information is the consultation data.
In step S102, in response to the session request, the consultation data is input into the BERT network trained based on the triplet loss function to perform a feature transformation operation, to obtain a session feature vector.
In the embodiments of the present application, the BERT (Bidirectional Encoder Representations from Transformers, a pre-trained language representation model) network trained based on the triplet loss function refers to a BERT network trained with the Triplet loss function, so that when expressing features, the BERT network reduces the feature transformation distance between data of the same class as much as possible while enlarging the feature transformation distance between data of different classes as much as possible, thereby effectively solving the problem that the basic classification structure of the traditional BERT network is easily affected by noise labels.
In step S103, the session feature vector is input into the Dense classification layer to perform a category prediction operation, to obtain a predicted classification result.
In the embodiments of the present application, the Dense classification layer is mainly used to complete the multi-classification task, and the Dense classification layer uses Focal loss as its loss. Focal loss reduces the weight of a large number of simple negative samples in training. In the multi-class case, the Focal loss of each sample is: FL(p_t) = -α·(1-p_t)^γ·y_true·log(p_t + ε), where p_t of each sample is the predicted value at the index where the one-hot label is 1, i.e. max(y_pred·y_true, axis=-1). Unlike binary classification, in multi-classification α only adjusts the magnitude of the total loss; γ adjusts the contribution of each sample to the model, and when γ takes different values within a certain range (for example, 0-5) it can be found that the larger γ is, the smaller the contribution of simple, easy-to-classify samples to the total loss, which is more conducive to the classification of hard samples; ε is a very small value whose function is to prevent p_t in log(p_t) from being 0; y_true·log(p_t + ε) is used so that the loss produced at positions where the one-hot label is 0 is 0.
In step S104, the speech database is read, and speech reply information corresponding to the predicted classification result is obtained from the speech database.
In the embodiments of the present application, the speech database is mainly used to store reply information data corresponding to predicted classification results. After the session feature vector is input into the Dense classification layer for the category prediction operation and the predicted classification result is obtained, the speech reply information corresponding to the predicted classification result can be obtained from the speech database by means of an index or the like, so as to complete the consultation session reply.
In step S105, the speech reply information is sent to the requesting terminal, so as to complete the consultation session reply.
The embodiments of the present application provide a BERT-based text classification method, which receives a session request carrying consultation data sent by a user through a requesting terminal; in response to the session request, inputs the consultation data into the BERT network trained based on the triplet loss function to perform a feature transformation operation to obtain a session feature vector; inputs the session feature vector into the Dense classification layer to perform a category prediction operation to obtain a predicted classification result; reads the speech database and obtains the speech reply information corresponding to the predicted classification result from the speech database; and sends the speech reply information to the requesting terminal, so as to complete the consultation session reply. Training the BERT network with the Triplet loss function can greatly reduce the influence of noise labels, and effectively solves the problem that the basic classification structure of traditional BERT-style encoder + Dense layer + cross-entropy loss is easily affected by noise labels.
Continuing to refer to FIG. 2, an implementation flowchart of the BERT network training method provided in Embodiment 1 of the present application is shown. For the convenience of description, only the parts related to the present application are shown.
In some optional implementations of Embodiment 1 of the present application, the BERT-based text classification method provided by the present application further includes: step S201, step S202, step S203, step S204, and step S205.
In step S201, the training database is read, and a training text data set is obtained from the training database. The training text data set includes at least a first positive sample, a second positive sample of the same category as the first positive sample, and a random sample of a category different from the first positive sample.
In the embodiments of the present application, the training text data set is a triplet data set of (sentence 1, sentence 2, sentence 3), and the categories of sentence 1, sentence 2, and sentence 3 are A, A, and B, respectively. The first two elements are any two positive samples of the same category, and the last element can be randomly drawn from a different category, drawn from a category that is hard to distinguish from category A, or drawn by a combination of the two preceding manners.
In step S202, the first positive sample, the second positive sample, and the random sample are respectively input into the original BERT network to perform the feature transformation operation, to obtain a first feature vector, a second feature vector, and a random feature vector.
In the embodiments of the present application, the original BERT network refers to the original feature vector transformation model that has not undergone any training. The input triplet passes through the same BERT layer. BERT acts as an encoder, and its purpose is to output sentence vectors that represent semantics.
In step S203, a vector similarity calculation operation is performed on the first feature vector and the second feature vector, to obtain a homogeneous vector similarity.
In step S204, the vector similarity calculation operation is performed on the first feature vector and the random feature vector, to obtain a non-homogeneous vector similarity.
In the embodiments of the present application, referring to FIG. 3, a schematic diagram of the vector similarity calculation operation is shown, in which sent1, all of sent2, and sent3 are sentence vectors output by the shared BERT layer. Since noise samples may exist in each group of samples, the Triplet loss may be calculated by taking the mean of the similarities between all sample vectors in each group and sent1, or by taking the maximum similarity between all sample vectors in the positive-example bag and sent1 (representing the sample most similar to sentence 1) and the minimum similarity between all sample vectors in the negative-example bag and sent1 (representing the sample least similar to sentence 1).
In step S205, the BERT network is trained based on the homogeneous vector similarity, the non-homogeneous vector similarity, and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
In the embodiments of the present application, it is assumed that each BAG in multi-instance learning contains three sentences. Then the (sentence 1, sentence 2, sentence 3) triplet corresponding to (1) above becomes (sentence 1, [sentence 2[0], sentence 2[1], sentence 2[2]], [sentence 3[0], sentence 3[1], sentence 3[2]]), that is, the input is one positive sample, a group of positive samples of the same class, and a group of negative samples of different classes.
Training step 1: These 7 sentences pass through the same BERT model with shared weights at the same time, outputting 7 vectors.
Training step 2: The similarity between sentence 1 and each of the other 6 sentences is calculated.
Training step 3: From the positive-example BAG, take the one most similar to sentence 1: Max(Sim(sentence 1, sentence 2[0]), Sim(sentence 1, sentence 2[1]), Sim(sentence 1, sentence 2[2])) (the highest of the three similarities may be that between sentence 1 and sentence 2[1]; since sentence 2[0] is actually mislabeled noise data, this calculation can weaken its influence on the model).
Training step 4: From the negative-example BAG, take the one least similar to sentence 1: Min(Sim(sentence 1, sentence 3[0]), Sim(sentence 1, sentence 3[1]), Sim(sentence 1, sentence 3[2])) (assume the least similar of the three is sentence 3[1]).
Training step 5: Triplet loss: make the gap between the similarities obtained in training step 3 and training step 4 as large as possible.
Training step 6: Perform another task at the same time: in addition to the above Triplet loss task of widening the gap between classes, sentence 1 finally goes through a multi-classification task.
Training step 7: After the model is trained, the classification of sentence 1 can be obtained, and in the process the classes become more distinguishable under the effort of weakening label noise. At this point, only the part of the model from sentence 1 to the category prediction of sentence 1 needs to be retained, and it can be used to predict the classification of any input sentence.
Continuing to refer to FIG. 4, a flowchart of an implementation manner of step S205 in FIG. 2 is shown. For the convenience of description, only the parts related to the present application are shown.
In some optional implementations of Embodiment 1 of the present application, the above step S205 specifically includes: step S401, step S402, and step S403.
In step S401, the average of the homogeneous similarities is calculated to obtain an average homogeneous vector.
In practical applications, if the homogeneous similarity between sentence 1 and sentence 2[0] is 60, the homogeneous similarity between sentence 1 and sentence 2[1] is 70, and the homogeneous similarity between sentence 1 and sentence 2[2] is 80, the average calculation shows that the average homogeneous similarity is 70; the average homogeneous vector is then equal to Sim(sentence 1, sentence 2[1]).
In step S402, the average of the non-homogeneous similarities is calculated to obtain an average non-homogeneous vector.
In the embodiments of the present application, the calculation of the average of the non-homogeneous similarities is implemented in the same manner as the above calculation of the average of the homogeneous similarities.
In step S403, a reverse update operation is performed on the BERT network based on the first feature vector, the average homogeneous vector, the average non-homogeneous vector, and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
In the embodiments of the present application, the reverse update operation is mainly used to dynamically update the representation parameters of the BERT network according to changes in the average homogeneous vector and the average non-homogeneous vector.
Continuing to refer to FIG. 5, a flowchart of another implementation manner of step S205 in FIG. 2 is shown. For the convenience of description, only the parts related to the present application are shown.
In some optional implementations of Embodiment 1 of the present application, the above step S205 specifically includes: step S501, step S502, and step S503.
In step S501, the maximum homogeneous vector with the largest similarity is obtained from the second feature vectors based on the homogeneous similarities.
In the embodiments of the present application, if the homogeneous similarity between sentence 1 and sentence 2[0] is 60, between sentence 1 and sentence 2[1] is 70, and between sentence 1 and sentence 2[2] is 80, the maximum-value calculation shows that the largest similarity is 80, and the maximum homogeneous vector is then Sim(sentence 1, sentence 2[2]).
In step S502, the minimum random vector with the smallest similarity is obtained from the random feature vectors based on the non-homogeneous similarities.
In the embodiments of the present application, obtaining the minimum random vector with the smallest similarity is implemented in the same manner as obtaining the maximum homogeneous vector with the largest similarity described above.
In step S503, a reverse update operation is performed on the BERT network based on the first feature vector, the maximum homogeneous vector, the minimum random vector, and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
In the embodiments of the present application, the reverse update operation is mainly used to dynamically update the representation parameters of the BERT network according to changes in the maximum homogeneous vector and the minimum random vector.
In some optional implementations of Embodiment 1 of the present application, the above triplet loss function is expressed as:
L = Σ_{i=1}^{N} [ ‖f(x_i^a) − f(x_i^p)‖² − ‖f(x_i^a) − f(x_i^n)‖² + α ]_+
where N represents the total number of triplets in the whole training set; x_i^a represents the first positive sample; f(x_i^a) represents the first feature vector; x_i^p represents the second positive sample; f(x_i^p) represents the second feature vector; x_i^n represents the random sample; f(x_i^n) represents the random feature vector; and α represents the minimum margin between the distance from the first positive sample to the second positive sample and the distance from the first positive sample to the random sample.
In the embodiments of the present application, a refers to anchor and represents the first positive sample tuple; p refers to positive and represents the second positive sample tuple; n refers to negative and represents the random sample tuple.
In the embodiments of the present application, the subscript + indicates that when the value inside [ ] is greater than zero, that value is taken as the loss, and when it is less than zero, the loss is zero. When the distance between f(x_i^a) and f(x_i^n) is less than the sum of the distance between f(x_i^a) and f(x_i^p) and α, the value inside [ ] is greater than zero and a loss is produced; when the distance between f(x_i^a) and f(x_i^n) is greater than or equal to the sum of the distance between f(x_i^a) and f(x_i^p) and α, the loss is zero.
It should be emphasized that, in order to further ensure the privacy and security of the above consultation data, the above consultation data may also be stored in a node of a blockchain.
The blockchain referred to in the present application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. Blockchain is essentially a decentralized database; it is a chain of data blocks generated in association with one another using cryptographic methods, and each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
The present application may be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and the like. The present application may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. The present application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices linked through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The present application provides a BERT-based text classification method. Training the BERT network with a multi-instance triplet loss function can greatly reduce the influence of noise labels, and effectively solves the problem that the basic classification structure of traditional BERT-style encoder + Dense layer + cross-entropy loss is easily affected by noise labels.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through computer-readable instructions, which can be stored in a computer-readable storage medium; when executed, the computer-readable instructions may include the processes of the embodiments of the above methods. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM), or the like.
It should be understood that although the steps in the flowcharts of the drawings are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the drawings may include multiple sub-steps or multiple stages, which are not necessarily completed at the same time but may be executed at different times, and their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Embodiment 2
With further reference to FIG. 6, as an implementation of the method shown in FIG. 1 above, the present application provides a BERT-based text classification apparatus. The apparatus embodiment corresponds to the method embodiment shown in FIG. 1, and the apparatus can be specifically applied to various electronic devices.
As shown in FIG. 6, the BERT-based text classification apparatus 100 provided in Embodiment 2 of the present application includes: a request receiving module 110, a feature transformation module 120, a category prediction module 130, a speech acquisition module 140, and a session reply module 150, wherein:
the request receiving module 110 is configured to receive a session request carrying consultation data sent by a user through a requesting terminal;
the feature transformation module 120 is configured to, in response to the session request, input the consultation data into the BERT network trained based on the triplet loss function to perform a feature transformation operation, to obtain a session feature vector;
the category prediction module 130 is configured to input the session feature vector into the Dense classification layer to perform a category prediction operation, to obtain a predicted classification result;
the speech acquisition module 140 is configured to read the speech database, and obtain speech reply information corresponding to the predicted classification result from the speech database;
the session reply module 150 is configured to send the speech reply information to the requesting terminal, so as to complete the consultation session reply.
In the embodiments of the present application, the requesting terminal is mainly used to collect content information carrying the user's consultation semantics, and the content information may be text information, voice information, video information, and so on; correspondingly, the requesting terminal includes at least one or a combination of a text collection module, a voice collection module, and a video collection module.
In the embodiments of the present application, the requesting terminal may be a mobile terminal such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), or a navigation device, as well as a fixed terminal such as a digital TV or a desktop computer. It should be understood that the examples of the requesting terminal here are only for convenience of understanding and are not used to limit the present application.
In the embodiments of the present application, consultation data refers to content information carrying the user's consultation semantics, such as voice data and text data, sent through the requesting terminal during the consultation session between the user and the system; this information is the consultation data.
In the embodiments of the present application, the BERT network trained based on the triplet loss function refers to a BERT network trained with the Triplet loss function, so that when expressing features, the BERT network reduces the feature transformation distance between data of the same class as much as possible while enlarging the feature transformation distance between data of different classes as much as possible, thereby effectively solving the problem that the basic classification structure of the traditional BERT network is easily affected by noise labels.
In the embodiments of the present application, the Dense classification layer is mainly used to complete the multi-classification task, and the Dense classification layer uses Focal loss as its loss. Focal loss reduces the weight of a large number of simple negative samples in training. In the multi-class case, the Focal loss of each sample is: FL(p_t) = -α·(1-p_t)^γ·y_true·log(p_t + ε), where p_t of each sample is the predicted value at the index where the one-hot label is 1, i.e. max(y_pred·y_true, axis=-1). Unlike binary classification, in multi-classification α only adjusts the magnitude of the total loss; γ adjusts the contribution of each sample to the model, and when γ takes different values within a certain range (for example, 0-5) it can be found that the larger γ is, the smaller the contribution of simple, easy-to-classify samples to the total loss, which is more conducive to the classification of hard samples; ε is a very small value whose function is to prevent p_t in log(p_t) from being 0; y_true·log(p_t + ε) is used so that the loss produced at positions where the one-hot label is 0 is 0.
In the embodiments of the present application, the speech database is mainly used to store reply information data corresponding to predicted classification results. After the session feature vector is input into the Dense classification layer for the category prediction operation and the predicted classification result is obtained, the speech reply information corresponding to the predicted classification result can be obtained from the speech database by means of an index or the like, so as to complete the consultation session reply.
The embodiments of the present application provide a BERT-based text classification apparatus, including: a request receiving module, configured to receive a session request carrying consultation data sent by a user through a requesting terminal; a feature transformation module, configured to, in response to the session request, input the consultation data into the BERT network trained based on the triplet loss function to perform a feature transformation operation to obtain a session feature vector; a category prediction module, configured to input the session feature vector into the Dense classification layer to perform a category prediction operation to obtain a predicted classification result; a speech acquisition module, configured to read the speech database and obtain speech reply information corresponding to the predicted classification result from the speech database; and a session reply module, configured to send the speech reply information to the requesting terminal to complete the consultation session reply. Training the BERT network with the multi-instance triplet loss function can greatly reduce the influence of noise labels, and effectively solves the problem that the basic classification structure of traditional BERT-style encoder + Dense layer + cross-entropy loss is easily affected by noise labels.
Continuing to refer to FIG. 7, a schematic structural diagram of the BERT network training apparatus provided in Embodiment 2 of the present application is shown. For the convenience of description, only the parts related to the present application are shown.
In some optional implementations of Embodiment 2 of the present application, the above BERT-based text classification apparatus 100 further includes: a training text acquisition module 160, a feature transformation training module 170, a similarity calculation module 180, and a network training module 190, wherein:
the training text acquisition module 160 is configured to read a training database, and obtain a training text data set from the training database, the training text data set including at least a first positive sample, a second positive sample of the same category as the first positive sample, and a random sample of a category different from the first positive sample;
the feature transformation training module 170 is configured to input the first positive sample, the second positive sample, and the random sample respectively into the original BERT network to perform the feature transformation operation, to obtain a first feature vector, a second feature vector, and a random feature vector;
the similarity calculation module 180 is configured to perform the vector similarity calculation operation on the first feature vector and the second feature vector, and on the first feature vector and the random feature vector, respectively, to obtain a homogeneous vector similarity and a non-homogeneous vector similarity;
the network training module 190 is configured to train the BERT network based on the homogeneous vector similarity, the non-homogeneous vector similarity, and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
In the embodiments of the present application, the training text data set is a triplet data set of (sentence 1, sentence 2, sentence 3), and the categories of sentences 1, 2, and 3 are A, A, and B, respectively. The first two elements are any two positive samples of the same category, and the last element can be randomly drawn from a different category, drawn from a category that is hard to distinguish from category A, or drawn by a combination of the two preceding manners.
In the embodiments of the present application, the original BERT network refers to the original feature vector transformation model that has not undergone any training. The input triplet passes through the same BERT layer. BERT acts as an encoder, and its purpose is to output sentence vectors that represent semantics.
In the embodiments of the present application, referring to FIG. 3, a schematic diagram of the vector similarity calculation operation is shown, in which sent1, all of sent2, and sent3 are sentence vectors output by the shared BERT layer. Since noise samples may exist in each group of samples, the Triplet loss may be calculated by taking the mean of the similarities between all sample vectors in each group and sent1, or by taking the maximum similarity between all sample vectors in the positive-example bag and sent1 (representing the sample most similar to sentence 1) and the minimum similarity between all sample vectors in the negative-example bag and sent1 (representing the sample least similar to sentence 1).
In the embodiments of the present application, it is assumed that each BAG in multi-instance learning contains three sentences. Then the (sentence 1, sentence 2, sentence 3) triplet corresponding to (1) above becomes (sentence 1, [sentence 2[0], sentence 2[1], sentence 2[2]], [sentence 3[0], sentence 3[1], sentence 3[2]]), that is, the input is one positive sample, a group of positive samples of the same class, and a group of negative samples of different classes.
Training step 1: These 7 sentences pass through the same BERT model with shared weights at the same time, outputting 7 vectors.
Training step 2: The similarity between sentence 1 and each of the other 6 sentences is calculated.
Training step 3: From the positive-example BAG, take the one most similar to sentence 1: Max(Sim(sentence 1, sentence 2[0]), Sim(sentence 1, sentence 2[1]), Sim(sentence 1, sentence 2[2])) (the highest of the three similarities may be that between sentence 1 and sentence 2[1]; since sentence 2[0] is actually mislabeled noise data, this calculation can weaken its influence on the model).
Training step 4: From the negative-example BAG, take the one least similar to sentence 1: Min(Sim(sentence 1, sentence 3[0]), Sim(sentence 1, sentence 3[1]), Sim(sentence 1, sentence 3[2])) (assume the least similar of the three is sentence 3[1]).
Training step 5: Triplet loss: make the gap between the similarities obtained in training step 3 and training step 4 as large as possible.
Training step 6: Perform another task at the same time: in addition to the above Triplet loss task of widening the gap between classes, sentence 1 finally goes through a multi-classification task.
Training step 7: After the model is trained, the classification of sentence 1 can be obtained, and in the process the classes become more distinguishable under the effort of weakening label noise. At this point, only the part of the model from sentence 1 to the category prediction of sentence 1 needs to be retained, and it can be used to predict the classification of any input sentence.
In some optional implementations of Embodiment 2 of the present application, the above network training module 190 includes: a homogeneous average calculation submodule, a non-homogeneous average calculation submodule, and a first reverse update submodule, wherein:
the homogeneous average calculation submodule is configured to calculate the average of the homogeneous similarities, to obtain an average homogeneous vector;
the non-homogeneous average calculation submodule is configured to calculate the average of the non-homogeneous similarities, to obtain an average non-homogeneous vector;
the first reverse update submodule is configured to perform a reverse update operation on the BERT network based on the first feature vector, the average homogeneous vector, the average non-homogeneous vector, and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
In some optional implementations of Embodiment 2 of the present application, the above network training module 190 further includes: a maximum value acquisition submodule, a minimum value acquisition submodule, and a second reverse update submodule, wherein:
the maximum value acquisition submodule is configured to obtain, based on the homogeneous similarities, the maximum homogeneous vector with the largest similarity from the second feature vectors;
the minimum value acquisition submodule is configured to obtain, based on the non-homogeneous similarities, the minimum random vector with the smallest similarity from the random feature vectors;
the second reverse update submodule is configured to perform a reverse update operation on the BERT network based on the first feature vector, the maximum homogeneous vector, the minimum random vector, and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
In some optional implementations of Embodiment 2 of the present application, the above triplet loss function is expressed as:
L = Σ_{i=1}^{N} [ ‖f(x_i^a) − f(x_i^p)‖² − ‖f(x_i^a) − f(x_i^n)‖² + α ]_+
where N represents the total number of triplets in the whole training set; x_i^a represents the first positive sample; f(x_i^a) represents the first feature vector; x_i^p represents the second positive sample; f(x_i^p) represents the second feature vector; x_i^n represents the random sample; f(x_i^n) represents the random feature vector; and α represents the minimum margin between the distance from the first positive sample to the second positive sample and the distance from the first positive sample to the random sample.
In the embodiments of the present application, a refers to anchor and represents the first positive sample tuple; p refers to positive and represents the second positive sample tuple; n refers to negative and represents the random sample tuple.
In order to solve the above technical problem, the embodiments of the present application further provide a computer device. Please refer to FIG. 8 for details; FIG. 8 is a block diagram of the basic structure of the computer device of this embodiment.
The computer device 200 includes a memory 210, a processor 220, and a network interface 230 that are communicatively connected to each other through a system bus. It should be noted that only the computer device 200 with the components 210-230 is shown in the figure, but it should be understood that not all of the shown components are required to be implemented, and more or fewer components may be implemented instead. Those skilled in the art can understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The computer device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The computer device may perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad, a voice control device, or the like.
The memory 210 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, and the like; the computer-readable storage medium may be non-volatile or volatile. In some embodiments, the memory 210 may be an internal storage unit of the computer device 200, such as a hard disk or memory of the computer device 200. In other embodiments, the memory 210 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device 200. Of course, the memory 210 may also include both the internal storage unit of the computer device 200 and its external storage device. In this embodiment, the memory 210 is generally used to store the operating system and various application software installed on the computer device 200, such as the computer-readable instructions of the BERT-based text classification method. In addition, the memory 210 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 220 may be, in some embodiments, a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 220 is generally used to control the overall operation of the computer device 200. In this embodiment, the processor 220 is configured to run the computer-readable instructions stored in the memory 210 or to process data, for example, to run the computer-readable instructions of the BERT-based text classification method.
The network interface 230 may include a wireless network interface or a wired network interface, and the network interface 230 is generally used to establish a communication connection between the computer device 200 and other electronic devices.
In the BERT-based text classification method provided by the present application, training the BERT network with a multi-instance triplet loss function can greatly reduce the influence of noise labels, and effectively solves the problem that the basic classification structure of traditional BERT-style encoder + Dense layer + cross-entropy loss is easily affected by noise labels.
The present application further provides another implementation, that is, a computer-readable storage medium is provided, the computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions can be executed by at least one processor, so that the at least one processor performs the steps of the BERT-based text classification method as described above.
In the BERT-based text classification method provided by the present application, training the BERT network with a multi-instance triplet loss function can greatly reduce the influence of noise labels, and effectively solves the problem that the basic classification structure of traditional BERT-style encoder + Dense layer + cross-entropy loss is easily affected by noise labels.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the various embodiments of the present application.
Obviously, the embodiments described above are only a part of the embodiments of the present application, not all of them. The drawings show preferred embodiments of the present application, but they do not limit the patent scope of the present application. The present application can be implemented in many different forms; rather, these embodiments are provided so that the understanding of the disclosure of the present application is more thorough and comprehensive. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions recorded in the foregoing specific embodiments or make equivalent replacements for some of the technical features therein. Any equivalent structure made using the contents of the specification and drawings of the present application, applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.

Claims (20)

  1. A BERT-based text classification method, comprising the following steps:
    receiving a session request carrying consultation data sent by a user through a requesting terminal;
    in response to the session request, inputting the consultation data into a BERT network trained based on a triplet loss function to perform a feature transformation operation, to obtain a session feature vector;
    inputting the session feature vector into a Dense classification layer to perform a category prediction operation, to obtain a predicted classification result;
    reading a speech database, and obtaining, from the speech database, speech reply information corresponding to the predicted classification result;
    sending the speech reply information to the requesting terminal, so as to complete the consultation session reply.
  2. The BERT-based text classification method according to claim 1, wherein, before the step of, in response to the session request, inputting the consultation data into the BERT network trained based on the triplet loss function to perform the feature transformation operation to obtain the session feature vector, the method further comprises the following steps:
    reading a training database, and obtaining a training text data set from the training database, the training text data set including at least a first positive sample, a second positive sample of the same category as the first positive sample, and a random sample of a category different from the first positive sample;
    inputting the first positive sample, the second positive sample, and the random sample respectively into an original BERT network to perform the feature transformation operation, to obtain a first feature vector, a second feature vector, and a random feature vector;
    performing a vector similarity calculation operation on the first feature vector and the second feature vector, to obtain a homogeneous vector similarity;
    performing the vector similarity calculation operation on the first feature vector and the random feature vector, to obtain a non-homogeneous vector similarity;
    training the BERT network based on the homogeneous vector similarity, the non-homogeneous vector similarity, and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
  3. The BERT-based text classification method according to claim 2, wherein the step of training the BERT network based on the homogeneous vector similarity, the non-homogeneous vector similarity, and the triplet loss function to obtain the BERT network trained based on the triplet loss function specifically comprises the following steps:
    calculating the average of the homogeneous similarities, to obtain an average homogeneous vector;
    calculating the average of the non-homogeneous similarities, to obtain an average non-homogeneous vector;
    performing a reverse update operation on the BERT network based on the first feature vector, the average homogeneous vector, the average non-homogeneous vector, and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
  4. The BERT-based text classification method according to claim 2, wherein the step of training the BERT network based on the homogeneous vector similarity, the non-homogeneous vector similarity, and the triplet loss function to obtain the BERT network trained based on the triplet loss function specifically comprises the following steps:
    obtaining, based on the homogeneous similarities, the maximum homogeneous vector with the largest similarity from the second feature vectors;
    obtaining, based on the non-homogeneous similarities, the minimum random vector with the smallest similarity from the random feature vectors;
    performing a reverse update operation on the BERT network based on the first feature vector, the maximum homogeneous vector, the minimum random vector, and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
  5. The BERT-based text classification method according to claim 2, wherein the triplet loss function is expressed as:
    L = \sum_{i}^{N} \left[ \left\| f(x_i^a) - f(x_i^p) \right\|_2^2 - \left\| f(x_i^a) - f(x_i^n) \right\|_2^2 + \alpha \right]_+
    where N denotes the total size of the entire training set; x_i^a denotes the first positive sample; f(x_i^a) denotes the first feature vector; x_i^p denotes the second positive sample; f(x_i^p) denotes the second feature vector; x_i^n denotes the random sample; f(x_i^n) denotes the random feature vector; and α denotes the minimum margin between the distance between the first positive sample and the second positive sample and the distance between the first positive sample and the random sample.
  6. The BERT-based text classification method according to claim 1, wherein after the step of receiving the session request that carries the consultation data and is sent by the user through the requesting terminal, the method further comprises:
    storing the consultation data in a blockchain.
  7. A BERT-based text classification apparatus, comprising:
    a request receiving module configured to receive a session request that carries consultation data and is sent by a user through a requesting terminal;
    a feature transformation module configured to, in response to the session request, input the consultation data into a BERT network trained based on a triplet loss function to perform a feature transformation operation, to obtain a session feature vector;
    a category prediction module configured to input the session feature vector into a Dense classification layer to perform a category prediction operation, to obtain a predicted classification result;
    a scripted-reply acquisition module configured to read a scripted-reply database and acquire, from the scripted-reply database, scripted reply information corresponding to the predicted classification result; and
    a session reply module configured to send the scripted reply information to the requesting terminal to complete a consultation session reply.
  8. The BERT-based text classification apparatus according to claim 7, wherein the apparatus further comprises:
    a training text acquisition module configured to read a training database and acquire a training text data set from the training database, the training text data set comprising at least a first positive sample, a second positive sample of the same category as the first positive sample, and a random sample of a category different from that of the first positive sample;
    a feature transformation training module configured to input the first positive sample, the second positive sample and the random sample respectively into an original BERT network to perform the feature transformation operation, to obtain a first feature vector, a second feature vector and a random feature vector;
    a similarity calculation module configured to perform a vector similarity calculation operation on the first feature vector and the second feature vector and on the first feature vector and the random feature vector, respectively, to obtain a same-class vector similarity and a non-same-class vector similarity; and
    a network training module configured to train the BERT network based on the same-class vector similarity, the non-same-class vector similarity and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
  9. A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions, and the processor, when executing the computer-readable instructions, implements the following steps of a BERT-based text classification method:
    receiving a session request that carries consultation data and is sent by a user through a requesting terminal;
    in response to the session request, inputting the consultation data into a BERT network trained based on a triplet loss function to perform a feature transformation operation, to obtain a session feature vector;
    inputting the session feature vector into a Dense classification layer to perform a category prediction operation, to obtain a predicted classification result;
    reading a scripted-reply database, and acquiring, from the scripted-reply database, scripted reply information corresponding to the predicted classification result; and
    sending the scripted reply information to the requesting terminal to complete a consultation session reply.
  10. The computer device according to claim 9, wherein before the step of, in response to the session request, inputting the consultation data into the BERT network trained based on the triplet loss function to perform the feature transformation operation to obtain the session feature vector, the method further comprises the following steps:
    reading a training database, and acquiring a training text data set from the training database, the training text data set comprising at least a first positive sample, a second positive sample of the same category as the first positive sample, and a random sample of a category different from that of the first positive sample;
    inputting the first positive sample, the second positive sample and the random sample respectively into an original BERT network to perform the feature transformation operation, to obtain a first feature vector, a second feature vector and a random feature vector;
    performing a vector similarity calculation operation on the first feature vector and the second feature vector to obtain a same-class vector similarity;
    performing the vector similarity calculation operation on the first feature vector and the random feature vector to obtain a non-same-class vector similarity; and
    training the BERT network based on the same-class vector similarity, the non-same-class vector similarity and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
  11. The computer device according to claim 10, wherein the step of training the BERT network based on the same-class vector similarity, the non-same-class vector similarity and the triplet loss function to obtain the BERT network trained based on the triplet loss function specifically comprises the following steps:
    calculating an average of the same-class vector similarities to obtain an average same-class vector;
    calculating an average of the non-same-class vector similarities to obtain an average non-same-class vector; and
    performing a backward update operation on the BERT network based on the first feature vector, the average same-class vector, the average non-same-class vector and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
  12. The computer device according to claim 10, wherein the step of training the BERT network based on the same-class vector similarity, the non-same-class vector similarity and the triplet loss function to obtain the BERT network trained based on the triplet loss function specifically comprises the following steps:
    acquiring, from the second feature vectors and based on the same-class vector similarity, a maximum same-class vector with the highest similarity;
    acquiring, from the random feature vectors and based on the non-same-class vector similarity, a minimum random vector with the lowest similarity; and
    performing a backward update operation on the BERT network based on the first feature vector, the maximum same-class vector, the minimum random vector and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
  13. The computer device according to claim 10, wherein the triplet loss function is expressed as:
    L = \sum_{i}^{N} \left[ \left\| f(x_i^a) - f(x_i^p) \right\|_2^2 - \left\| f(x_i^a) - f(x_i^n) \right\|_2^2 + \alpha \right]_+
    where N denotes the total size of the entire training set; x_i^a denotes the first positive sample; f(x_i^a) denotes the first feature vector; x_i^p denotes the second positive sample; f(x_i^p) denotes the second feature vector; x_i^n denotes the random sample; f(x_i^n) denotes the random feature vector; and α denotes the minimum margin between the distance between the first positive sample and the second positive sample and the distance between the first positive sample and the random sample.
  14. The computer device according to claim 9, wherein after the step of receiving the session request that carries the consultation data and is sent by the user through the requesting terminal, the method further comprises:
    storing the consultation data in a blockchain.
  15. A computer-readable storage medium, wherein computer-readable instructions are stored on the computer-readable storage medium, and the computer-readable instructions, when executed by a processor, implement the following steps of a BERT-based text classification method:
    receiving a session request that carries consultation data and is sent by a user through a requesting terminal;
    in response to the session request, inputting the consultation data into a BERT network trained based on a triplet loss function to perform a feature transformation operation, to obtain a session feature vector;
    inputting the session feature vector into a Dense classification layer to perform a category prediction operation, to obtain a predicted classification result;
    reading a scripted-reply database, and acquiring, from the scripted-reply database, scripted reply information corresponding to the predicted classification result; and
    sending the scripted reply information to the requesting terminal to complete a consultation session reply.
  16. The computer-readable storage medium according to claim 15, wherein before the step of, in response to the session request, inputting the consultation data into the BERT network trained based on the triplet loss function to perform the feature transformation operation to obtain the session feature vector, the method further comprises the following steps:
    reading a training database, and acquiring a training text data set from the training database, the training text data set comprising at least a first positive sample, a second positive sample of the same category as the first positive sample, and a random sample of a category different from that of the first positive sample;
    inputting the first positive sample, the second positive sample and the random sample respectively into an original BERT network to perform the feature transformation operation, to obtain a first feature vector, a second feature vector and a random feature vector;
    performing a vector similarity calculation operation on the first feature vector and the second feature vector to obtain a same-class vector similarity;
    performing the vector similarity calculation operation on the first feature vector and the random feature vector to obtain a non-same-class vector similarity; and
    training the BERT network based on the same-class vector similarity, the non-same-class vector similarity and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
  17. The computer-readable storage medium according to claim 16, wherein the step of training the BERT network based on the same-class vector similarity, the non-same-class vector similarity and the triplet loss function to obtain the BERT network trained based on the triplet loss function specifically comprises the following steps:
    calculating an average of the same-class vector similarities to obtain an average same-class vector;
    calculating an average of the non-same-class vector similarities to obtain an average non-same-class vector; and
    performing a backward update operation on the BERT network based on the first feature vector, the average same-class vector, the average non-same-class vector and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
  18. The computer-readable storage medium according to claim 16, wherein the step of training the BERT network based on the same-class vector similarity, the non-same-class vector similarity and the triplet loss function to obtain the BERT network trained based on the triplet loss function specifically comprises the following steps:
    acquiring, from the second feature vectors and based on the same-class vector similarity, a maximum same-class vector with the highest similarity;
    acquiring, from the random feature vectors and based on the non-same-class vector similarity, a minimum random vector with the lowest similarity; and
    performing a backward update operation on the BERT network based on the first feature vector, the maximum same-class vector, the minimum random vector and the triplet loss function, to obtain the BERT network trained based on the triplet loss function.
  19. The computer-readable storage medium according to claim 16, wherein the triplet loss function is expressed as:
    L = \sum_{i}^{N} \left[ \left\| f(x_i^a) - f(x_i^p) \right\|_2^2 - \left\| f(x_i^a) - f(x_i^n) \right\|_2^2 + \alpha \right]_+
    where N denotes the total size of the entire training set; x_i^a denotes the first positive sample; f(x_i^a) denotes the first feature vector; x_i^p denotes the second positive sample; f(x_i^p) denotes the second feature vector; x_i^n denotes the random sample; f(x_i^n) denotes the random feature vector; and α denotes the minimum margin between the distance between the first positive sample and the second positive sample and the distance between the first positive sample and the random sample.
  20. The computer-readable storage medium according to claim 15, wherein after the step of receiving the session request that carries the consultation data and is sent by the user through the requesting terminal, the method further comprises:
    storing the consultation data in a blockchain.
PCT/CN2021/090505 2020-11-03 2021-04-28 BERT-based text classification method and apparatus, computer device, and storage medium WO2022095354A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011212539.5 2020-11-03
CN202011212539.5A CN112328786A (zh) 2020-11-03 2021-02-05 BERT-based text classification method and apparatus, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022095354A1 true WO2022095354A1 (zh) 2022-05-12

Family

ID=74323338

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090505 WO2022095354A1 (zh) 2020-11-03 2021-04-28 BERT-based text classification method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN112328786A (zh)
WO (1) WO2022095354A1 (zh)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328786A (zh) * 2020-11-03 2021-02-05 平安科技(深圳)有限公司 BERT-based text classification method and apparatus, computer device, and storage medium
CN113064992A (zh) * 2021-03-22 2021-07-02 平安银行股份有限公司 Complaint work-order structured processing method, apparatus, device, and storage medium
CN113496005B (zh) * 2021-05-26 2022-04-08 北京房多多信息技术有限公司 Information management method and apparatus, electronic device, and storage medium
CN113377909B (zh) * 2021-06-09 2023-07-11 平安科技(深圳)有限公司 Paraphrase analysis model training method and apparatus, terminal device, and storage medium


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009528B (zh) * 2017-12-26 2020-04-07 广州广电运通金融电子股份有限公司 Triplet-Loss-based face authentication method and apparatus, computer device, and storage medium
WO2019231105A1 (ko) * 2018-05-31 2019-12-05 한국과학기술원 Method and apparatus for training a deep learning model for ordered classification problems using a triplet-based loss function
CN110263141A (zh) * 2019-06-25 2019-09-20 杭州微洱网络科技有限公司 BERT-based customer service question answering system
CN110689878B (zh) * 2019-10-11 2020-07-28 浙江百应科技有限公司 XLNet-based intelligent voice dialogue intent recognition method
CN111400470A (zh) * 2020-03-13 2020-07-10 深圳市腾讯计算机系统有限公司 Question processing method and apparatus, computer device, and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200279105A1 (en) * 2018-12-31 2020-09-03 Dathena Science Pte Ltd Deep learning engine and methods for content and context aware data classification
CN110196913A (zh) * 2019-05-23 2019-09-03 北京邮电大学 Text-generation-based joint extraction method and apparatus for multiple entity relations
CN110222167A (zh) * 2019-07-03 2019-09-10 阿里巴巴集团控股有限公司 Method and system for acquiring target standard information
CN112328786A (zh) * 2020-11-03 2021-02-05 平安科技(深圳)有限公司 BERT-based text classification method and apparatus, computer device, and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115292470A (zh) * 2022-09-30 2022-11-04 中邮消费金融有限公司 Semantic matching method and system for intelligent customer service of micro-loans
CN115292470B (zh) * 2022-09-30 2023-02-03 中邮消费金融有限公司 Semantic matching method and system for intelligent customer service of micro-loans

Also Published As

Publication number Publication date
CN112328786A (zh) 2021-02-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21888058

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21888058

Country of ref document: EP

Kind code of ref document: A1