WO2021081945A1 - Text classification method and apparatus, electronic device, and storage medium - Google Patents

Text classification method and apparatus, electronic device, and storage medium

Info

Publication number
WO2021081945A1
Authority
WO
WIPO (PCT)
Prior art keywords
semantic
text
network
classification
classified
Prior art date
Application number
PCT/CN2019/114871
Other languages
English (en)
French (fr)
Inventor
刘园林
Original Assignee
深圳市欢太科技有限公司
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市欢太科技有限公司, Oppo广东移动通信有限公司 filed Critical 深圳市欢太科技有限公司
Priority to PCT/CN2019/114871 priority Critical patent/WO2021081945A1/zh
Priority to CN201980099197.XA priority patent/CN114207605A/zh
Publication of WO2021081945A1 publication Critical patent/WO2021081945A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology

Definitions

  • the embodiments of the present application relate to computer technology, and in particular, to a text classification method, device, electronic device, and storage medium.
  • Text classification refers to the automatic classification of text by an electronic device according to a certain classification system or standard, and it is widely used in daily life. For example, for the recommendation service of an electronic device, the device is required to classify a large amount of recommended content. For another example, when an electronic device performs intelligent voice control, it is required to classify the text converted from speech.
  • at present, the implementation of text classification relies on a model, and the accuracy of text classification mainly depends on that model.
  • This application provides a text classification method, device, electronic equipment and storage medium, which can improve the accuracy of text classification.
  • an embodiment of the present application provides a text classification method, including:
  • the semantic classification network includes a convolution layer and a classification layer with different hyperparameters
  • classification processing is performed on the text to be classified at the classification layer to determine the text category of the text to be classified.
  • an embodiment of the present application also provides a text classification device, including:
  • the first obtaining module is used to obtain the text to be classified
  • the first conversion module is configured to convert the text to be classified into a semantic matrix according to the semantic representation network of the pre-trained text classification model, wherein the text classification model is composed of the semantic representation network and the semantic classification network;
  • the convolution operation module is used to perform convolution operations on the semantic matrix at the convolution layer of the semantic classification network to obtain semantic features of various sizes, wherein the semantic classification network includes convolutions with different hyperparameters Layer and classification layer;
  • the classification module is configured to classify the text to be classified at the classification layer according to the semantic features of the multiple sizes, so as to determine the text category of the text to be classified.
  • an embodiment of the present application also provides an electronic device, including: a processor, a memory, and a computer program stored in the memory and running on the processor.
  • the processor implements the following text classification method when executing the computer program:
  • the semantic classification network includes a convolution layer and a classification layer with different hyperparameters
  • classification processing is performed on the text to be classified at the classification layer to determine the text category of the text to be classified.
  • an embodiment of the present application also provides a storage medium containing executable instructions of an electronic device.
  • when the executable instructions of the electronic device are executed by a processor of the electronic device, the text classification method described in the embodiments of the present application is executed.
  • FIG. 1 is a schematic diagram of the first flow of a text classification method provided by an embodiment of the present application.
  • Fig. 2 is a schematic structural diagram of a text classification model provided by an embodiment of the present application.
  • Fig. 3 is a schematic diagram of the first structure of a semantic classification network provided by an embodiment of the present application.
  • Fig. 4 is a schematic diagram of a second structure of a semantic classification network provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a second process of a text classification method provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of the third process of a text classification method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the fourth process of a text classification method provided by an embodiment of the present application.
  • Fig. 8 is a schematic structural diagram of a text classification device provided by an embodiment of the present application.
  • FIG. 9 is a first structural schematic diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a second structure of an electronic device provided by an embodiment of the present application.
  • the embodiment of the present application provides a text classification method, and the text classification method is applied to an electronic device.
  • the execution subject of the text classification method may be the text classification device provided in the embodiment of the present application, or an electronic device integrated with the text classification device.
  • the text classification device may be implemented in hardware or software, and the electronic device may be a device equipped with a processor and having processing capability, such as a smartphone, tablet computer, handheld computer, notebook computer, or desktop computer.
  • FIG. 1 is a schematic diagram of the first process of a text classification method provided by an embodiment of this application.
  • the text classification method is applied to the electronic device provided in the embodiment of the present application.
  • the process of the text classification method provided in the embodiment of the present application may be as follows:
  • the text to be classified is an object used for text classification.
  • the length of the text to be classified is not specifically limited in the embodiments of the present application.
  • the text to be classified can be a sentence, a paragraph, an article, and so on.
  • the embodiment of the present application does not specifically limit it.
  • the text to be classified may be Chinese text, English text, Japanese text, etc.
  • the electronic device can obtain the text to be classified according to the user's selection instruction. For example, according to the selection instruction, a stored document (Document 1) is used as the text to be classified, or the seventh paragraph of Document 1 is used as the text to be classified.
  • the electronic device may obtain the text to be classified through an image, where the image carries text information.
  • the electronic device acquires an image through a camera, and the image carries the text "露从今夜白，月是故乡明" (a line of poetry meaning "the dew turns white from tonight; the moon is brightest over my hometown"). Text recognition is then performed on the acquired image to obtain the text to be classified, that is, the text to be classified is "露从今夜白，月是故乡明".
  • FIG. 2 is a schematic structural diagram of a text classification model provided by an embodiment of the application.
  • the text classification model is composed of a semantic representation network and a semantic classification network.
  • the semantic representation network is mainly used to transform text.
  • Semantic classification network is mainly used to classify text. It should be noted that the semantic classification network takes the output of the semantic representation network as input.
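  • As an illustration of this two-part structure, the following is a minimal PyTorch sketch (not part of the original disclosure) of composing a text classification model from a semantic representation network and a semantic classification network; the class and attribute names are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class TextClassificationModel(nn.Module):
    """Sketch: the semantic classification network takes the output of the
    semantic representation network as its input."""
    def __init__(self, representation_net: nn.Module, classification_net: nn.Module):
        super().__init__()
        self.representation_net = representation_net   # text -> semantic matrix
        self.classification_net = classification_net   # semantic matrix -> category scores

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        semantic_matrix = self.representation_net(token_ids)   # (batch, chars, dim)
        return self.classification_net(semantic_matrix)        # (batch, num_categories)
```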
  • the semantic matrix is obtained by combining the semantic vectors of the characters in the text to be classified.
  • the number of rows of the semantic matrix is equal to the number of characters in the text to be classified, and the number of columns is equal to the dimension of the semantic vector of each character.
  • the number of rows of the semantic matrix is equal to the dimension of the semantic vector of each character, and the number of columns is equal to the number of characters of the text to be classified. It is understandable that the dimension of the semantic vector of each character mainly depends on the dictionary in the semantic representation network.
  • the semantic vector of "chun” is (X11, X12, X13), the semantic vector of " ⁇ ” is (X21, X22, X23), and the semantic vector of "to” is ( X31, X32, X33), the semantic vector of "le” is (X41, X42, X43), then the semantic matrix of "spring is here" is as follows:
  • the electronic device after obtaining the text to be classified, inputs the text to be classified into the semantic representation network of the pre-trained text classification model, and outputs the semantic matrix of the text to be classified.
  • after converting the text to be classified into a semantic matrix, the electronic device performs a convolution operation on the semantic matrix in the convolution layer of the semantic classification network, using multiple convolution kernels of different sizes and the convolution stride corresponding to each kernel, to obtain semantic features of multiple sizes. It should be noted that the convolution operation in this scheme is a one-dimensional convolution operation.
  • the semantic classification network includes a convolutional layer and a classification layer with different hyperparameters, and the semantic classification network may also include an input layer and an output layer.
  • the hyperparameters include the convolution step size, the convolution kernel size, and the padding size.
  • the semantic feature size obtained is mainly determined by the hyperparameters. Assuming that the size of the semantic matrix is N1×N2, where N1 is the number of characters in the text to be classified and N2 is the dimension of the semantic vector of each character, the semantic feature size is calculated as:
  • M = (N1 − F1 + 2P) / S + 1
  • where M is the semantic feature size, N1 is the number of rows of the semantic matrix, P is the padding size, S is the convolution stride, and the convolution kernel size is F1×F2. It should be noted that the padding size is adjusted according to the convolution kernel size and the convolution stride, and when the convolution operation is performed on the semantic matrix, F2 in the kernel size is equal to N2.
  • similarly, assuming the size of the semantic matrix is N3×N4, where N3 is the dimension of the semantic vector of each character and N4 is the number of characters in the text to be classified, the semantic feature size is calculated as:
  • M = (N4 − F4 + 2P) / S + 1
  • where M is the semantic feature size, N4 is the number of columns of the semantic matrix, P is the padding size, S is the convolution stride, and the convolution kernel size is F3×F4. It should be noted that the padding size is adjusted according to the convolution kernel size and the convolution stride, and when the convolution operation is performed on the semantic matrix, F3 in the kernel size is equal to N3.
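  • A small helper (a sketch, not from the patent text) implementing the size formula above; with the example hyperparameters used later in this description (a 100-character text, kernel length 3, padding P=1, stride S=2) it reproduces a feature length of 50.

```python
def conv_output_length(n: int, kernel: int, padding: int, stride: int) -> int:
    """Semantic feature length M = (N - F + 2P) / S + 1, using integer division."""
    return (n - kernel + 2 * padding) // stride + 1

print(conv_output_length(n=100, kernel=3, padding=1, stride=2))  # 50
```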
  • in this embodiment, by adjusting one or more of the above hyperparameters, the convolution layer outputs semantic features of multiple sizes.
  • for example, by adjusting the convolution kernel size, the convolution layer can output semantic features of multiple sizes. Suppose the semantic matrix is 100×100. In the convolution layer of the convolutional neural network, the electronic device convolves the semantic matrix with a 100×3 kernel, a stride of S=2, and a padding of P=1 to obtain a semantic feature of size 1×50, and with a 100×5 kernel, a stride of S=2, and a padding of P=3 to obtain another semantic feature of size 1×50. Convolution kernels of different sizes correspond to different receptive fields: a larger kernel has a larger receptive field than a smaller one and can extract richer information. In this example, features are therefore extracted with one large and one small kernel, so that the obtained overall semantic features contain richer information, which can improve the accuracy of text classification.
  • as another example, by adjusting both the kernel size and the stride, the convolution layer can output semantic features of multiple sizes. Suppose again that the semantic matrix is 100×100. The electronic device convolves the semantic matrix with a 100×3 kernel, a stride of S=2, and a padding of P=1 to obtain a 1×50 semantic feature, and with a 100×5 kernel, a stride of S=3, and a padding of P=4 to obtain a 1×34 semantic feature. In this example, besides the kernel size, the stride can be further adjusted, so that while the semantic features of the text to be classified are enriched, the feature dimensionality is reduced, improving the computational efficiency of the network.
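  • The following PyTorch sketch (an illustration under assumed names, not the patent's own implementation) shows a convolution layer whose branches use different hyperparameters so that one forward pass yields semantic features of several sizes; the kernel, stride, and padding values are taken from the examples above, and the exact output lengths follow the framework's rounding of the size formula.

```python
import torch
import torch.nn as nn

class MultiSizeConv1d(nn.Module):
    """One-dimensional convolution branches with different kernel sizes,
    strides and paddings, producing semantic features of multiple sizes."""
    def __init__(self, embed_dim: int = 100, out_channels: int = 1):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(embed_dim, out_channels, kernel_size=3, stride=2, padding=1),
            nn.Conv1d(embed_dim, out_channels, kernel_size=5, stride=3, padding=4),
        ])

    def forward(self, semantic_matrix: torch.Tensor):
        # semantic_matrix: (batch, chars, embed_dim); Conv1d expects (batch, embed_dim, chars)
        x = semantic_matrix.transpose(1, 2)
        return [torch.relu(branch(x)) for branch in self.branches]

features = MultiSizeConv1d()(torch.randn(1, 100, 100))
print([tuple(f.shape) for f in features])  # [(1, 1, 50), (1, 1, 35)]
```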
  • FIG. 3 is a first structural schematic diagram of a semantic classification network provided by an embodiment of this application.
  • One convolutional layer in the semantic classification network may include multiple subconvolutional layers, where each subconvolutional layer has different hyperparameters. For example, the sizes of the convolution kernels of multiple sub-convolutional layers are different, and the convolution step lengths of multiple sub-convolutional layers are different.
  • the electronic device can perform convolution operations on the semantic matrix in multiple sub-convolutional layers of the same convolutional layer at the same time to obtain semantic features of various sizes. Among them, a convolution operation is performed on the semantic matrix based on a sub-convolution layer to obtain a semantic feature of a size.
  • the semantic classification network in this embodiment may have multiple convolutional layers, where each convolutional layer is composed of multiple subconvolutional layers, and the sizes of the convolution kernels of the multiple subconvolutional layers are different. .
  • FIG. 4 is a schematic diagram of a second structure of a semantic classification network provided by an embodiment of this application.
  • a convolutional layer of the semantic classification network can have multiple convolution kernels, and each convolution kernel can perform convolution operations according to its corresponding convolution step size and padding size.
  • the electronic device may perform convolution operations on the semantic matrix through multiple convolution kernels of different sizes and the respective convolution step lengths of the multiple convolution kernels in the convolution layer to obtain semantic features of various sizes.
  • each convolution operation performed on the semantic matrix with one convolution kernel and its corresponding convolution stride yields semantic features of one size.
  • the semantic classification network in this embodiment can have multiple convolutional layers, where each convolutional layer can have multiple convolution kernels, and each convolution kernel performs its convolution operation according to its corresponding convolution stride and padding size.
  • the electronic device can classify the text to be classified at the classification layer of the semantic classification network according to the semantic features of various sizes to determine the text category of the text to be classified. For example, referring to Figure 3, in the classification, the semantic features output by the first convolutional layer and the second convolutional layer are combined to determine the category label of the text to be classified in the classification layer of the semantic classification network to determine the text to be classified. Text category.
  • in summary, after the electronic device obtains the text to be classified, it converts the text to be classified into a semantic matrix according to the semantic representation network of the pre-trained text classification model, then performs convolution operations on the semantic matrix in the convolution layer of the semantic classification network of the text classification model to obtain semantic features of multiple sizes, where the semantic classification network includes convolution layers with different hyperparameters. Finally, according to the semantic features of the multiple sizes, the text to be classified is classified at the classification layer of the semantic classification network to determine its text category.
  • when performing the convolution operation, obtaining semantic features of multiple sizes from convolution layers with different hyperparameters enriches the semantic features of the text to be classified and prevents the low classification accuracy caused by sparse semantic features, thereby improving the accuracy of text classification.
  • FIG. 5 is a schematic diagram of a second process of a text classification method provided by an embodiment of this application.
  • 102 may include 1021 and 1022, as follows:
  • 1021: each character in the text to be classified is converted into a semantic vector according to the semantic representation network of the pre-trained text classification model; 1022: based on the order of the characters, the semantic vectors of the characters are combined into a semantic matrix.
  • the electronic device converts each character in the text to be classified into a semantic vector according to the semantic representation network of the pre-trained text classification model, where one character is converted into a semantic vector. After all the characters of the text to be classified are converted into semantic vectors, the semantic vectors of the characters are combined into a semantic matrix according to the sequence of the characters.
  • the electronic device may remove invalid characters in the text to be classified.
  • the invalid characters of the text to be classified include emoticons, space characters, garbled characters, etc.
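  • A possible cleanup step is sketched below (an assumption about which characters count as "invalid"; the patent does not fix the exact character set): whitespace is stripped, and only CJK characters, letters, digits, and common punctuation are kept, which drops emoticons and garbled symbols.

```python
import re

def remove_invalid_characters(text: str) -> str:
    """Remove space characters, emoticons and other non-text symbols."""
    text = re.sub(r"\s+", "", text)  # space characters
    # keep CJK characters, letters, digits and common punctuation
    return re.sub(r"[^\u4e00-\u9fffA-Za-z0-9，。？！、,.?!]", "", text)

print(remove_invalid_characters("春天 到了🌸!"))  # 春天到了!
```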
  • the semantic classification network further includes a pooling layer.
  • the electronic device can perform pooling processing on the semantic features of each size in the pooling layer, and then perform the pooling process on the semantic features of each size according to the pooling layer. For the processed semantic features, the text to be classified is classified at the classification layer.
  • the electronic device can use max pooling at the pooling layer to pool semantic features of each size.
  • Electronic devices can also use k_maxpooling at the pooling layer to pool semantic features of each size.
  • the electronic device divides the semantic features of each size into multiple groups and obtains the first largest, second largest, ..., k-th largest semantic feature in each group; that is, according to the magnitude of the semantic features, k semantic features are obtained from each group.
  • the electronic device uses k_maxpooling at the pooling layer to pool semantic features of each size, which can obtain richer semantic features and improve the accuracy of text classification.
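  • The patent describes taking the k largest values per group of features; a minimal sketch of the underlying k-max pooling operation over one feature map (standard k-max pooling, which keeps the k largest activations in their original order) is shown below for illustration.

```python
import torch

def k_max_pooling(semantic_feature: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest values of each feature map along the sequence
    dimension, preserving their original order."""
    # semantic_feature: (batch, channels, length)
    top_idx = semantic_feature.topk(k, dim=-1).indices.sort(dim=-1).values
    return semantic_feature.gather(-1, top_idx)

x = torch.tensor([[[0.1, 0.9, 0.3, 0.7, 0.2]]])
print(k_max_pooling(x, k=2))  # tensor([[[0.9000, 0.7000]]])
```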
  • 104 may include 1041 and 1042, as follows:
  • the electronic device calculates the probability value of the text to be classified in each preset text category according to the semantic features of multiple sizes and the preset parameter matrix at the classification layer.
  • the preset text category with the largest probability value is determined as the text category of the text to be classified.
  • the number of preset text categories is not specifically limited in the embodiment of the present application, for example, the number of preset text categories is 30.
  • the probability value of the text to be classified in a preset text category refers to the occurrence probability value of an event (the text to be classified is the preset text category).
  • the number of preset text categories is equal to the number of probability values. It is understandable that the probability value calculated each time is greater than or equal to 0 and less than or equal to 1.
  • for example, suppose there are 4 preset text categories, denoted S1, S2, S3, and S4. The probability value P1 of the text to be classified in category S1 is calculated, as are the probability value P2 in category S2, the probability value P3 in category S3, and the probability value P4 in category S4. When the number of probability values obtained equals the number of preset text categories, the largest probability value is found among P1, P2, P3, and P4, and the preset text category with the largest probability value is determined as the text category of the text to be classified.
  • for instance, if P1>P2>P3>P4, the preset text category corresponding to P1 (category S1) is the text category of the text to be classified.
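  • A minimal sketch of such a classification layer follows (the names and the use of softmax are illustrative assumptions; the patent only specifies a preset parameter matrix producing one probability per preset category): a linear map followed by softmax yields probabilities in [0, 1], and the category with the largest probability is taken as the prediction.

```python
import torch
import torch.nn as nn

class ClassificationLayer(nn.Module):
    def __init__(self, feature_dim: int, num_categories: int = 4):
        super().__init__()
        self.linear = nn.Linear(feature_dim, num_categories)  # preset parameter matrix

    def forward(self, pooled_features: torch.Tensor) -> torch.Tensor:
        # one probability per preset text category, each in [0, 1]
        return torch.softmax(self.linear(pooled_features), dim=-1)

layer = ClassificationLayer(feature_dim=16, num_categories=4)
probabilities = layer(torch.randn(1, 16))          # P1, P2, P3, P4
predicted_category = probabilities.argmax(dim=-1)  # category with the largest probability
```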
  • FIG. 6 is a schematic diagram of the third process of the text classification method provided by an embodiment of this application.
  • before 102, the method further includes 105, 106, and 107, as follows:
  • 105: multiple first training texts are acquired to form a first training set; 106: according to the semantic representation network, the multiple first training texts in the first training set are converted into multiple first semantic matrices; 107: a preset convolutional neural network is trained based on the multiple first semantic matrices, and the trained convolutional neural network is used as the semantic classification network, the text classification model being formed by the semantic representation network and the semantic classification network.
  • the electronic device before converting the plurality of first training texts in the first training set into a plurality of first semantic matrices according to the semantic representation network, obtains the plurality of first training texts, Form the first training set.
  • after the first training set is formed, the electronic device can perform supervised training on the preset convolutional neural network, then use the trained convolutional neural network as the semantic classification network, the text classification model being formed by the semantic representation network and the semantic classification network.
  • the first training text in the first training set carries the target category label
  • the target category label is manually set by the user
  • the first training texts correspond one-to-one with the target category labels.
  • the format of the first training text is as follows: "text content ⁇ target category label”.
  • the format of the first training text is as follows: "target category label ⁇ text content” and so on.
  • the electronic device converts the text content of each first training text in the first training set into a first semantic matrix according to the semantic representation network. After the text content of the plurality of first training texts in the first training set is converted, a plurality of first semantic matrices are obtained.
  • the electronic device when converting the text content of each first training text in the first training set into a first semantic matrix, may convert the text content of the first training text into a semantic vector according to the semantic representation network. Then, based on the sequence of each character in the text content, the semantic vector of each character is combined into the first semantic matrix.
  • for example, taking one first training text, suppose the first training text is "庆祝中华人民共和国成立七十周年\国庆" ("Celebrating the 70th Anniversary of the Founding of the People's Republic of China\National Day"). Based on the semantic representation network of the text classification model, each character of the text content ("庆祝中华人民共和国成立七十周年") is represented by a semantic vector, and the semantic vectors of the characters are combined into the first semantic matrix according to the order of the characters in the text content.
  • the dimension of the semantic vector can be a dimension greater than or equal to 3.
  • the first semantic matrix can be seen in the above expression.
  • the dimension of the semantic vector can be 6 dimensions, that is, the number of components in the semantic vector is 6.
  • the semantic vector of " ⁇ " is (A011, A012, A013, A014, A015, A016)
  • the semantic vector of " ⁇ ” is (A021, A022, A023, A024, A025, A026), and so on.
  • the electronic device converts the target category label of each first training text in the first training set into a third semantic matrix according to the semantic representation network. After the target category labels of the multiple first training texts in the first training set are converted, multiple third semantic matrices are obtained.
  • the electronic device when converting the target category label of each first training text in the first training set into a third semantic matrix, can convert the target category label of each first training text into semantics according to the semantic representation network vector. Then, based on the sequence of each character in the target category label, the semantic vector of each character is combined into a third semantic matrix.
  • following the above example, based on the semantic representation network of the text classification model, each character in the target category label ("国庆", "National Day") of the first training text is represented by a semantic vector, and the semantic vectors of the characters are combined into a third semantic matrix according to the order of the characters in the target category label.
  • after converting the multiple first training texts in the first training set into multiple first semantic matrices and multiple third semantic matrices, the preset convolutional neural network is trained based on the multiple first semantic matrices and the multiple third semantic matrices, the trained convolutional neural network is used as the semantic classification network, and the text classification model is formed by the semantic representation network and the semantic classification network.
  • when training the preset convolutional neural network based on the multiple first semantic matrices, the electronic device may iteratively train the preset convolutional neural network based on the multiple first semantic matrices and a preset loss function until it converges.
  • the preset loss function is not specifically limited in the embodiment of the present application, for example, the preset loss function is a cross-entropy loss function.
  • for example, the electronic device inputs the multiple first semantic matrices into the preset convolutional neural network and outputs, for each first semantic matrix, the probability value of the corresponding first training text in each preset text category. A target loss value is then calculated from the probability values and the preset loss function; if the target loss value has not reached its minimum, the model parameters of the convolutional neural network are adjusted and the first semantic matrices are input into the network again; if the target loss value reaches its minimum, the network has converged.
  • alternatively, the electronic device may iteratively train the preset convolutional neural network based on the multiple first semantic matrices and the preset loss function until the loss value of the preset loss function is minimal and the accuracy of the convolutional neural network stabilizes.
  • after the model converges, multiple verification texts are obtained to form a verification set, and the accuracy of the convolutional neural network is calculated on the verification set. If the accuracy has stabilized, training stops and the trained convolutional neural network is used as the semantic classification network; if the accuracy has not stabilized, the model parameters of the convolutional neural network are adjusted and training continues.
  • furthermore, when the electronic device iteratively trains the preset convolutional neural network, each time the model parameters are adjusted it retrains the preset convolutional neural network with the multiple first semantic matrices and calculates the current accuracy of the convolutional neural network on the verification set. The current accuracy is compared with the saved historical accuracy: if the current accuracy is greater than the historical accuracy, the model parameters corresponding to the historical accuracy are deleted, and the current accuracy and its corresponding model parameters are saved; if the current accuracy is less than or equal to the historical accuracy, the current accuracy is saved but its corresponding model parameters are not. If the accuracy does not increase over several iterations, training ends.
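  • The training strategy just described (iterate, evaluate on a verification set, keep only the parameters of the best model, and stop once accuracy no longer increases) can be sketched as follows; the function signature, data loaders, and patience threshold are assumptions for illustration.

```python
import copy
import torch

def train_with_best_checkpoint(model, optimizer, loss_fn, train_loader, val_loader,
                               max_epochs: int = 50, patience: int = 5):
    """Sketch: keep only the model parameters with the best verification accuracy,
    and end training once the accuracy stops increasing."""
    best_acc, best_state = 0.0, copy.deepcopy(model.state_dict())
    rounds_without_gain = 0
    for _ in range(max_epochs):
        model.train()
        for semantic_matrices, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(semantic_matrices), labels)  # e.g. cross-entropy
            loss.backward()
            optimizer.step()

        model.eval()
        correct = total = 0
        with torch.no_grad():
            for semantic_matrices, labels in val_loader:
                predictions = model(semantic_matrices).argmax(dim=-1)
                correct += (predictions == labels).sum().item()
                total += labels.numel()
        current_acc = correct / total

        if current_acc > best_acc:   # better than the saved historical accuracy
            best_acc, best_state = current_acc, copy.deepcopy(model.state_dict())
            rounds_without_gain = 0
        else:                        # accuracy did not increase this round
            rounds_without_gain += 1
            if rounds_without_gain >= patience:
                break
    model.load_state_dict(best_state)
    return model, best_acc
```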
  • the preset convolutional neural network is trained based on the plurality of first semantic matrices, only the model parameters of the convolutional neural network are updated, and the model parameters of the semantic representation network are not changed.
  • the text classification model constructed by supervised training of the convolutional neural network helps to improve the accuracy of text classification.
  • FIG. 7 is a schematic diagram of the fourth process of the text classification method provided by an embodiment of the application.
  • before 106, the method further includes 108 and 109, as follows:
  • the semantic representation network of the text classification model is a BERT network after fine-tuning training.
  • before converting the multiple first training texts in the first training set into multiple first semantic matrices according to the semantic representation network, the electronic device may obtain multiple second training texts to form a second training set, and use the second training set to perform fine-tuning training on the BERT network to update the model parameters of the BERT network.
  • the first training text and the second training text belong to the same type of information, but the first training text is different from the second training text.
  • the first training text is used to train the preset convolutional neural network to obtain the semantic classification network of the text classification model
  • the second training text is used to fine-tune the training of the BERT network to obtain the semantic representation network of the text classification model.
  • the BERT network in this scheme is a multi-layer bidirectional encoder.
  • the BERT network includes 12 transformer layers, and each transformer layer includes 4 structures: self-attention, regularization, full connection, and regularization.
  • because the semantic representation network in the text classification model of this solution uses the get_sequence_output function, the output of the semantic representation network is a semantic matrix composed of the semantic vectors of characters. Compared with outputting a semantic matrix composed of semantic vectors of words, outputting a semantic matrix composed of semantic vectors of characters can improve the classification accuracy of short texts.
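  • In the original (TensorFlow) BERT code base, get_sequence_output() returns one vector per input token; the Hugging Face equivalent is last_hidden_state. The sketch below (an illustration, assuming the publicly available "bert-base-chinese" checkpoint, which tokenizes Chinese per character) shows how such a character-level semantic matrix can be obtained.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

inputs = tokenizer("春天到了", return_tensors="pt")
with torch.no_grad():
    outputs = bert(**inputs)

# Equivalent of get_sequence_output(): one semantic vector per token (character,
# plus the [CLS]/[SEP] markers), stacked into the semantic matrix.
semantic_matrix = outputs.last_hidden_state  # shape: (1, number_of_tokens, 768)
```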
  • this solution trains the semantic representation network and the semantic classification network separately, that is, the electronic device first fine-tunes and trains the semantic representation network, and then trains the semantic classification network to obtain a text classification model with excellent training effects, which can improve the text classification performance Accuracy.
  • this application is not limited by the order of execution of the various steps described, and certain steps may also be performed in other order or at the same time if there is no conflict.
  • acquiring multiple pieces of first training text to form the first training set and using the second training set to train the BERT network may be performed at the same time.
  • 107 includes 1071 and 1072:
  • the electronic device simultaneously trains a preset convolutional neural network through multiple first semantic matrices obtained based on the first training text and multiple second semantic matrices obtained based on the second training text. That is, the second training text is not only used to fine-tune the training of the BERT network, but also used to train the preset convolutional neural network.
  • this solution uses migration learning when training convolutional neural networks, such as using multiple second semantic matrices obtained when fine-tuning the training of the BERT network to train the preset convolutional neural network, which can effectively prevent The obtained text classification model is over-fitted, which improves the accuracy of text classification.
  • the method before converting the plurality of first training texts in the first training set into a plurality of first semantic matrices according to the semantic representation network, the method further includes:
  • in some embodiments, the semantic representation network of the text classification model is a BERT network. The semantic representation network, the baseline BERT network used for fine-tuning training, and the source BERT network are pairwise distinct networks of the same type. Before converting the multiple first training texts in the first training set into multiple first semantic matrices according to the semantic representation network, transfer learning is used to determine the model parameters of the semantic representation network in the text classification model.
  • for example, the electronic device may obtain the model parameters of the source BERT network, load the model parameters of the source BERT network into the baseline BERT network, and obtain multiple third training texts to form a third training set. The third training set is then used to fine-tune the baseline BERT network to update its model parameters, and the model parameters of the baseline BERT network updated by fine-tuning are loaded into the BERT network of the text classification model.
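  • A sketch of this parameter-transfer chain (source BERT → baseline BERT fine-tuned on the third training set → semantic representation network) is given below; the checkpoint name and file path are placeholders, and the fine-tuning step itself is elided.

```python
import torch
from transformers import BertModel

# 1. Load the source BERT parameters into the baseline BERT network.
baseline_bert = BertModel.from_pretrained("bert-base-chinese")

# 2. Fine-tune baseline_bert on the third training set (omitted here),
#    then save its updated model parameters.
torch.save(baseline_bert.state_dict(), "baseline_bert_finetuned.pt")

# 3. Load the updated parameters into the BERT network of the text classification model.
semantic_representation_net = BertModel.from_pretrained("bert-base-chinese")
semantic_representation_net.load_state_dict(torch.load("baseline_bert_finetuned.pt"))
```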
  • the first training text, the second training text, and the third training text belong to the same type of information
  • the third training text is different from the first training text
  • the third training text may be the same as or different from the second training text.
  • Fig. 8 is a schematic structural diagram of a text classification device provided by an embodiment of the present application.
  • the device is used to execute the text classification method provided in the foregoing embodiment, and has functional modules and beneficial effects corresponding to the execution method.
  • the text classification device 200 specifically includes: a first acquisition module 201, a first conversion module 202, a convolution operation module 203, and a classification module 204, wherein:
  • the first obtaining module 201 is used to obtain the text to be classified
  • the first conversion module 202 is configured to convert the text to be classified into a semantic matrix according to the semantic representation network of the pre-trained text classification model, where the text classification model is composed of the semantic representation network and the semantic classification network;
  • the convolution operation module 203 is configured to perform a convolution operation on the semantic matrix in the convolution layer of the semantic classification network to obtain semantic features of multiple sizes, wherein the semantic classification network includes convolution layers and a classification layer with different hyperparameters;
  • the classification module 204 is configured to classify the text to be classified at the classification layer according to the semantic features of the multiple sizes, so as to determine the text category of the text to be classified.
  • the first conversion module 202 when converting the text to be classified into a semantic matrix according to the semantic representation network of the pre-trained text classification model, the first conversion module 202 may be used to:
  • the semantic vector of each character is combined into a semantic matrix.
  • the classification module 204 may be used to:
  • the preset text category with the largest probability value is determined as the text category of the text to be classified.
  • in some embodiments, the text classification device 200 further includes a pooling processing module, which is used to pool the semantic features of each size in the pooling layer; the classification module 204 is further configured to classify the text to be classified at the classification layer according to the pooled semantic features.
  • in some embodiments, before the text to be classified is converted into a semantic matrix according to the semantic representation network of the pre-trained text classification model, the text classification apparatus 200 further includes a removal module for removing invalid characters from the text to be classified.
  • the text classification apparatus 200 before acquiring the text to be classified, the text classification apparatus 200 further includes a second acquisition module, a second conversion module, and a first training module:
  • the second acquisition module is used to acquire a plurality of first training texts to form a first training set
  • the second conversion module is configured to convert the plurality of first training texts in the first training set into a plurality of first semantic matrices according to the semantic representation network;
  • the first training module is configured to train a preset convolutional neural network based on the multiple first semantic matrices and use the trained convolutional neural network as the semantic classification network, the text classification model being composed of the semantic representation network and the semantic classification network.
  • the text classification apparatus 200 before converting the plurality of first training texts in the first training set into a plurality of first semantic matrices according to the semantic representation network, the text classification apparatus 200 further includes a third acquisition module And the second training module:
  • the third acquisition module is used to acquire a plurality of second training texts to form a second training set
  • the second training module is configured to use the second training set to train the BERT network to update the model parameters of the BERT network.
  • in some embodiments, when training the preset convolutional neural network based on the multiple first semantic matrices, the first training module may further be configured to train the preset convolutional neural network based on the multiple first semantic matrices together with multiple second semantic matrices obtained from the second training texts.
  • from the above, in the text classification device provided by the embodiment of the present application, the first acquisition module 201 acquires the text to be classified; the first conversion module 202 converts the text to be classified into a semantic matrix according to the semantic representation network of the pre-trained text classification model; the convolution operation module 203 then performs a convolution operation on the semantic matrix in the convolution layer of the semantic classification network of the text classification model to obtain semantic features of multiple sizes, wherein the semantic classification network includes convolution layers and a classification layer with different hyperparameters; finally, the classification module 204 classifies the text to be classified at the classification layer according to the semantic features of the multiple sizes to determine the text category of the text to be classified.
  • in this way, the semantic features of the text to be classified are enriched and the low classification accuracy caused by sparse semantic features is prevented, thereby improving the accuracy of text classification.
  • the text classification device provided in this embodiment of the application belongs to the same concept as the text classification method in the above embodiment. Any method provided in the text classification method embodiment can be run on the text classification device, and its specific implementation For details of the process, refer to the embodiment of the text classification method, which will not be repeated here.
  • the embodiment of the present application provides a computer-readable storage medium on which a computer program is stored.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM), etc.
  • the electronic device 300 includes a processor 301 and a memory 302. Wherein, the processor 301 and the memory 302 are electrically connected.
  • the processor 301 is the control center of the electronic device 300. It connects the various parts of the entire electronic device through various interfaces and lines, and executes the various functions of the electronic device 300 and processes data by running or loading the computer program stored in the memory 302 and calling the data stored in the memory 302.
  • the memory 302 may be used to store software programs and modules.
  • the processor 301 executes various functional applications and data processing by running the computer programs and modules stored in the memory 302.
  • the memory 302 may mainly include a storage program area and a storage data area.
  • the storage program area may store an operating system, a computer program required by at least one function (such as a sound playback function, an image playback function, etc.), and the like; the storage data area may store data created according to the use of the electronic device, and the like.
  • the memory 302 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the memory 302 may also include a memory controller to provide the processor 301 with access to the memory 302.
  • in some embodiments, the processor 301 in the electronic device 300 loads instructions corresponding to the processes of one or more computer programs into the memory 302 according to the following steps, and the processor 301 runs the computer programs stored in the memory 302 to implement various functions:
  • the semantic classification network includes a convolution layer and a classification layer with different hyperparameters
  • classification processing is performed on the text to be classified at the classification layer to determine the text category of the text to be classified.
  • FIG. 10 is a schematic diagram of a second structure of an electronic device provided by an embodiment of the application.
  • the electronic device further includes a camera component 303, a radio frequency circuit 304, an audio circuit 305, and Power supply 306.
  • the camera component 303, the radio frequency circuit 304, the audio circuit 305, and the power supply 306 are electrically connected to the processor 301, respectively.
  • the camera component 303 may include an image processing circuit, which may be implemented by hardware and/or software components, and may include various processing units that define an image signal processing (Image Signal Processing) pipeline.
  • the image processing circuit may at least include: multiple cameras, an image signal processor (Image Signal Processor, ISP processor), a control logic, an image memory, a display, and the like.
  • Each camera may include at least one or more lenses and image sensors.
  • the image sensor may include a color filter array (such as a Bayer filter). The image sensor can obtain the light intensity and wavelength information captured with each imaging pixel of the image sensor, and provide a set of raw image data that can be processed by the image signal processor.
  • the radio frequency circuit 304 may be used to transmit and receive radio frequency signals to establish wireless communication with network equipment or other electronic equipment through wireless communication, and to transmit and receive signals with the network equipment or other electronic equipment.
  • the audio circuit 305 can be used to provide an audio interface between the user and the electronic device through a speaker or a microphone.
  • the power supply 306 can be used to power various components of the electronic device 300.
  • the power supply 306 may be logically connected to the processor 301 through a power management system, so that functions such as charging, discharging, and power consumption management can be managed through the power management system.
  • in some embodiments, the processor 301 in the electronic device 300 loads instructions corresponding to the processes of one or more computer programs into the memory 302 according to the following steps, and the processor 301 runs the computer programs stored in the memory 302 to implement various functions:
  • the semantic classification network includes a convolution layer and a classification layer with different hyperparameters
  • classification processing is performed on the text to be classified at the classification layer to determine the text category of the text to be classified.
  • the processor 301 may execute:
  • the semantic vector of each character is combined into a semantic matrix.
  • the processor 301 when performing classification processing on the text to be classified at the classification layer according to the semantic features of the multiple sizes, the processor 301 may execute:
  • the preset text category with the largest probability value is determined as the text category of the text to be classified.
  • the semantic classification network further includes a pooling layer. After obtaining semantic features of various sizes, the processor 301 may execute:
  • pooling is performed on semantic features of each size
  • the processor 301 may execute:
  • the text to be classified is classified at the classification layer.
  • the processor 301 may execute:
  • the semantic representation network converting the plurality of first training texts in the first training set into a plurality of first semantic matrices
  • the semantic representation network is a BERT network; before converting the multiple first training texts in the first training set into multiple first semantic matrices according to the semantic representation network, the processor 301 can execute:
  • the second training set is used to train the BERT network to update the model parameters of the BERT network.
  • the processor 301 may execute:
  • after acquiring the text to be classified, the electronic device provided in this embodiment converts the text to be classified into a semantic matrix according to the semantic representation network of the pre-trained text classification model, then performs convolution operations on the semantic matrix in the convolution layer of the semantic classification network to obtain semantic features of multiple sizes, where the semantic classification network includes convolution layers with different hyperparameters, and finally classifies the text to be classified at the classification layer of the semantic classification network according to the semantic features of the multiple sizes to determine its text category. This enriches the semantic features of the text to be classified and prevents the low classification accuracy caused by sparse semantic features, thereby improving the accuracy of text classification.
  • the embodiments of the present application also provide a storage medium that stores a computer program; when the computer program runs on a computer, the computer executes the text classification method in any of the above embodiments, for example: obtaining the text to be classified; converting the text to be classified into a semantic matrix according to the semantic representation network of the pre-trained text classification model, wherein the text classification model is composed of the semantic representation network and the semantic classification network; performing a convolution operation on the semantic matrix in the convolution layer of the semantic classification network to obtain semantic features of multiple sizes, wherein the semantic classification network includes convolution layers and a classification layer with different hyperparameters; and, according to the semantic features of the multiple sizes, classifying the text to be classified at the classification layer to determine the text category of the text to be classified.
  • the storage medium may be a magnetic disk, an optical disk, a read only memory (Read Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.
  • the computer program can be stored in a computer readable storage medium, such as stored in the memory of an electronic device, and executed by at least one processor in the electronic device.
  • the execution process can include the implementation of a text classification method.
  • the storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, and the like.
  • for the text classification device of the embodiments of the present application, its functional modules may be integrated into one processing chip, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software function modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A text classification method and apparatus, an electronic device, and a storage medium. The method includes: obtaining text to be classified; converting the text to be classified into a semantic matrix according to a semantic representation network of a pre-trained text classification model; performing convolution operations on the semantic matrix in a convolution layer of a semantic classification network of the text classification model to obtain semantic features of multiple sizes; and determining the text category of the text to be classified at a classification layer according to the semantic features of the multiple sizes.

Description

Text classification method and apparatus, electronic device, and storage medium
Technical Field
The embodiments of the present application relate to computer technology, and in particular to a text classification method and apparatus, an electronic device, and a storage medium.
Background
Text classification refers to the automatic classification of text by an electronic device according to a certain classification system or standard, and it is widely used in daily life. For example, for the recommendation service of an electronic device, the device needs to classify a large amount of recommended content. For another example, when an electronic device performs intelligent voice control, it needs to classify the text converted from speech.
At present, the implementation of text classification relies on a model, and the accuracy of text classification mainly depends on that model.
Summary
The present application provides a text classification method and apparatus, an electronic device, and a storage medium, which can improve the accuracy of text classification.
In a first aspect, an embodiment of the present application provides a text classification method, including:
obtaining text to be classified;
converting the text to be classified into a semantic matrix according to a semantic representation network of a pre-trained text classification model, wherein the text classification model is composed of the semantic representation network and a semantic classification network;
performing a convolution operation on the semantic matrix in a convolution layer of the semantic classification network to obtain semantic features of multiple sizes, wherein the semantic classification network includes convolution layers and a classification layer with different hyperparameters;
classifying the text to be classified at the classification layer according to the semantic features of the multiple sizes, so as to determine the text category of the text to be classified.
In a second aspect, an embodiment of the present application further provides a text classification apparatus, including:
a first acquisition module configured to obtain text to be classified;
a first conversion module configured to convert the text to be classified into a semantic matrix according to a semantic representation network of a pre-trained text classification model, wherein the text classification model is composed of the semantic representation network and a semantic classification network;
a convolution operation module configured to perform a convolution operation on the semantic matrix in a convolution layer of the semantic classification network to obtain semantic features of multiple sizes, wherein the semantic classification network includes convolution layers and a classification layer with different hyperparameters;
a classification module configured to classify the text to be classified at the classification layer according to the semantic features of the multiple sizes, so as to determine the text category of the text to be classified.
In a third aspect, an embodiment of the present application further provides an electronic device, including a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following text classification method when executing the computer program:
obtaining text to be classified;
converting the text to be classified into a semantic matrix according to a semantic representation network of a pre-trained text classification model, wherein the text classification model is composed of the semantic representation network and a semantic classification network;
performing a convolution operation on the semantic matrix in a convolution layer of the semantic classification network to obtain semantic features of multiple sizes, wherein the semantic classification network includes convolution layers and a classification layer with different hyperparameters;
classifying the text to be classified at the classification layer according to the semantic features of the multiple sizes, so as to determine the text category of the text to be classified.
In a fourth aspect, an embodiment of the present application further provides a storage medium containing instructions executable by an electronic device, wherein the instructions, when executed by a processor of the electronic device, are used to execute the text classification method described in the embodiments of the present application.
Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent from the detailed description of non-limiting embodiments made with reference to the following drawings.
FIG. 1 is a first schematic flowchart of a text classification method provided by an embodiment of the present application.
FIG. 2 is a schematic structural diagram of a text classification model provided by an embodiment of the present application.
FIG. 3 is a first schematic structural diagram of a semantic classification network provided by an embodiment of the present application.
FIG. 4 is a second schematic structural diagram of a semantic classification network provided by an embodiment of the present application.
FIG. 5 is a second schematic flowchart of a text classification method provided by an embodiment of the present application.
FIG. 6 is a third schematic flowchart of a text classification method provided by an embodiment of the present application.
FIG. 7 is a fourth schematic flowchart of a text classification method provided by an embodiment of the present application.
FIG. 8 is a schematic structural diagram of a text classification apparatus provided by an embodiment of the present application.
FIG. 9 is a first schematic structural diagram of an electronic device provided by an embodiment of the present application.
FIG. 10 is a second schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are used to explain the present application, not to limit it. It should also be noted that, for ease of description, only the parts related to the present application, rather than the entire structure, are shown in the drawings.
The embodiments of the present application provide a text classification method, which is applied to an electronic device. The execution subject of the text classification method may be the text classification apparatus provided in the embodiments of the present application, or an electronic device integrating the text classification apparatus. The text classification apparatus may be implemented in hardware or software, and the electronic device may be a device equipped with a processor and having processing capability, such as a smartphone, tablet computer, handheld computer, notebook computer, or desktop computer.
Referring to FIG. 1, FIG. 1 is a first schematic flowchart of the text classification method provided by an embodiment of the present application. The text classification method is applied to the electronic device provided in the embodiments of the present application. As shown in FIG. 1, the flow of the text classification method provided by the embodiments of the present application may be as follows:
101. Obtain text to be classified.
The text to be classified is the object of text classification. The length of the text to be classified is not specifically limited in the embodiments of the present application; for example, the text to be classified may be a sentence, a paragraph, or an article. The language of the text to be classified is also not specifically limited; for example, the text to be classified may be Chinese text, English text, Japanese text, and so on.
In some embodiments, the electronic device can obtain the text to be classified according to a user's selection instruction. For example, according to the selection instruction, a stored document (Document 1) is used as the text to be classified, or the seventh paragraph of Document 1 is used as the text to be classified.
In some embodiments, the electronic device may obtain the text to be classified from an image carrying text information. For example, the electronic device acquires an image through a camera, the image carries the text "露从今夜白，月是故乡明", and text recognition is then performed on the acquired image to obtain the text to be classified, that is, the text to be classified is "露从今夜白，月是故乡明".
102. Convert the text to be classified into a semantic matrix according to the semantic representation network of the pre-trained text classification model, wherein the text classification model is composed of the semantic representation network and a semantic classification network.
As shown in FIG. 2, which is a schematic structural diagram of the text classification model provided by an embodiment of the present application, the text classification model is composed of a semantic representation network and a semantic classification network. The semantic representation network is mainly used to transform text, and the semantic classification network is mainly used to classify text. It should be noted that the semantic classification network takes the output of the semantic representation network as its input.
In this solution, the semantic matrix is obtained by combining the semantic vectors of the characters in the text to be classified. The number of rows of the semantic matrix equals the number of characters in the text to be classified, and the number of columns equals the dimension of the semantic vector of each character; alternatively, the number of rows equals the dimension of the semantic vector of each character, and the number of columns equals the number of characters of the text to be classified. It is understood that the dimension of the semantic vector of each character mainly depends on the dictionary in the semantic representation network.
For example, suppose the text to be classified is "春天到了", the semantic vector of "春" is (X11, X12, X13), the semantic vector of "天" is (X21, X22, X23), the semantic vector of "到" is (X31, X32, X33), and the semantic vector of "了" is (X41, X42, X43). The semantic matrix of "春天到了" is then:

    [[X11, X12, X13],
     [X21, X22, X23],
     [X31, X32, X33],
     [X41, X42, X43]]

In the embodiments of the present application, after obtaining the text to be classified, the electronic device inputs the text to be classified into the semantic representation network of the pre-trained text classification model and outputs the semantic matrix of the text to be classified.
103. Perform a convolution operation on the semantic matrix in the convolution layer of the semantic classification network to obtain semantic features of multiple sizes, wherein the semantic classification network includes convolution layers and a classification layer with different hyperparameters.
In the embodiments of the present application, after converting the text to be classified into a semantic matrix, the electronic device performs a convolution operation on the semantic matrix in the convolution layer of the semantic classification network, using multiple convolution kernels of different sizes and the convolution stride corresponding to each kernel, to obtain semantic features of multiple sizes. It should be noted that the convolution operation in this solution is a one-dimensional convolution operation.
The semantic classification network includes convolution layers and a classification layer with different hyperparameters, and may further include an input layer and an output layer. It is understood that the hyperparameters include the convolution stride, the convolution kernel size, and the padding size. When the convolution operation is performed on the semantic matrix, the size of the resulting semantic features is mainly determined by the hyperparameters. Assuming that the size of the semantic matrix is N1×N2, where N1 is the number of characters in the text to be classified and N2 is the dimension of the semantic vector of each character, the semantic feature size is calculated as:

    M = (N1 − F1 + 2P) / S + 1

where M is the semantic feature size, N1 is the number of rows of the semantic matrix, P is the padding size, S is the convolution stride, and the convolution kernel size is F1×F2. It should be noted that the padding size is adjusted according to the convolution kernel size and the convolution stride, and when the convolution operation is performed on the semantic matrix, F2 in the kernel size is equal to N2.
Assuming that the size of the semantic matrix is N3×N4, where N3 is the dimension of the semantic vector of each character and N4 is the number of characters in the text to be classified, the semantic feature size is calculated as:

    M = (N4 − F4 + 2P) / S + 1

where M is the semantic feature size, N4 is the number of columns of the semantic matrix, P is the padding size, S is the convolution stride, and the convolution kernel size is F3×F4. It should be noted that the padding size is adjusted according to the convolution kernel size and the convolution stride, and when the convolution operation is performed on the semantic matrix, F3 in the kernel size is equal to N3.
In this embodiment, by adjusting one or more of the above hyperparameters, the convolution layer outputs semantic features of multiple sizes.
For example, by adjusting the convolution kernel size among the hyperparameters, the convolution layer can output semantic features of multiple sizes. Suppose the semantic matrix is 100×100. In the convolution layer of the convolutional neural network, the electronic device convolves the semantic matrix with a 100×3 kernel, a stride of S=2, and a padding of P=1 to obtain a semantic feature of size 1×50, and with a 100×5 kernel, a stride of S=2, and a padding of P=3 to obtain another semantic feature of size 1×50. Convolution kernels of different sizes correspond to different receptive fields: a larger kernel has a larger receptive field than a smaller one and can extract richer information. In this example, features are therefore extracted with one large and one small kernel, so that the obtained overall semantic features contain richer information, which can improve the accuracy of text classification.
As another example, by adjusting both the convolution kernel size and the stride, the convolution layer can output semantic features of multiple sizes. Suppose again that the semantic matrix is 100×100. The electronic device convolves the semantic matrix with a 100×3 kernel, a stride of S=2, and a padding of P=1 to obtain a 1×50 semantic feature, and with a 100×5 kernel, a stride of S=3, and a padding of P=4 to obtain a 1×34 semantic feature. In this example, besides the kernel size, the stride can be further adjusted, so that while the semantic features of the text to be classified are enriched, the feature dimensionality is reduced, improving the computational efficiency of the network.
In some embodiments, referring to FIG. 3, which is a first schematic structural diagram of the semantic classification network provided by an embodiment of the present application, one convolution layer of the semantic classification network may include multiple sub-convolution layers, each with different hyperparameters; for example, the kernel sizes of the sub-convolution layers differ, and their convolution strides differ. In this solution, the electronic device can perform convolution operations on the semantic matrix in the multiple sub-convolution layers of the same convolution layer at the same time to obtain semantic features of multiple sizes, each sub-convolution layer yielding semantic features of one size. It is understood that the semantic classification network in this embodiment may have multiple convolution layers, each composed of multiple sub-convolution layers whose convolution kernel sizes differ.
In some embodiments, referring to FIG. 4, which is a second schematic structural diagram of the semantic classification network provided by an embodiment of the present application, one convolution layer of the semantic classification network can have multiple convolution kernels, each of which performs its convolution operation according to its corresponding convolution stride and padding size. For example, the electronic device may perform convolution operations on the semantic matrix in this convolution layer using multiple kernels of different sizes and the stride corresponding to each kernel to obtain semantic features of multiple sizes; each convolution operation performed with one kernel and its corresponding stride yields semantic features of one size. It is understood that the semantic classification network in this embodiment may have multiple convolution layers, each of which can have multiple convolution kernels, each kernel performing its convolution operation according to its corresponding stride and padding size.
104. Classify the text to be classified at the classification layer according to the semantic features of the multiple sizes, so as to determine the text category of the text to be classified.
In the embodiments of the present application, after obtaining semantic features of multiple sizes, the electronic device can classify the text to be classified at the classification layer of the semantic classification network according to the semantic features of the multiple sizes to determine the text category of the text to be classified. For example, referring to FIG. 3, during classification the semantic features output by the first convolution layer and the second convolution layer are combined, and the category label of the text to be classified is determined at the classification layer of the semantic classification network, thereby determining its text category.
From the above, in the embodiments of the present application, after obtaining the text to be classified, the electronic device converts it into a semantic matrix according to the semantic representation network of the pre-trained text classification model, then performs convolution operations on the semantic matrix in the convolution layer of the semantic classification network of the text classification model to obtain semantic features of multiple sizes, where the semantic classification network includes convolution layers with different hyperparameters, and finally classifies the text to be classified at the classification layer of the semantic classification network according to the semantic features of the multiple sizes to determine its text category. When performing the convolution operation, this solution obtains semantic features of multiple sizes from convolution layers with different hyperparameters, which enriches the semantic features of the text to be classified and prevents the low classification accuracy caused by sparse semantic features, thereby improving the accuracy of text classification.
请参阅图5,图5为本申请实施例提供的文本分类方法的第二流程示意图。
在一些实施例中,102可以包括1021以及1022,如下:
1021、根据预先训练的文本分类模型的语义表征网络,将所述待分类文本中的各字符转换为语义向量。
1022、基于各字符的先后顺序,将各字符的语义向量组合为语义矩阵。
本申请实施例中,在获取待分类文本之后,电子设备根据预先训练的文本分类模型的语义表征网络,将待分类文本中的各字符转换为语义向量,其中,一个字符转换为一个语义向量。在待分类文本的所有字符都转化为语义向量后,根据各字符的先后顺序,将各字符的语义向量组合成语义矩阵。
在一些实施例中,在获取待分类文本之后,将所述待分类文本中的各字符转换为语义向量之前,电子设备可以去除待分类文本中的无效字符。其中,待分类文本的无效字符包括表情字符、空格字符、乱码字符等。
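作为示意（无效字符的具体范围以实际实现为准，此处仅示例性地去除空白字符与常见表情符号），在转换语义向量之前清洗待分类文本的代码可以如下：

```python
import re

def remove_invalid_chars(text: str) -> str:
    """去除待分类文本中的空格字符与常见表情符号（字符范围仅为示例）。"""
    text = re.sub(r"\s+", "", text)  # 去除空格、换行等空白字符
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)  # 去除常见表情符号区间
    return text

print(remove_invalid_chars("露从今夜白， 月是故乡明 😀"))  # 露从今夜白，月是故乡明
```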
在一些实施例中,语义分类网络还包括池化层,在得到多种尺寸的语义特征之后,电子设备可以在池化层,对每一种尺寸的语义特征进行池化处理,然后根据池化处理的语义特征,在所述分类层对所述待分类文本进行分类处理。
其中,电子设备在池化层可以采用max pooling的方式来对每一种尺寸的语义特征进行池化处理。电子设备在池化层也可以采用k_maxpooling的方式来对每一种尺寸的语义特征进行池化处理。
例如，电子设备将每一种尺寸的语义特征划分成多个小组，获取每小组中第一大的语义特征、第二大的语义特征……第k大的语义特征，即按照语义特征的大小，从每小组中获取k个语义特征。电子设备在池化层采用k_maxpooling的方式来对每一种尺寸的语义特征进行池化处理，可以获取更加丰富的语义特征，提高文本分类的准确度。
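下面给出k_maxpooling的一个最小示意实现（假设使用PyTorch，k的取值为示例；文中按小组获取k个最大语义特征的做法与此类似）：

```python
import torch

def k_max_pooling(features: torch.Tensor, k: int = 2, dim: int = -1) -> torch.Tensor:
    """沿指定维度保留最大的k个语义特征值（k_maxpooling的一种常见实现方式）。"""
    topk_values, _ = features.topk(k, dim=dim)
    return topk_values

feature = torch.tensor([[0.1, 0.9, 0.3, 0.7, 0.5]])  # 某一种尺寸的语义特征（示意数值）
print(k_max_pooling(feature, k=2))                    # tensor([[0.9000, 0.7000]])
```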
在一些实施例中,104可以包括1041以及1042,如下:
1041、在所述分类层根据所述多种尺寸的语义特征,计算所述待分类文本在每个预设文本类别上的概率值。
1042、将概率值最大的预设文本类别确定为所述待分类文本的文本类别。
本申请实施例中,在得到多种尺寸的语义特征之后,电子设备在分类层根据多种尺寸的语义特征和预设参数矩阵,计算待分类文本在每个预设文本类别上的概率值,将概率值最大的预设文本类别确定为待分类文本的文本类别。
其中,对于预设文本类别的数目,本申请实施例不作具体限定,如预设文本类别的数目为30。待分类文本在一个预设文本类别上的概率值是指一项事件(待分类文本为该个预设文本类别)的发生概率值。预设文本类别的数量和概率值的数量相等。可以理解的是,每次计算所得的概率值大于等于0且小于等于1。
例如，假设预设文本类别有4个，记为S1文本类别、S2文本类别、S3文本类别、S4文本类别，计算待分类文本在S1文本类别上的概率值P1，计算待分类文本在S2文本类别上的概率值P2，计算待分类文本在S3文本类别上的概率值P3，计算待分类文本在S4文本类别上的概率值P4，当得到的概率值数目等于预设文本类别的数目时，从概率值P1、概率值P2、概率值P3、概率值P4中查找最大的概率值，将概率值最大的预设文本类别确定为待分类文本的文本类别。如假设P1>P2>P3>P4，则P1对应的预设文本类别（S1文本类别）是待分类文本的文本类别。
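作为示意（假设分类层由一个全连接层加softmax构成，预设文本类别数取4，特征维度为示意数值），根据多种尺寸的语义特征计算各预设文本类别的概率值并取最大者的代码可以如下：

```python
import torch
import torch.nn as nn

num_classes = 4                          # 预设文本类别数：S1、S2、S3、S4（示例）
feature = torch.randn(1, 84)             # 拼接后的多种尺寸语义特征（示意维度）

classifier = nn.Linear(84, num_classes)  # 分类层中的预设参数矩阵（示意）
probs = torch.softmax(classifier(feature), dim=-1)

pred = probs.argmax(dim=-1)              # 概率值最大的预设文本类别
print(probs)                             # 各类别概率值均在0与1之间，且和为1
print(pred)                              # 例如 tensor([0]) 表示S1文本类别
```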
请参阅图6,图6为本申请实施例提供的文本分类方法的第三流程示意图。
在一些实施例中,102之前,还包括105、106以及107,如下:
105、获取多条第一训练文本,构成第一训练集。
106、根据所述语义表征网络,将所述第一训练集中的所述多条第一训练文本转换为多个第一语义矩阵。
107、基于所述多个第一语义矩阵训练预设的卷积神经网络，并将训练后的所述卷积神经网络作为所述语义分类网络，由所述语义表征网络和语义分类网络构成所述文本分类模型。
本申请实施例中,在根据所述语义表征网络,将所述第一训练集中的所述多条第一训练文本转换为多个第一语义矩阵之前,电子设备获取多条第一训练文本,构成第一训练集。
该方案中,在构成第一训练集后,电子设备可以对预设的卷积神经网络进行有监督训练,然后将训练后的卷积神经网络作为语义分类网络,由语义表征网络和语义分类网络构成文本分类模型。
可以理解的是,若电子设备对预设的卷积神经网络进行有监督训练,则第一训练集中的第一训练文本携带有目标类别标签,该目标类别标签由用户手动设置,第一训练文本与目标类别标签一一对应。关于目标类别标签相对于文本内容的设置位置,本申请不作具体限定。例如第一训练文本的格式如下:“文本内容\目标类别标签”。例如第一训练文本的格式如下:“目标类别标签\文本内容”等。
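作为示意（假设目标类别标签以反斜杠分隔并位于文本内容之后），解析一条第一训练文本得到文本内容与目标类别标签的代码可以如下：

```python
line = r"文本内容\目标类别标签"        # 一条第一训练文本（格式仅为示例）

content, label = line.split("\\", 1)  # 以反斜杠分隔文本内容与目标类别标签
print(content)  # 文本内容
print(label)    # 目标类别标签
```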
一方面,在构成第一训练集后,电子设备根据语义表征网络,将第一训练集中的每条第一训练文本的文本内容转换为一个第一语义矩阵。当第一训练集的多条第一训练文本的文本内容转换完毕后,得到多个第一语义矩阵。
其中，在将第一训练集中的每条第一训练文本的文本内容转换为一个第一语义矩阵时，电子设备可以根据语义表征网络，将第一训练文本的文本内容中的各字符转换为语义向量。然后基于文本内容中各字符的先后顺序，将各字符的语义向量组合为第一语义矩阵。
例如,以一条第一训练文本为例,假设该条第一训练文本是“庆祝中华人民共和国成立七十周年\国庆”,基于文本分类模型的语义表征网络,第一训练文本的文本内容(“庆祝中华人民共和国成立七十周年”)中的每个字符用一个语义向量表示,并按照各字符在文本内容中的先后顺序,将各字符的语义向量组合为第一语义矩阵。其中,对于语义向量的维数,可以是大于或等于3的维数。
$$\begin{pmatrix} A_{011} & A_{012} & A_{013} & A_{014} & A_{015} & A_{016} \\ A_{021} & A_{022} & A_{023} & A_{024} & A_{025} & A_{026} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ A_{151} & A_{152} & A_{153} & A_{154} & A_{155} & A_{156} \end{pmatrix}$$
如第一语义矩阵可参见上方表达式，此时，语义向量的维数可以是6维，即语义向量中分量的个数是6。“庆”的语义向量为(A011，A012，A013，A014，A015，A016)，“祝”的语义向量为(A021，A022，A023，A024，A025，A026)，依次类推。
另一方面,在构成第一训练集后,电子设备根据语义表征网络,将第一训练集中的每条第一训练文本的目标类别标签转换为一个第三语义矩阵。当第一训练集的多条第一训练文本的目标类别标签转换完毕后,得到多个第三语义矩阵。
其中,在将第一训练集中的每条第一训练文本的目标类别标签转换为一个第三语义矩阵时,电子设备可以根据语义表征网络,将每条第一训练文本的目标类别标签转换为语义向量。然后基于目标类别标签中各字符的先后顺序,将各字符的语义向量组合为第三语义矩阵。
例如,继上述例子,基于文本分类模型的语义表征网络,将第一训练文本的目标类别标签(“国庆”)中的每个字符用一个语义向量表示,并按照各字符在目标类别标签中的先后顺序,将各字符的语义向量组合成第三语义矩阵。
在将所述第一训练集中的所述多条第一训练文本转换为多个第一语义矩阵和多个第三语义矩阵之后,基于多个第一语义矩阵和多个第三语义矩阵训练预设的卷积神经网络,将训练后的卷积神经网络作为语义分类网络,由语义表征网络和语义分类网络构成文本分类模型。
在一些实施例中，基于多个第一语义矩阵和多个第三语义矩阵训练预设的卷积神经网络时，电子设备可以基于多个第一语义矩阵和预设的损失函数，对预设的卷积神经网络进行迭代训练直至收敛。对于预设的损失函数，本申请实施例不作具体限定，如预设的损失函数为交叉熵损失函数。
例如,电子设备将多个第一语义矩阵输入至预设的卷积神经网络中,输出每个第一语义矩阵对应的第一训练文本在每个预设文本类别上的概率值。根据概率值和预设的损失函数计算目标损失值。若目标损失值未达最小,则调整卷积神经网络的模型参数,并返回执行将多个第一语义矩阵输入至预设的卷积神经网络中。若目标损失值达到最小,此时收敛。
或者,电子设备可以基于多个第一语义矩阵和预设的损失函数,对预设的卷积神经网络进行迭代训练,直至预设的损失函数的损失值最小且卷积神经网络的准确率趋于稳定。在模型收敛后,获取多个验证文本,构成验证集,通过验证集计算卷积神经网络的准确率。若准确率趋于稳定则停止训练,将训练后的卷积神经网络作为语义分类网络。若准确率未趋于稳定则调整卷积神经网络的模型参数,继续对卷积神经网络进行训练。
再者，电子设备在对预设的卷积神经网络进行迭代训练时，每调整一次模型参数，会重新使用多个第一语义矩阵对预设的卷积神经网络进行训练，并通过验证集计算卷积神经网络的当前准确率。将当前准确率与保存的历史准确率进行大小比较，若当前准确率大于历史准确率，则删除历史准确率对应的模型参数，并保存当前准确率以及当前准确率对应的模型参数，若当前准确率小于或等于历史准确率，则保存当前准确率，但不保存当前准确率对应的模型参数。若多次得到的准确率不增加，则结束训练。
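下面给出有监督训练语义分类网络的一个最小示意训练循环（假设使用PyTorch，损失函数取交叉熵，模型结构、数据加载与训练轮数均为示例），其中按验证集准确率保存模型参数：

```python
import torch
import torch.nn as nn

def train_classifier(model, train_loader, val_loader, epochs: int = 10, lr: float = 1e-3):
    """用第一语义矩阵及其目标类别标签迭代训练卷积神经网络（示意实现）。"""
    criterion = nn.CrossEntropyLoss()                        # 预设的损失函数：交叉熵
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_acc, best_state = 0.0, None

    for _ in range(epochs):
        model.train()
        for semantic_matrix, label in train_loader:          # 第一语义矩阵与目标类别标签
            optimizer.zero_grad()
            loss = criterion(model(semantic_matrix), label)
            loss.backward()
            optimizer.step()

        # 在验证集上计算当前准确率，仅在准确率提升时保存模型参数
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for semantic_matrix, label in val_loader:
                pred = model(semantic_matrix).argmax(dim=-1)
                correct += (pred == label).sum().item()
                total += label.numel()
        acc = correct / total
        if acc > best_acc:
            best_acc = acc
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
    return best_state
```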
需要说明的是,在基于所述多个第一语义矩阵训练预设的卷积神经网络时,更新的仅是卷积神经网络的模型参数,不改变语义表征网络的模型参数。此外,通过有监督训练卷积神经网络以构建的文本分类模型,有利于提高文本分类的准确度。
请参阅图7,图7为本申请实施例提供的文本分类方法的第四流程示意图。
在一些实施例中,106之前,还包括108、109,如下:
108、获取多条第二训练文本,构成第二训练集。
109、使用所述第二训练集对BERT网络进行训练,以更新所述BERT网络的模型参数。
本申请实施例中,文本分类模型的语义表征网络为微调训练后的BERT网络。在根据所述语义表征网络,将所述第一训练集中的所述多条第一训练文本转换为多个第一语义矩阵之前,电子设备可以获取多条第二训练文本,构成第二训练集。使用所述第二训练集对所述BERT网络进行微调训练,以更新所述BERT网络的模型参数。
其中,第一训练文本和第二训练文本属于同种类型的信息,但第一训练文本不同于第二训练文本。第一训练文本用于训练预设的卷积神经网络,得到文本分类模型的语义分类网络,第二训练文本用于对BERT网络进行微调训练,得到文本分类模型的语义表征网络。
该方案中的BERT网络,是一种多层双向编码器。BERT网络包括12个transformer层,每一transformer层包括4个结构:自注意力、正则化、全连接、正则化。因为该方案中文本分类模型中的语义表征网络采用get_sequence_output函数,所以语义表征网络输出的是由字符的语义向量组成的语义矩阵。相比于输出由词语的语义向量组成的语义矩阵,输出由字符的语义向量组成的语义矩阵,可以提高短文本的分类准确度。
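作为示意（此处以Hugging Face transformers库中BERT的last_hidden_state近似代替文中提到的get_sequence_output函数的输出，模型名称仅为示例），由BERT网络得到按字符组织的语义矩阵的代码可以如下：

```python
from transformers import BertModel, BertTokenizer

# 模型名称仅为示例；中文BERT以字符为基本单位进行切分
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

inputs = tokenizer("春天到了", return_tensors="pt")
outputs = model(**inputs)

# last_hidden_state 即由各字符（token）的语义向量组成的语义矩阵
semantic_matrix = outputs.last_hidden_state
print(semantic_matrix.shape)  # (1, 序列长度, 768)，序列长度含[CLS]与[SEP]标记
```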
需要说明的是,该方案分开训练语义表征网络和语义分类网络,即电子设备首先微调训练语义表征网络,然后对语义分类网络进行训练,得到训练效果优异的文本分类模型,从而可以提高文本分类的准确度。
此外,需要说明的是,具体实施时,本申请不受所描述的各个步骤的执行顺序的限制,在不产生冲突的情况下,某些步骤还可以采用其它顺序进行或者同时进行。例如,在一些实施例中,获取多条第一训练文本,构成第一训练集与使用所述第二训练集对所述BERT网络进行训练可以同时进行。
在一些实施例中,107包括1071和1072:
1071、获取使用所述第二训练集训练所述BERT网络时得到的多个第二语义矩阵。
1072、基于所述多个第一语义矩阵和所述多个第二语义矩阵，训练预设的卷积神经网络，并将训练后的所述卷积神经网络作为所述语义分类网络，由所述语义表征网络和语义分类网络构成所述文本分类模型。
本申请实施例中,电子设备通过基于第一训练文本得到的多个第一语义矩阵和基于第二训练文本得到的多个第二语义矩阵,同时训练预设的卷积神经网络。即第二训练文本除了用于对BERT网络进行微调训练,还用于训练预设的卷积神经网络。
需要说明的是,该方案在对卷积神经网络训练时采用迁移学习的方式,如使用微调训练BERT网络时得到的多个第二语义矩阵对预设的卷积神经网络进行训练,能够有效防止得到的文本分类模型过拟合,提高文本分类的准确度。
在一些实施例中,所述根据所述语义表征网络,将所述第一训练集中的所述多条第一训练文本转换为多个第一语义矩阵之前,还包括:
获取源生BERT网络的模型参数,在基线BERT网络中加载所述源生BERT网络的模型参数;
获取多条第三训练文本,构成第三训练集;
使用所述第三训练集对所述基线BERT网络进行训练,以更新所述基线BERT网络的模型参数;
在所述语义表征网络中加载所述基线BERT网络更新后的模型参数。
本申请实施例中,文本分类模型的语义表征网络为BERT网络,语义表征网络、用于微调训练的基线BERT网络和源生BERT网络中的任意两个网络不是同一个网络,但是同一种类型的网络。在根据所述语义表征网络,将所述第一训练集中的所述多条第一训练文本转换为多个第一语义矩阵之前,采用迁移学习的方式确定文本分类模型中语义表征网络的模型参数。
比如,在根据所述语义表征网络,将所述第一训练集中的所述多条第一训练文本转换为多个第一语义矩阵之前,电子设备可以获取源生BERT网络的模型参数,在基线BERT网络中加载源生BERT网络的模型参数,并获取多条第三训练文本,构成第三训练集。然后使用第三训练集对基线BERT网络进行微调训练,以更新基线BERT网络的模型参数。接着在文本分类模型的BERT网络中加载基线BERT网络微调训练后更新的模型参数。
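下面给出该迁移学习流程的一个最小示意代码（假设使用PyTorch的state_dict机制，网络结构用简化模块代替真实的BERT网络，微调训练也简化为一次梯度更新）：

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """示意用的简化网络，代表源生BERT、基线BERT与语义表征网络同属一种类型的网络。"""
    def __init__(self, hidden: int = 8):
        super().__init__()
        self.encoder = nn.Linear(hidden, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(x)

source_bert = TinyEncoder()     # 源生BERT网络（示意）
baseline_bert = TinyEncoder()   # 用于微调训练的基线BERT网络（示意）
semantic_repr = TinyEncoder()   # 文本分类模型中的语义表征网络（示意）

# 1. 获取源生BERT网络的模型参数，在基线BERT网络中加载
baseline_bert.load_state_dict(source_bert.state_dict())

# 2. 使用第三训练集对基线BERT网络进行训练，以更新其模型参数（此处用一次梯度更新示意）
optimizer = torch.optim.SGD(baseline_bert.parameters(), lr=0.1)
third_train_batch = torch.randn(4, 8)
loss = baseline_bert(third_train_batch).pow(2).mean()
loss.backward()
optimizer.step()

# 3. 在语义表征网络中加载基线BERT网络更新后的模型参数
semantic_repr.load_state_dict(baseline_bert.state_dict())
```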
其中，第一训练文本、第二训练文本和第三训练文本属于同种类型的信息，第三训练文本不同于第一训练文本，第三训练文本可以不同于第二训练文本，也可以与第二训练文本相同。
图8是本申请实施例提供的文本分类装置的结构示意图,该装置用于执行上述实施例提供的文本分类方法,具备执行方法相应的功能模块和有益效果。如图8所示,该文本分类装置200具体包括:第一获取模块201、第一转换模块202、卷积运算模块203以及分类模块204,其中:
第一获取模块201,用于获取待分类文本;
第一转换模块202,用于根据预先训练的文本分类模型的语义表征网络,将所述待分类文本转换为语义矩阵,其中,所述文本分类模型由所述语义表征网络和语义分类网络构成;
卷积运算模块203,用于在所述语义分类网络的卷积层对所述语义矩阵进行卷积运算,得到多种尺寸的语义特征,其中,所述语义分类网络包括具有不同超参数的卷积层和分类层;
分类模块204,用于根据所述多种尺寸的语义特征,在所述分类层对所述待分类文本进行分类处理,以确定所述待分类文本的文本类别。
在一些实施例中,在根据预先训练的文本分类模型的语义表征网络,将所述待分类文本转换为语义矩阵时,第一转换模块202可以用于:
根据预先训练的文本分类模型的语义表征网络,将所述待分类文本中的各字符转换为语义向量;
基于各字符的先后顺序,将各字符的语义向量组合为语义矩阵。
在一些实施例中,在根据所述多种尺寸的语义特征,在所述分类层对所述待分类文本进行分类处理时,分类模块204可以用于:
在所述分类层根据所述多种尺寸的语义特征,计算所述待分类文本在每个预设文本类别上的概率值;
将概率值最大的预设文本类别确定为所述待分类文本的文本类别。
在一些实施例中,在得到多种尺寸的语义特征之后,文本分类装置200还包括池化处理模块,所述池化处理模块用于:在所述池化层,对每一种尺寸的语义特征进行池化处理;所述分类模块204还用于:根据池化处理的语义特征,在所述分类层对所述待分类文本进行分类处理。
在一些实施例中，在根据预先训练的文本分类模型的语义表征网络，将所述待分类文本转换为语义矩阵之前，文本分类装置200还包括去除模块，所述去除模块用于去除所述待分类文本中的无效字符。
在一些实施例中,在获取待分类文本之前,文本分类装置200还包括第二获取模块、第二转换模块以及第一训练模块:
所述第二获取模块,用于获取多条第一训练文本,构成第一训练集;
所述第二转换模块,用于根据所述语义表征网络,将所述第一训练集中的所述多条第一训练文本转换为多个第一语义矩阵;
所述第一训练模块,用于基于所述多个第一语义矩阵训练预设的卷积神经网络,并将训练后的所述卷积神经网络作为所述语义分类网络,由所述语义表征网络和语义分类网络构成所述文本分类模型。
在一些实施例中,在根据所述语义表征网络,将所述第一训练集中的所述多条第一训练文本转换为多个第一语义矩阵之前,文本分类装置200还包括第三获取模块以及第二训练模块:
所述第三获取模块,用于获取多条第二训练文本,构成第二训练集;
所述第二训练模块,用于使用所述第二训练集对所述BERT网络进行训练,以更新所述BERT网络的模型参数。
在一些实施例中,在基于所述多个第一语义矩阵训练预设的卷积神经网络时,所述第一训练模块还用于:
获取使用所述第二训练集训练所述BERT网络时得到的多个第二语义矩阵;
基于所述多个第一语义矩阵和所述多个第二语义矩阵,训练预设的卷积神经网络。
在一些实施例中,在基于所述多个第一语义矩阵训练预设的卷积神经网络时,所述第一训练模块可以用于:
基于所述多个第一语义矩阵和预设的损失函数,对预设的卷积神经网络进行迭代训练直至收敛。
由上可知，本申请实施例提供的文本分类装置200，第一获取模块201获取待分类文本，然后第一转换模块202根据预先训练的文本分类模型的语义表征网络，将所述待分类文本转换为语义矩阵，接着卷积运算模块203在文本分类模型的语义分类网络的卷积层对所述语义矩阵进行卷积运算，得到多种尺寸的语义特征，其中，所述语义分类网络包括具有不同超参数的卷积层和分类层，最后分类模块204根据所述多种尺寸的语义特征，在所述分类层对所述待分类文本进行分类处理，以确定所述待分类文本的文本类别，可以丰富待分类文本的语义特征，防止由待分类文本语义特征少引起的文本分类准确度低，从而提高文本分类的准确度。
应当说明的是,本申请实施例提供的文本分类装置与上文实施例中的文本分类方法属于同一构思,在文本分类装置上可以运行文本分类方法实施例中提供的任一方法,其具体实现过程详见文本分类方法实施例,此处不再赘述。
本申请实施例提供一种计算机可读的存储介质，其上存储有计算机程序，当其存储的计算机程序在计算机上执行时，使得计算机执行如本申请实施例提供的文本分类方法中的步骤。其中，存储介质可以是磁碟、光盘、只读存储器(Read Only Memory, ROM)或者随机存取存储器(Random Access Memory, RAM)等。
本申请实施例还提供一种电子设备,请参照图9,电子设备300包括处理器301和存储器302。其中,处理器301与存储器302电性连接。
处理器301是电子设备300的控制中心,利用各种接口和线路连接整个电子设备的各个部分,通过运行或加载存储在存储器302内的计算机程序,以及调用存储在存储器302内的数据,执行电子设备300的各种功能并处理数据。
存储器302可用于存储软件程序以及模块,处理器301通过运行存储在存储器302的计算机程序以及模块,从而执行各种功能应用以及数据处理。存储器302可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的计算机程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据电子设备的使用所创建的数据等。
此外,存储器302可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。相应地,存储器302还可以包括存储器控制器,以提供处理器301对存储器302的访问。
在本申请实施例中,电子设备300中的处理器301会按照如下的步骤,将一个或一个以上的计算机程序的进程对应的指令加载到存储器302中,并由处理器301运行存储在存储器302中的计算机程序,从而实现各种功能,如下:
获取待分类文本;
根据预先训练的文本分类模型的语义表征网络,将所述待分类文本转换为语义矩阵,其中,所述文本分类模型由所述语义表征网络和语义分类网络构成;
在所述语义分类网络的卷积层对所述语义矩阵进行卷积运算,得到多种尺寸的语义特征,其中,所述语义分类网络包括具有不同超参数的卷积层和分类层;
根据所述多种尺寸的语义特征,在所述分类层对所述待分类文本进行分类处理,以确定所述待分类文本的文本类别。
请参照图10,图10为本申请实施例提供的电子设备的第二结构示意图,与图9所示电子设备的区别在于,电子设备还包括:摄像组件303、射频电路304、音频电路305以及电源306。其中,摄像组件303、射频电路304、音频电路305以及电源306分别与处理器301电性连接。
摄像组件303可以包括图像处理电路,图像处理电路可以利用硬件和/或软件组件实现,可包括定义图像信号处理(Image Signal Processing)管线的各种处理单元。图像处理电路至少可以包括:多个摄像头、图像信号处理器(Image Signal Processor,ISP处理器)、控制逻辑器、图像存储器以及显示器等。其中每个摄像头至少可以包括一个或多个透镜和图像传感器。图像传感器可包括色彩滤镜阵列(如Bayer滤镜)。图像传感器可获取用图像传感器的每个成像像素捕捉的光强度和波长信息,并提供可由图像信号处理器处理的一组原始图像数据。
射频电路304可以用于收发射频信号,以通过无线通信与网络设备或其他电子设备建立无线通讯,与网络设备或其他电子设备之间收发信号。
音频电路305可以用于通过扬声器、传声器提供用户与电子设备之间的音频接口。
电源306可以用于给电子设备300的各个部件供电。在一些实施例中,电源306可以通过电源管理系统与处理器301逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。
在本申请实施例中，电子设备300中的处理器301会按照如下的步骤，将一个或一个以上的计算机程序的进程对应的指令加载到存储器302中，并由处理器301运行存储在存储器302中的计算机程序，从而实现各种功能，如下：
获取待分类文本;
根据预先训练的文本分类模型的语义表征网络,将所述待分类文本转换为语义矩阵,其中,所述文本分类模型由语义表征网络和语义分类网络构成;
在所述语义分类网络的卷积层对所述语义矩阵进行卷积运算,得到多种尺寸的语义特征,其中,所述语义分类网络包括具有不同超参数的卷积层和分类层;
根据所述多种尺寸的语义特征,在所述分类层对所述待分类文本进行分类处理,以确定所述待分类文本的文本类别。
在一些实施例中,在根据预先训练的文本分类模型的语义表征网络,将所述待分类文本转换为语义矩阵时,处理器301可以执行:
根据预先训练的文本分类模型的语义表征网络,将所述待分类文本中的各字符转换为语义向量;
基于各字符的先后顺序,将各字符的语义向量组合为语义矩阵。
在一些实施例中,在根据所述多种尺寸的语义特征,在所述分类层对所述待分类文本进行分类处理时,处理器301可以执行:
在所述分类层根据所述多种尺寸的语义特征,计算所述待分类文本在每个预设文本类别上的概率值;
将概率值最大的预设文本类别确定为所述待分类文本的文本类别。
在一些实施例中,语义分类网络还包括池化层,在得到多种尺寸的语义特征之后,处理器301可以执行:
在所述池化层,对每一种尺寸的语义特征进行池化处理;
在根据所述多种尺寸的语义特征,在所述分类层对所述待分类文本进行分类处理时,处理器301可以执行:
根据池化处理的语义特征,在所述分类层对所述待分类文本进行分类处理。
在一些实施例中,在根据预先训练的文本分类模型的语义表征网络,得到所述待分类文本的语义矩阵之前,处理器301可以执行:
去除所述待分类文本中的无效字符。
在一些实施例中,在获取待分类文本之前,处理器301可以执行:
获取多条第一训练文本,构成第一训练集;
根据所述语义表征网络,将所述第一训练集中的所述多条第一训练文本转换为多个第一语义矩阵;
基于所述多个第一语义矩阵训练预设的卷积神经网络,并将训练后的所述卷积神经网络作为所述语义分类网络,由所述语义表征网络和语义分类网络构成所述文本分类模型。
在一些实施例中,所述语义表征网络为BERT网络;在根据所述语义表征网络,将所述第一训练集中的所述多条第一训练文本转换为多个第一语义矩阵之前,处理器301可以执行:
获取多条第二训练文本,构成第二训练集;
使用所述第二训练集对所述BERT网络进行训练,以更新所述BERT网络的模型参数。
在一些实施例中,在基于所述多个第一语义矩阵训练预设的卷积神经网络时,处理器301可以执行:
获取使用所述第二训练集训练所述BERT网络时得到的多个第二语义矩阵;
基于所述多个第一语义矩阵和所述多个第二语义矩阵,训练预设的卷积神经网络。
在一些实施例中,在基于所述多个第一语义矩阵训练预设的卷积神经网络时,处理器301可以执行:
基于所述多个第一语义矩阵和预设的损失函数,对预设的卷积神经网络进行迭代训练直至收敛。
由上述可知,本实施例提供的电子设备,在获取待分类文本之后,根据预先训练的文本分类模型的语义表征网络,将待分类文本转换为语义矩阵,然后在文本分类模型的语义分类网络的卷积层对语义矩阵进行卷积运算,得到多种尺寸的语义特征,其中,语义分类网络包括具有不同超参数的卷积层,最后根据多种尺寸的语义特征,在语义分类网络的分类层对待分类文本进行分类处理,以确定待分类文本的文本类别,可以丰富待分类文本的语义特征,防止由待分类文本语义特征少引起的文本分类准确度低,从而提高文本分类的准确度。
本申请实施例还提供一种存储介质，该存储介质存储有计算机程序，当该计算机程序在计算机上运行时，使得该计算机执行上述任一实施例中的文本分类方法，比如：获取待分类文本；根据预先训练的文本分类模型的语义表征网络，将所述待分类文本转换为语义矩阵，其中，所述文本分类模型由所述语义表征网络和语义分类网络构成；在所述语义分类网络的卷积层对所述语义矩阵进行卷积运算，得到多种尺寸的语义特征，其中，所述语义分类网络包括具有不同超参数的卷积层和分类层；根据所述多种尺寸的语义特征，在所述分类层对所述待分类文本进行分类处理，以确定所述待分类文本的文本类别。
在本申请实施例中,存储介质可以是磁碟、光盘、只读存储器(Read Only Memory,ROM)、或者随机存取记忆体(Random Access Memory,RAM)等。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
需要说明的是,对本申请实施例的文本分类方法而言,本领域普通测试人员可以理解实现本申请实施例的文本分类方法的全部或部分流程,是可以通过计算机程序来控制相关的硬件来完成,该计算机程序可存储于一计算机可读取存储介质中,如存储在电子设备的存储器中,并被该电子设备内的至少一个处理器执行,在执行过程中可包括如文本分类方法的实施例的流程。其中,存储介质可为磁碟、光盘、只读存储器、随机存取记忆体等。
对本申请实施例的文本分类装置而言,其各功能模块可以集成在一个处理芯片中,也可以是各个模块单独物理存在,也可以两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。该集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中,该存储介质譬如为只读存储器,磁盘或光盘等。
以上对本申请实施例所提供的一种文本分类方法、装置、存储介质以及电子设备进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。

Claims (20)

  1. 一种文本分类方法,其中,所述方法包括:
    获取待分类文本;
    根据预先训练的文本分类模型的语义表征网络,将所述待分类文本转换为语义矩阵,其中,所述文本分类模型由所述语义表征网络和语义分类网络构成;
    在所述语义分类网络的卷积层对所述语义矩阵进行卷积运算,得到多种尺寸的语义特征,其中,所述语义分类网络包括具有不同超参数的卷积层和分类层;
    根据所述多种尺寸的语义特征,在所述分类层对所述待分类文本进行分类处理,以确定所述待分类文本的文本类别。
  2. 根据权利要求1所述的文本分类方法,其中,所述根据预先训练的文本分类模型的语义表征网络,将所述待分类文本转换为语义矩阵,包括:
    根据预先训练的文本分类模型的语义表征网络,将所述待分类文本中的各字符转换为语义向量;
    基于各字符的先后顺序,将各字符的语义向量组合为语义矩阵。
  3. 根据权利要求1所述的文本分类方法,其中,所述根据所述多种尺寸的语义特征,在所述分类层对所述待分类文本进行分类处理,包括:
    在所述分类层根据所述多种尺寸的语义特征,计算所述待分类文本在每个预设文本类别上的概率值;
    将概率值最大的预设文本类别确定为所述待分类文本的文本类别。
  4. 根据权利要求1所述的文本分类方法,其中,所述语义分类网络还包括池化层,所述得到多种尺寸的语义特征之后,还包括:
    在所述池化层,对每一种尺寸的语义特征进行池化处理;
    所述根据所述多种尺寸的语义特征,在所述分类层对所述待分类文本进行分类处理,包括:
    根据池化处理的语义特征,在所述分类层对所述待分类文本进行分类处理。
  5. 根据权利要求1所述的文本分类方法,其中,所述根据预先训练的文本分类模型的语义表征网络,将所述待分类文本转换为语义矩阵之前,还包括:
    去除所述待分类文本中的无效字符。
  6. 根据权利要求1所述的文本分类方法，其中，所述获取待分类文本之前，还包括：
    获取多条第一训练文本,构成第一训练集;
    根据所述语义表征网络,将所述第一训练集中的所述多条第一训练文本转换为多个第一语义矩阵;
    基于所述多个第一语义矩阵训练预设的卷积神经网络,并将训练后的所述卷积神经网络作为所述语义分类网络,由所述语义表征网络和语义分类网络构成所述文本分类模型。
  7. 根据权利要求6所述的文本分类方法,其中,所述语义表征网络为BERT网络;所述根据所述语义表征网络,将所述第一训练集中的所述多条第一训练文本转换为多个第一语义矩阵之前,还包括:
    获取多条第二训练文本,构成第二训练集;
    使用所述第二训练集对所述BERT网络进行训练,以更新所述BERT网络的模型参数。
  8. 根据权利要求7所述的文本分类方法,其中,所述基于所述多个第一语义矩阵训练预设的卷积神经网络,包括:
    获取使用所述第二训练集训练所述BERT网络时得到的多个第二语义矩阵;
    基于所述多个第一语义矩阵和所述多个第二语义矩阵,训练预设的卷积神经网络。
  9. 根据权利要求6所述的文本分类方法,其中,所述语义表征网络为BERT网络;所述根据所述语义表征网络,将所述第一训练集中的所述多条第一训练文本转换为多个第一语义矩阵之前,还包括:
    获取源生BERT网络的模型参数,在基线BERT网络中加载所述源生BERT网络的模型参数;
    获取多条第三训练文本,构成第三训练集;
    使用所述第三训练集对所述基线BERT网络进行训练,以更新所述基线BERT网络的模型参数;
    在所述语义表征网络中加载所述基线BERT网络更新后的模型参数。
  10. 根据权利要求6所述的文本分类方法,其中,所述基于所述多个第一语义矩阵训练预设的卷积神经网络,包括:
    基于所述多个第一语义矩阵和预设的损失函数,对预设的卷积神经网络进行迭代训练直至收敛。
  11. 一种文本分类装置,其中,包括:
    第一获取模块,用于获取待分类文本;
    第一转换模块,用于根据预先训练的文本分类模型的语义表征网络,将所述待分类文本转换为语义矩阵,其中,所述文本分类模型由所述语义表征网络和语义分类网络构成;
    卷积运算模块,用于在所述语义分类网络的卷积层对所述语义矩阵进行卷积运算,得到多种尺寸的语义特征,其中,所述语义分类网络包括具有不同超参数的卷积层和分类层;
    分类模块,用于根据所述多种尺寸的语义特征,在所述分类层对所述待分类文本进行分类处理,以确定所述待分类文本的文本类别。
  12. 一种电子设备,包括:处理器、存储器以及存储在存储器上并可在处理器上运行的计算机程序,其中,所述处理器执行所述计算机程序时实现文本分类方法:
    获取待分类文本;
    根据预先训练的文本分类模型的语义表征网络,将所述待分类文本转换为语义矩阵,其中,所述文本分类模型由语义表征网络和语义分类网络构成;
    在所述语义分类网络的卷积层对所述语义矩阵进行卷积运算,得到多种尺寸的语义特征,其中,所述语义分类网络包括具有不同超参数的卷积层和分类层;
    根据所述多种尺寸的语义特征,在所述分类层对所述待分类文本进行分类处理,以确定所述待分类文本的文本类别。
  13. 根据权利要求12所述的电子设备,其中,在所述根据预先训练的文本分类模型的语义表征网络,将所述待分类文本转换为语义矩阵时,所述处理器用于执行:
    根据预先训练的文本分类模型的语义表征网络,将所述待分类文本中的各字符转换为语义向量;
    基于各字符的先后顺序,将各字符的语义向量组合为语义矩阵。
  14. 根据权利要求12所述的电子设备，其中，在所述根据所述多种尺寸的语义特征，在所述分类层对所述待分类文本进行分类处理时，所述处理器用于执行：
    在所述分类层根据所述多种尺寸的语义特征,计算所述待分类文本在每个预设文本类别上的概率值;
    将概率值最大的预设文本类别确定为所述待分类文本的文本类别。
  15. 根据权利要求12所述的电子设备，其中，所述语义分类网络还包括池化层，在所述得到多种尺寸的语义特征之后，所述处理器用于执行：
    在所述池化层,对每一种尺寸的语义特征进行池化处理;
    所述根据所述多种尺寸的语义特征,在所述分类层对所述待分类文本进行分类处理时,所述处理器用于执行:
    根据池化处理的语义特征,在所述分类层对所述待分类文本进行分类处理。
  16. 根据权利要求12所述的电子设备,其中,在所述获取待分类文本之前,所述处理器用于执行:
    获取多条第一训练文本,构成第一训练集;
    根据所述语义表征网络,将所述第一训练集中的所述多条第一训练文本转换为多个第一语义矩阵;
    基于所述多个第一语义矩阵训练预设的卷积神经网络,并将训练后的所述卷积神经网络作为所述语义分类网络,由所述语义表征网络和语义分类网络构成所述文本分类模型。
  17. 根据权利要求16所述的电子设备,其中,所述语义表征网络为BERT网络;在所述根据所述语义表征网络,将所述第一训练集中的所述多条第一训练文本转换为多个第一语义矩阵之前,所述处理器用于执行:
    获取多条第二训练文本,构成第二训练集;
    使用所述第二训练集对所述BERT网络进行训练,以更新所述BERT网络的模型参数。
  18. 根据权利要求17所述的电子设备,其中,在所述基于所述多个第一语义矩阵训练预设的卷积神经网络时,所述处理器用于执行:
    获取使用所述第二训练集训练所述BERT网络时得到的多个第二语义矩阵;
    基于所述多个第一语义矩阵和所述多个第二语义矩阵，训练预设的卷积神经网络。
  19. 根据权利要求16所述的电子设备,其中,在所述基于所述多个第一语义矩阵训练预设的卷积神经网络时,所述处理器用于执行:
    基于所述多个第一语义矩阵和预设的损失函数,对预设的卷积神经网络进行迭代训练直至收敛。
  20. 一种包含电子设备可执行指令的存储介质,其中,所述电子设备可执行指令在由电子设备处理器执行时用于执行如权利要求1至10任一项所述的文本分类方法。
PCT/CN2019/114871 2019-10-31 2019-10-31 一种文本分类方法、装置、电子设备及存储介质 WO2021081945A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/114871 WO2021081945A1 (zh) 2019-10-31 2019-10-31 一种文本分类方法、装置、电子设备及存储介质
CN201980099197.XA CN114207605A (zh) 2019-10-31 2019-10-31 一种文本分类方法、装置、电子设备及存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/114871 WO2021081945A1 (zh) 2019-10-31 2019-10-31 一种文本分类方法、装置、电子设备及存储介质

Publications (1)

Publication Number Publication Date
WO2021081945A1 (zh)

Family

ID=75715730

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114871 WO2021081945A1 (zh) 2019-10-31 2019-10-31 一种文本分类方法、装置、电子设备及存储介质

Country Status (2)

Country Link
CN (1) CN114207605A (zh)
WO (1) WO2021081945A1 (zh)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170308790A1 (en) * 2016-04-21 2017-10-26 International Business Machines Corporation Text classification by ranking with convolutional neural networks
CN109710770A (zh) * 2019-01-31 2019-05-03 北京牡丹电子集团有限责任公司数字电视技术中心 一种基于迁移学习的文本分类方法及装置
CN110147452A (zh) * 2019-05-17 2019-08-20 北京理工大学 一种基于层级bert神经网络的粗粒度情感分析方法
CN110334210A (zh) * 2019-05-30 2019-10-15 哈尔滨理工大学 一种基于bert与lstm、cnn融合的中文情感分析方法
CN110309511A (zh) * 2019-07-04 2019-10-08 哈尔滨工业大学 基于共享表示的多任务语言分析系统及方法

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434699A (zh) * 2021-06-30 2021-09-24 平安科技(深圳)有限公司 Bert模型的预训练方法、计算机装置和存储介质
CN113434699B (zh) * 2021-06-30 2023-07-18 平安科技(深圳)有限公司 用于文本匹配的bert模型的预训练方法、计算机装置和存储介质
WO2023035940A1 (zh) * 2021-09-10 2023-03-16 上海明品医学数据科技有限公司 一种目标对象推荐方法及系统
CN113836302A (zh) * 2021-09-26 2021-12-24 平安科技(深圳)有限公司 文本分类方法、文本分类装置及存储介质

Also Published As

Publication number Publication date
CN114207605A (zh) 2022-03-18

Similar Documents

Publication Publication Date Title
WO2021169723A1 (zh) 图像识别方法、装置、电子设备及存储介质
EP3859488A1 (en) Signal processing device, signal processing method and related product
CN111209970B (zh) 视频分类方法、装置、存储介质及服务器
GB2547068B (en) Semantic natural language vector space
WO2021081945A1 (zh) 一种文本分类方法、装置、电子设备及存储介质
US11450319B2 (en) Image processing apparatus and method
US20220262151A1 (en) Method, apparatus, and system for recognizing text in image
WO2017219991A1 (zh) 适用于模式识别的模型的优化方法、装置及终端设备
US20190228763A1 (en) On-device neural network adaptation with binary mask learning for language understanding systems
US20200126554A1 (en) Image processing apparatus and method
AU2016256764A1 (en) Semantic natural language vector space for image captioning
WO2020001196A1 (zh) 图像处理方法、电子设备、计算机可读存储介质
US10977819B2 (en) Electronic device and method for reliability-based object recognition
WO2022121180A1 (zh) 模型的训练方法、装置、语音转换方法、设备及存储介质
CN111133453A (zh) 人工神经网络
WO2021092808A1 (zh) 网络模型的训练方法、图像的处理方法、装置及电子设备
EP3620982B1 (en) Sample processing method and device
US20180005086A1 (en) Technologies for classification using sparse coding in real time
US20240062056A1 (en) Offline Detector
WO2022042120A1 (zh) 目标图像提取方法、神经网络训练方法及装置
WO2022012205A1 (zh) 词补全方法和装置
CN111368536A (zh) 自然语言处理方法及其设备和存储介质
CN110717401A (zh) 年龄估计方法及装置、设备、存储介质
US11238865B2 (en) Function performance based on input intonation
WO2022155890A1 (en) Decreased quantization latency

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19950620

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19950620

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 181022)

122 Ep: pct application non-entry in european phase

Ref document number: 19950620

Country of ref document: EP

Kind code of ref document: A1