WO2020224106A1 - Method and system for text classification based on a neural network, and computer device - Google Patents
Method and system for text classification based on a neural network, and computer device
- Publication number
- WO2020224106A1 (PCT/CN2019/102785)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- word segmentation
- text
- classified
- convolution
- word
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Definitions
- the embodiments of the present application relate to the field of computer data processing, and in particular to a method, system, computer equipment, and non-volatile computer-readable storage medium for text classification based on neural networks.
- Text classification is one of the important tasks of natural language processing; tasks such as industry classification of articles and sentiment analysis, along with many other natural language processing tasks, are essentially text classification.
- text classifiers can be divided into two main categories: text classifiers based on prior rules and text classifiers based on models.
- the classification rules of text classifiers based on prior rules require manual mining or the accumulation of prior knowledge.
- Model-based text classifiers include, for example, text classifiers based on topic models such as LDA (Latent Dirichlet Allocation, a document topic generation model).
- the purpose of the embodiments of the present application is to provide a neural network-based text classification method, system, computer equipment, and non-volatile computer-readable storage medium to solve the problems of text classification errors and low classification accuracy.
- an embodiment of the present application provides a neural network-based text classification method, which includes the following steps:
- the j-th element in each convolution feature map is configured into the j-th input vector to obtain (L-f+1) input vectors, where 1 ≤ j ≤ (L-f+1), and the arrangement order of the elements in the j-th input vector is determined by the index i of the convolution feature map in which each element is located, i being the convolution kernel identifier, 1 ≤ i ≤ M; and
- the (L-f+1) input vectors are sequentially input into a long short-term memory (LSTM) network model, and the classification vector of the text to be classified is calculated.
- the embodiments of the present application also provide a neural network-based text classification system, including:
- the word segmentation module is used to perform word segmentation operations on the text to be classified to obtain L word segments;
- the word vector mapping module is used to perform word vector mapping on the L word segments respectively to obtain an L*d-dimensional word vector matrix, wherein each word segment is mapped to a d-dimensional word vector;
- a convolution module configured to perform a convolution operation on the L*d-dimensional word vector matrix through a convolution layer to obtain M convolution feature maps, and the convolution layer includes M f*d convolution kernels;
- the feature mapping module is used to configure the j-th element in each convolution feature map into the j-th input vector to obtain (L-f+1) input vectors, where 1 ≤ j ≤ (L-f+1), and the arrangement order of the elements in the j-th input vector is determined by the index i of the convolution feature map in which each element is located, i being the convolution kernel identifier, 1 ≤ i ≤ M; and
- the prediction module is used to sequentially input the (L-f+1) input vectors into the long short-term memory network model to calculate the classification vector of the text to be classified.
- an embodiment of the present application further provides a computer device, the computer device including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the computer-readable instructions are executed by the processor, the following steps are implemented:
- the j-th element in each convolution feature map is configured into the j-th input vector to obtain (L-f+1) input vectors, where 1 ≤ j ≤ (L-f+1), and the arrangement order of the elements in the j-th input vector is determined by the index i of the convolution feature map in which each element is located, i being the convolution kernel identifier, 1 ≤ i ≤ M; and
- the (L-f+1) input vectors are sequentially input into the long short-term memory network model, and the classification vector of the text to be classified is calculated.
- the embodiments of the present application also provide a non-volatile computer-readable storage medium.
- the non-volatile computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions may be executed by at least one processor, so that the at least one processor executes the following steps:
- the j-th element in each convolution feature map is configured into the j-th input vector to obtain (L-f+1) input vectors, where 1 ≤ j ≤ (L-f+1), and the arrangement order of the elements in the j-th input vector is determined by the index i of the convolution feature map in which each element is located, i being the convolution kernel identifier, 1 ≤ i ≤ M; and
- the (L-f+1) input vectors are sequentially input into the long short-term memory network model, and the classification vector of the text to be classified is calculated.
- the neural network-based text classification method, system, computer device, and non-volatile computer-readable storage medium provided by the embodiments of the application combine a convolutional network and a long short-term memory network to form a CNN+LSTM text classification model, which effectively takes into account both the local context characteristics of the text and the dependencies between words over a wide span. Therefore, it can solve the problems of text classification errors and low classification accuracy, and is especially suitable for long-text classification tasks.
- FIG. 1 is a schematic flowchart of Embodiment 1 of a text classification method based on a neural network in this application.
- Fig. 2 is a schematic diagram of a specific flow of step S100 in Fig. 1.
- FIG. 3 is a schematic diagram of a specific flow of step S1008 in FIG. 2.
- FIG. 4 is a schematic diagram of program modules of Embodiment 2 of the text classification system of this application.
- FIG. 5 is a schematic diagram of the hardware structure of the third embodiment of the computer equipment of this application.
- Referring to FIG. 1, there is shown a flowchart of the steps of the neural network-based text classification method according to the first embodiment of the present application. It can be understood that the flowchart in this method embodiment is not used to limit the order in which the steps are executed. The details are as follows.
- step S100, a word segmentation operation is performed on the text to be classified to obtain L word segments.
- the word segmentation operation may be based on dictionary word segmentation algorithms such as the forward maximum matching method, the reverse maximum matching method, and the two-way maximum matching method, or on algorithms such as the hidden Markov model (HMM), CRF, SVM, and deep learning. A minimal dictionary-based sketch is given below.
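- Purely as an illustration, the following is a minimal Python sketch of the forward maximum matching algorithm mentioned above; the dictionary contents, the maximum word length, and the function name are assumptions made for the example and are not prescribed by the embodiments.

```python
def forward_max_match(text, dictionary, max_word_len=5):
    """Greedy forward maximum matching: at each position, take the longest dictionary word."""
    segments, i = [], 0
    while i < len(text):
        match = None
        # try the longest candidate first, shrinking until a dictionary hit
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                match = candidate
                break
        if match is None:
            match = text[i]  # no dictionary hit: emit a single character
        segments.append(match)
        i += len(match)
    return segments

# toy usage with a hypothetical topic thesaurus
segments = forward_max_match("textclassification", {"text", "classification"})
```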
- the step S100 may further include steps S1000 to S1008:
- Step S1000 Obtain multiple user attribute information of multiple users browsing the text to be classified.
- user attribute information includes, but is not limited to, age, gender, occupation, region, hobby, etc.
- Step S1002 According to the multiple user attribute information of the multiple users, analyze and obtain the target group for browsing the text to be classified.
- Step S1004 According to the historical user portraits of the target group, the predicted probability corresponding to each topic of the text to be classified is obtained.
- the historical user portrait is obtained from the historical behavior information of the target group and records the target group's interest coefficient for each topic; there is a correspondence between the interest coefficient and the predicted probability (one possibility is sketched below).
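- The embodiments do not fix the exact correspondence between interest coefficients and predicted probabilities; as a hedged sketch only, one plausible choice is to normalize the target group's interest coefficients into a probability distribution over topics. The topic names, coefficient values, and threshold below are illustrative assumptions.

```python
def predicted_topic_probabilities(interest_coefficients):
    """One plausible mapping: normalize interest coefficients so they sum to 1."""
    total = sum(interest_coefficients.values())
    return {topic: coef / total for topic, coef in interest_coefficients.items()}

interest = {"finance": 0.9, "technology": 0.6, "sports": 0.1}       # illustrative portrait coefficients
probabilities = predicted_topic_probabilities(interest)
target_topics = [t for t, p in probabilities.items() if p > 0.2]    # preset threshold, as in step S1006
```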
- Step S1006 according to the predicted probability of each topic, select multiple target topics whose predicted probability is greater than a preset threshold.
- Step S1008 Perform a word segmentation operation on the text to be classified based on the multiple target topics.
- the step S1008 may include: performing a word segmentation operation on the text to be classified according to multiple topic thesauruses of the multiple target topics. The details are as follows:
- step S1008 may further include steps S1008A to S1008D:
- Step S1008A performing word segmentation operations on the text to be classified according to the thesaurus associated with each target theme, to obtain multiple word segmentation sets;
- Step S1008B compare whether the word segmentation of each word segmentation set in the corresponding character position area is the same;
- Step S1008C if they are the same, put the word segmentation of the corresponding character position area into the target word segmentation set;
- step S1008D if they are not the same, the word segmentation of one of the word segmentation sets in the corresponding character position area is selected to be put into the target word segmentation set.
- the step S1008D may further include:
- Step 1 Analyze the segmentation probability of each segmentation set in the corresponding character position area through the hidden Markov model
- Step 2 Select and put the word segmentation with the highest division probability into the target word segmentation set.
- step S1008D may further include:
- Step 1 Analyze the segmentation probability of each segmentation set in the corresponding character position area through the hidden Markov model
- Step 2 Calculate the comprehensive weight coefficient of the word segmentation of each word segmentation set in the corresponding character position area according to the division probability of each word segmentation set in the corresponding character position area and the predicted probability of the target theme associated with each word segmentation set;
- Step 3 Select the word segmentation with the highest comprehensive weight coefficient and add it to the target word segmentation set. One possible implementation is sketched below.
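- The embodiments do not specify how the division probability and the topic's predicted probability are combined into the comprehensive weight coefficient; the sketch below simply multiplies them, which is an assumption made for illustration, and the candidate structure and names are likewise hypothetical.

```python
def pick_target_segmentation(candidates):
    """candidates: list of (segmentation, division_probability, topic_predicted_probability)
    for the same character position area, one entry per topic thesaurus.
    Assumed comprehensive weight = division probability * topic predicted probability."""
    best = max(candidates, key=lambda c: c[1] * c[2])
    return best[0]

# toy usage for one character position area
candidates = [
    (["machine", "learning"], 0.7, 0.5),   # segmentation from topic-A thesaurus
    (["machine learning"],    0.6, 0.8),   # segmentation from topic-B thesaurus
]
target_word_segmentation = pick_target_segmentation(candidates)
```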
- Step S102 Perform word vector mapping on the L word segments respectively to obtain an L*d-dimensional word vector matrix, wherein each word segment is mapped to a d-dimensional word vector.
- for example, the d-dimensional word vector of each word segment (e.g., d = 128) can be obtained through models such as word2vec, as sketched below.
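- A minimal sketch of step S102 using the gensim word2vec implementation (one possible choice; the embodiments only say "models such as word2vec"). The training corpus, vector_size, and other parameters are placeholders chosen only to show the L*d matrix being built.

```python
import numpy as np
from gensim.models import Word2Vec

# toy segmented corpus used to train the word-vector model (placeholder data)
corpus = [["neural", "network", "text", "classification"],
          ["long", "short", "term", "memory", "network"]]
w2v = Word2Vec(sentences=corpus, vector_size=128, window=5, min_count=1)  # d = 128

segments = ["text", "classification", "network"]              # the L word segments of the text to classify
word_vector_matrix = np.stack([w2v.wv[s] for s in segments])  # shape (L, d)
```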
- Step S104 performing a convolution operation on the L*d-dimensional word vector matrix through a convolution layer to obtain M convolution feature maps, and the convolution layer includes M f*d convolution kernels.
- the convolutional layer includes a number of f*d convolution kernels with a stride of 1, and the convolutional layer performs a convolution operation on the L*d-dimensional word vector matrix to obtain several (L-f+1)*1 convolution feature maps. That is, the width of each convolution feature map is 1, and the length is L-f+1.
- the length of the convolution kernel is f, and the number of word segments is L, where L is a positive integer greater than 1.
- the elements of each (L-f+1)*1 convolution feature map are calculated as c_ij = f(w_ij ⊙ m_i + b_i), where
- c_ij is the feature value of the j-th element (1 ≤ j ≤ L-f+1) of the i-th convolution feature map;
- w_ij is the f*d submatrix of the word vector matrix covered by the convolution kernel when the j-th element of the i-th convolution feature map is calculated;
- ⊙ denotes the multiplication of the covered word vector submatrix with the convolution kernel;
- m_i is the convolution kernel used to calculate the i-th convolution feature map;
- b_i is the bias term used to calculate the i-th convolution feature map; and
- f is a non-linear activation function, such as the ReLU function.
- the number of the convolution kernels may be 4, so 4 (L-f+1)*1 convolution feature maps are obtained.
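- The following NumPy sketch implements the per-element computation described above (stride 1, M kernels of size f*d, ReLU as the activation f). The array shapes and random initial values are illustrative assumptions, not trained parameters.

```python
import numpy as np

def convolution_feature_maps(X, kernels, biases):
    """X: (L, d) word-vector matrix; kernels: (M, f, d); biases: (M,).
    Returns the M convolution feature maps as an array of shape (M, L - f + 1)."""
    L, d = X.shape
    M, f, _ = kernels.shape
    maps = np.empty((M, L - f + 1))
    for i in range(M):
        for j in range(L - f + 1):
            w_ij = X[j:j + f, :]                                          # f*d window covered by kernel m_i
            maps[i, j] = max(0.0, np.sum(w_ij * kernels[i]) + biases[i])  # ReLU(c_ij)
    return maps

L, d, M, f = 10, 128, 4, 3                                                # illustrative sizes (4 kernels as above)
maps = convolution_feature_maps(np.random.randn(L, d), np.random.randn(M, f, d), np.zeros(M))
```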
- Step S106 the j-th element in each convolution feature map is allocated to the j-th input vector to obtain (L-f+1) input vectors, where 1 ≤ j ≤ (L-f+1).
- the arrangement order of the elements in the j-th input vector is determined by the index i of the convolution feature map in which each element is located, where i is the convolution kernel identifier and 1 ≤ i ≤ M.
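- Continuing the NumPy sketch above, the feature-mapping step of S106 amounts to transposing the M*(L-f+1) array so that row j collects the j-th element of every feature map, ordered by kernel index i.

```python
# (L-f+1) input vectors for the LSTM; row j is [c_1j, c_2j, ..., c_Mj]
input_vectors = maps.T        # shape (L - f + 1, M)
```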
- step S108, the (L-f+1) input vectors are sequentially input into a Long Short-Term Memory (LSTM) network model, and the classification vector of the text to be classified is calculated.
- the long short-term memory network model is used to handle sequence dependencies over long spans, and is therefore suitable for tasks involving long-text dependencies.
- step S108 may further include step S1080 to step S1082:
- Step S1080 obtaining (L-f+1) output vectors through the long short-term memory network model.
- Step S1082 input the (L-f+1) output vectors to the classification layer, and output the classification vectors through the classification layer.
- the steps of calculating the classification vector of the text to be classified are as follows:
- the output gate at time t is computed as o_t = σ(W_o[x_t, h_{t-1}] + b_o), where o_t ∈ [0,1] represents the selection weight of the node's cell memory information at time t, b_o is the bias of the output gate, W_o is the weight matrix of the output gate, and [x_t, h_{t-1}] represents the vector obtained by concatenating the vectors x_t and h_{t-1}, that is, a vector whose dimension is the sum of the dimensions of x_t and h_{t-1};
- x_t represents the input data of the LSTM neural network node at time t, that is, one of the (L-f+1) input vectors in this embodiment; and
- h_t is the output vector of the LSTM neural network node at time t.
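- A small NumPy sketch of the output-gate formula above; the weights and inputs are random placeholders and the dimensions are assumptions, only the computation itself follows the formula.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output_gate(x_t, h_prev, W_o, b_o):
    """o_t = sigmoid(W_o [x_t, h_{t-1}] + b_o); every component of o_t lies in [0, 1]."""
    concat = np.concatenate([x_t, h_prev])     # concatenation of x_t and h_{t-1}
    return sigmoid(W_o @ concat + b_o)

M, hidden = 4, 64                               # illustrative dimensions
o_t = output_gate(np.random.randn(M), np.random.randn(hidden),
                  np.random.randn(hidden, M + hidden), np.zeros(hidden))
```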
- the LSTM model can output a total of (L-f+1) output vectors, and the (L-f+1) output vectors are input to the softmax layer, and the classification vector is output through the softmax layer.
- Each vector parameter in the classification vector represents the confidence of the corresponding text category.
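- Purely as a hedged end-to-end sketch (not the embodiments' actual implementation), the CNN+LSTM pipeline described above can be expressed roughly as follows with the Keras API; L, d, M, f, the LSTM width, and the number of classes are illustrative assumptions. The Conv1D layer produces the (L-f+1) time steps of M-dimensional vectors that the feature-mapping step describes, the LSTM returns (L-f+1) output vectors, and the final softmax layer outputs the classification vector of per-category confidences.

```python
import tensorflow as tf
from tensorflow.keras import layers

L, d, M, f, num_classes = 200, 128, 4, 3, 10                  # illustrative sizes

inputs = tf.keras.Input(shape=(L, d))                         # L*d word-vector matrix
feature_maps = layers.Conv1D(filters=M, kernel_size=f,
                             activation="relu")(inputs)       # (L-f+1, M): steps S104/S106
lstm_outputs = layers.LSTM(64, return_sequences=True)(feature_maps)  # (L-f+1) output vectors
flat = layers.Flatten()(lstm_outputs)
classification = layers.Dense(num_classes,
                              activation="softmax")(flat)     # classification vector of confidences

model = tf.keras.Model(inputs, classification)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```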
- FIG. 4 shows a schematic diagram of the program modules of Embodiment 2 of the text classification system of the present application.
- the text classification system 20 may include or be divided into one or more program modules, and the one or more program modules are stored in a storage medium and executed by one or more processors to complete this application and implement the above-mentioned neural network-based text classification method.
- the program module referred to in the embodiments of the present application refers to a series of computer-readable instruction segments capable of completing specific functions, and is more suitable for describing the execution process of the text classification system 20 in the storage medium than the program itself. The following description will specifically introduce the functions of each program module in this embodiment:
- the word segmentation module 200 is used to perform word segmentation operations on the text to be classified to obtain L word segments.
- the word segmentation module 200 may include an acquisition module, an analysis module, a topic prediction module, a screening module, and a word segmentation module, which are specifically as follows:
- the acquiring module is configured to acquire multiple user attribute information of multiple users browsing the text to be classified.
- the analysis module is configured to analyze and obtain the target group for browsing the text to be classified according to multiple user attribute information of the multiple users.
- the topic prediction module is used to obtain the predicted probability of each topic corresponding to the text to be classified according to the historical user portrait of the target group.
- the analysis module is used to obtain target attribute information from the plurality of user attribute information.
- the topic prediction module is used to input the target attribute information into a pre-configured neural network model to obtain the prediction probability of each topic.
- the screening module is configured to screen multiple target themes whose predicted probabilities are greater than a preset threshold according to the predicted probability of each subject.
- the word segmentation module is configured to perform word segmentation operations on the text to be classified based on the multiple target topics.
- the word segmentation module is also used to perform word segmentation operations on the text to be classified according to the multiple topic thesauruses of the multiple target topics, specifically as follows: perform a word segmentation operation on the text to be classified according to the topic thesaurus associated with each target topic, respectively, to obtain multiple word segmentation sets; compare whether the word segmentation of each word segmentation set in the corresponding character position area is the same; if they are the same, put the word segmentation of the corresponding character position area into the target word segmentation set; if they are not the same, select the word segmentation of one of the word segmentation sets in the corresponding character position area and put it into the target word segmentation set.
- selecting and placing the word segmentation of one of the word segmentation sets in the corresponding character position area into the target word segmentation set further includes: analyzing, through a hidden Markov model, the division probability of the word segmentation of each word segmentation set in the corresponding character position area; and selecting the word segmentation with the highest division probability and putting it into the target word segmentation set.
- selecting and placing the word segmentation of one of the word segmentation sets in the corresponding character position area into the target word segmentation set further includes: analyzing, through a hidden Markov model, the division probability of each word segmentation set in the corresponding character position area; calculating, according to the division probability of each word segmentation set in the corresponding character position area and the predicted probability of the target topic associated with each word segmentation set, the comprehensive weight coefficient of the word segmentation of each word segmentation set in the corresponding character position area; and selecting the word segmentation with the highest comprehensive weight coefficient and adding it to the target word segmentation set.
- the word vector mapping module 202 is configured to perform word vector mapping on the L word segments respectively to obtain an L*d-dimensional word vector matrix, wherein each word segment is mapped to a d-dimensional word vector.
- for example, the d-dimensional word vector of each word segment (e.g., d = 128) can be obtained through models such as word2vec.
- the convolution module 204 is configured to perform a convolution operation on the L*d-dimensional word vector matrix through a convolution layer to obtain M convolution feature maps, and the convolution layer includes M f*d convolution kernels.
- the convolutional layer includes a number of f*d convolution kernels with a stride of 1, and the convolutional layer performs a convolution operation on the L*d-dimensional word vector matrix to obtain several (L-f+1)*1 convolution feature maps. That is, the width of each convolution feature map is 1, and the length is L-f+1.
- the length of the convolution kernel is f, and the number of word segments is L.
- the elements of each (L-f+1)*1 convolution feature map are calculated as c_ij = f(w_ij ⊙ m_i + b_i), where
- c_ij is the feature value of the j-th element (1 ≤ j ≤ L-f+1) of the i-th convolution feature map;
- w_ij is the f*d submatrix of the word vector matrix covered by the convolution kernel when the j-th element of the i-th convolution feature map is calculated;
- ⊙ denotes the multiplication of the covered word vector submatrix with the convolution kernel;
- m_i is the convolution kernel used to calculate the i-th convolution feature map;
- b_i is the bias term used to calculate the i-th convolution feature map; and
- f is a non-linear activation function, such as the ReLU function.
- the number of the convolution kernels may be 4, so 4 (L-f+1)*1 convolution feature maps are obtained.
- the feature mapping module 206 is used to configure the j-th element in each convolution feature map into the j-th input vector to obtain (L-f+1) input vectors, where 1 ≤ j ≤ (L-f+1).
- the arrangement order of the elements in the j-th input vector is determined by the index i of the convolution feature map in which each element is located, where i is the convolution kernel identifier and 1 ≤ i ≤ M.
- the prediction module 208 is configured to sequentially input the (L-f+1) input vectors into the long short-term memory network model to calculate the classification vector of the text to be classified.
- the prediction module 208 is further configured to: obtain (L-f+1) output vectors through the long short-term memory network model; and input the (L-f+1) output vectors to the classification layer, and output the classification vector through the classification layer.
- the computer device 2 is a device that can automatically perform numerical calculation and/or information processing according to pre-set or stored instructions.
- the computer device 2 may be a PC, a rack server, a blade server, a tower server, or a cabinet server (including an independent server, or a server cluster composed of multiple servers).
- the computer device 2 at least includes, but is not limited to, a memory 21, a processor 22, a network interface 23, and a text classification system 20, which can communicate with each other through a system bus. Among them:
- the memory 21 includes at least one type of non-volatile computer-readable storage medium.
- the readable storage medium includes flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like.
- the memory 21 may be an internal storage unit of the computer device 2, for example, the hard disk or memory of the computer device 2.
- the memory 21 may also be an external storage device of the computer device 2, for example, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, etc.
- the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device.
- the memory 21 is generally used to store the operating system and various application software installed in the computer device 2, for example, the program code of the text classification system 20 in the second embodiment.
- the memory 21 can also be used to temporarily store various types of data that have been output or will be output.
- the processor 22 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
- the processor 22 is generally used to control the overall operation of the computer device 2.
- the processor 22 is used to run the program code or process data stored in the memory 21, for example, to run the text classification system 20, to implement the neural network-based text classification method of the first embodiment.
- the network interface 23 may include a wireless network interface or a wired network interface, and the network interface 23 is generally used to establish a communication connection between the computer device 2 and other electronic devices.
- the network interface 23 is used to connect the computer device 2 with an external terminal through a network, and establish a data transmission channel and a communication connection between the computer device 2 and the external terminal.
- the network may be an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or other wireless or wired networks.
- FIG. 5 only shows the computer device 2 with components 20-23, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead.
- the text classification system 20 stored in the memory 21 may also be divided into one or more program modules.
- the one or more program modules are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 22) to complete this application.
- FIG. 4 shows a schematic diagram of program modules for implementing the second embodiment of the text classification system 20.
- the text classification system 20 can be divided into a word segmentation module 200, a word vector mapping module 202, a convolution module 204, a feature mapping module 206, and a prediction module 208.
- the program module referred to in this application refers to a series of computer-readable instruction segments that can complete specific functions. The specific functions of the program modules 200-208 have been described in detail in the second embodiment, and will not be repeated here.
- This embodiment also provides a non-volatile computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an application store, etc., on which computer-readable instructions are stored; the corresponding functions are realized when the instructions are executed by a processor.
- the non-volatile computer-readable storage medium of this embodiment is used to store the text classification system 20, and when executed by a processor, the following steps are implemented:
- the j-th element in each convolution feature map is configured into the j-th input vector to obtain (L-f+1) input vectors, where 1 ≤ j ≤ (L-f+1), and the arrangement order of the elements in the j-th input vector is determined by the index i of the convolution feature map in which each element is located, i being the convolution kernel identifier, 1 ≤ i ≤ M; and
- the (L-f+1) input vectors are sequentially input into the long short-term memory network model, and the classification vector of the text to be classified is calculated.
Abstract
A text classification method based on a neural network. The method comprises: performing a word segmentation operation on text to be classified to obtain L word segments (S100); respectively performing word vector mapping on the L word segments to obtain an L*d-dimensional word vector matrix (S102); performing a convolution operation on the L*d-dimensional word vector matrix by means of a convolution layer to obtain M convolution feature maps, the convolution layer comprising M f*d convolution kernels (S104); configuring the j-th element in each convolution feature map into a j-th input vector to obtain (L-f+1) input vectors, where 1 ≤ j ≤ (L-f+1) (S106); and sequentially inputting the (L-f+1) input vectors into a long short-term memory network model, and calculating a classification vector of the text to be classified (S108). By means of the method, the problem of text classification errors can be effectively avoided, thereby improving classification accuracy.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910374240.0 | 2019-05-07 | ||
CN201910374240.0A CN110263152B (zh) | 2019-05-07 | 2019-05-07 | 基于神经网络的文本分类方法、系统及计算机设备 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020224106A1 true WO2020224106A1 (fr) | 2020-11-12 |
Family
ID=67914250
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/102785 WO2020224106A1 (fr) | 2019-05-07 | 2019-08-27 | Procédé et système de classement de texte basé sur un réseau neuronal, et dispositif informatique |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110263152B (fr) |
WO (1) | WO2020224106A1 (fr) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717330A (zh) * | 2019-09-23 | 2020-01-21 | 哈尔滨工程大学 | 基于深度学习的词句级短文本分类方法 |
CN111178070B (zh) * | 2019-12-25 | 2022-11-25 | 深圳平安医疗健康科技服务有限公司 | 基于分词的单词序列获取方法、装置和计算机设备 |
CN113515920B (zh) * | 2020-04-09 | 2024-06-21 | 北京庖丁科技有限公司 | 从表格中提取公式的方法、电子设备和计算机可读介质 |
CN117473095B (zh) * | 2023-12-27 | 2024-03-29 | 合肥工业大学 | 基于主题增强词表示的短文本分类方法和系统 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169035B (zh) * | 2017-04-19 | 2019-10-18 | 华南理工大学 | 一种混合长短期记忆网络和卷积神经网络的文本分类方法 |
CN107301246A (zh) * | 2017-07-14 | 2017-10-27 | 河北工业大学 | 基于超深卷积神经网络结构模型的中文文本分类方法 |
CN107729311B (zh) * | 2017-08-28 | 2020-10-16 | 云南大学 | 一种融合文本语气的中文文本特征提取方法 |
CN108763216A (zh) * | 2018-06-01 | 2018-11-06 | 河南理工大学 | 一种基于中文数据集的文本情感分析方法 |
CN109299268A (zh) * | 2018-10-24 | 2019-02-01 | 河南理工大学 | 一种基于双通道模型的文本情感分析方法 |
- 2019-05-07: CN CN201910374240.0A — patent CN110263152B (zh), status: Active
- 2019-08-27: WO PCT/CN2019/102785 — patent WO2020224106A1 (fr), status: Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9972302B2 (en) * | 2013-05-28 | 2018-05-15 | International Business Machines Corporation | Hybrid predictive model for enhancing prosodic expressiveness |
CN109543029A (zh) * | 2018-09-27 | 2019-03-29 | 平安科技(深圳)有限公司 | 基于卷积神经网络的文本分类方法、装置、介质和设备 |
CN109213868A (zh) * | 2018-11-21 | 2019-01-15 | 中国科学院自动化研究所 | 基于卷积注意力机制网络的实体级别情感分类方法 |
CN109684476A (zh) * | 2018-12-07 | 2019-04-26 | 中科恒运股份有限公司 | 一种文本分类方法、文本分类装置及终端设备 |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114722801A (zh) * | 2020-12-22 | 2022-07-08 | 航天信息股份有限公司 | 政务数据分类存储方法及相关装置 |
CN112597764A (zh) * | 2020-12-23 | 2021-04-02 | 青岛海尔科技有限公司 | 文本分类方法及装置、存储介质、电子装置 |
CN112597764B (zh) * | 2020-12-23 | 2023-07-25 | 青岛海尔科技有限公司 | 文本分类方法及装置、存储介质、电子装置 |
CN112765357A (zh) * | 2021-02-05 | 2021-05-07 | 北京灵汐科技有限公司 | 文本分类方法、装置和电子设备 |
CN113204698A (zh) * | 2021-05-31 | 2021-08-03 | 平安科技(深圳)有限公司 | 新闻主题词生成方法、装置、设备及介质 |
CN113204698B (zh) * | 2021-05-31 | 2023-12-26 | 平安科技(深圳)有限公司 | 新闻主题词生成方法、装置、设备及介质 |
CN113886885A (zh) * | 2021-10-21 | 2022-01-04 | 平安科技(深圳)有限公司 | 数据脱敏方法、数据脱敏装置、设备及存储介质 |
CN114579752A (zh) * | 2022-05-09 | 2022-06-03 | 中国人民解放军国防科技大学 | 基于特征重要度的长文本分类方法、装置和计算机设备 |
CN114579752B (zh) * | 2022-05-09 | 2023-05-26 | 中国人民解放军国防科技大学 | 基于特征重要度的长文本分类方法、装置和计算机设备 |
CN117221134A (zh) * | 2023-09-19 | 2023-12-12 | 合肥尚廷电子科技有限公司 | 一种基于互联网的状态分析方法及系统 |
CN117787249A (zh) * | 2024-02-23 | 2024-03-29 | 北京大学深圳研究生院 | 一种用于材料与化工行业科技情报的数据处理方法 |
CN117787249B (zh) * | 2024-02-23 | 2024-05-28 | 北京大学深圳研究生院 | 一种用于材料与化工行业科技情报的数据处理方法 |
Also Published As
Publication number | Publication date |
---|---|
CN110263152A (zh) | 2019-09-20 |
CN110263152B (zh) | 2024-04-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19927730 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19927730 Country of ref document: EP Kind code of ref document: A1 |