WO2020207167A1 - Text classification method, apparatus, device, and computer-readable storage medium - Google Patents

Text classification method, apparatus, device, and computer-readable storage medium

Info

Publication number
WO2020207167A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
preset
classified
feature
alarm
Prior art date
Application number
PCT/CN2020/078389
Other languages
English (en)
Chinese (zh)
Inventor
张威
杨永帮
Original Assignee
深圳前海微众银行股份有限公司
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2020207167A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Definitions

  • This application relates to the technical field of financial technology (Fintech), and in particular to a text classification method, device, equipment, and computer-readable storage medium.
  • the main purpose of this application is to provide a text classification method, device, equipment, and computer-readable storage medium, aiming to solve the problem of poor accuracy of existing alarm text classification for operation and maintenance scenarios.
  • the present application provides a text classification method, the text classification method includes:
  • the similarity between the first text feature vector and the second text feature vector is calculated, and the alarm text to be classified is classified according to the calculation result.
  • the present application also provides a text classification device, the text classification device includes:
  • the word segmentation processing module is used to receive the alarm text to be classified and perform word segmentation processing on the alarm text to be classified to obtain the first word segmentation set;
  • the template detection module is used to detect whether there is template text in the preset template pool;
  • the first extraction module is configured to, if template text exists in the preset template pool, perform feature extraction on the alarm text to be classified based on the first word segmentation set and the first preset rule to obtain a first text feature vector, and perform feature extraction on the template text to obtain a second text feature vector;
  • the text classification module is used for calculating the similarity between the first text feature vector and the second text feature vector, and classifying the alarm text to be classified according to the calculation result.
  • this application also provides a text classification device, the text classification device includes: a memory, a processor, and a text classification program stored in the memory and executable on the processor; when the text classification program is executed by the processor, the steps of the text classification method described above are implemented.
  • the present application also provides a computer-readable storage medium having a text classification program stored on the computer-readable storage medium; when the text classification program is executed by a processor, the steps of the text classification method described above are implemented.
  • This application provides a text classification method, device, equipment, and computer-readable storage medium.
  • the alarm text to be classified is received and subjected to word segmentation processing to obtain the first word segmentation set, and whether there is template text in the preset template pool is detected; if template text exists in the preset template pool, feature extraction is performed on the alarm text to be classified based on the first word segmentation set and the first preset rule to obtain the first text feature vector, and feature extraction is performed on the template text to obtain the second text feature vector; the similarity between the first text feature vector and the second text feature vector is calculated, and the alarm text to be classified is classified according to the calculation result.
  • the present application extracts the first text feature vector and the second text feature vector based on the alarm text to be classified after word segmentation processing and the template text in the preset template pool, and calculates the similarity between the two, so that the alarm text to be classified can be classified accurately, which improves the accuracy of alarm text classification in operation and maintenance scenarios. At the same time, classifying the alarm text in operation and maintenance scenarios intelligently and accurately can also improve the work efficiency of the operation and maintenance staff.
  • FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the application;
  • FIG. 3 is a schematic diagram of the detailed flow of step S30 in the first embodiment of the application.
  • FIG. 5 is a schematic diagram of functional modules of the first embodiment of the text classification device of this application.
  • FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the application.
  • the text classification device in the embodiment of this application may be a PC (personal computer), or a terminal device such as a server, a tablet computer, a portable computer, or a smart phone.
  • the text classification device may include a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface).
  • the memory 1005 may be a high-speed RAM memory, or a non-volatile memory, such as a magnetic disk memory.
  • the memory 1005 may also be a storage device independent of the foregoing processor 1001.
  • the structure of the text classification device shown in FIG. 1 does not constitute a limitation on the text classification device, which may include more or fewer components than shown in the figure, a combination of certain components, or a different arrangement of components.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a text classification program.
  • the network interface 1004 is mainly used to connect to a back-end server and communicate with the back-end server;
  • the user interface 1003 is mainly used to connect to a client and communicate with the client;
  • the processor 1001 can be used to call the text classification program stored in the memory 1005 and execute each step of the following text classification method.
  • This application provides a text classification method.
  • FIG. 2 is a schematic flowchart of a first embodiment of a text classification method of this application.
  • the text classification method includes:
  • Step S10 receiving an alarm text to be classified, and performing word segmentation processing on the alarm text to be classified to obtain a first word segmentation set
  • the text classification method of this embodiment is implemented by a text classification device, and the device is described by taking a server as an example.
  • the server first receives the alarm text to be classified sent by each business system (in this embodiment, a banking institution's business system or a financial management institution's business system, although other types of systems are also possible), and then performs word segmentation processing on the alarm text to be classified to obtain the first word segmentation set.
  • word segmentation processing can be implemented with word segmentation tools, such as the Chinese lexical analysis system ICTCLAS, the Chinese lexical analysis program THULAC, or the Language Technology Platform LTP.
  • word segmentation relies mainly on the characteristics of the Chinese language to cut each Chinese text in the sample data into individual words.
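  • a minimal sketch of step S10, for illustration only. The tools named above (ICTCLAS, THULAC, LTP) are not called here; the hypothetical `segment` function is a regular-expression stand-in that keeps vocabulary tags and runs of ASCII letters and digits together and emits each Chinese character as its own token, which a real segmenter would not do.

```python
import re

# Hypothetical stand-in for a real word segmentation tool (assumption, not from the source).
# Matches, in order: <TAG>-style vocabulary tags, ASCII word/number runs, single CJK characters.
_TOKEN = re.compile(r"<[A-Za-z]+>|[A-Za-z0-9_.%]+|[\u4e00-\u9fff]")


def segment(text: str) -> list[str]:
    """Return the word segmentation set of one alarm text as an ordered list of tokens."""
    return [token.lower() for token in _TOKEN.findall(text)]


first_segmentation_set = segment("CPU usage of XXX system reaches 98.7%")
```

  • in practice the body of `segment` would be replaced by a call to whichever segmentation tool is deployed; only the input (one alarm text) and the output (a list of tokens) matter to the later steps.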
  • Step S20 detecting whether there is template text in the preset template pool
  • after the first word segmentation set is obtained through word segmentation processing, it is detected whether there is template text in the preset template pool. The template text refers to alarm text that has already been classified: one alarm text is selected from each category obtained by classification and is compared with subsequently received alarm texts to be classified in order to classify them.
  • if template text exists in the preset template pool, step S30 is executed: perform feature extraction on the alarm text to be classified based on the first word segmentation set and the first preset rule to obtain the first text feature vector, and perform feature extraction on the template text to obtain a second text feature vector;
  • step S30 includes:
  • Step S31 performing word segmentation processing on the template text to obtain a second word segmentation set
  • for the word segmentation method, refer to the word segmentation method for the alarm text to be classified, which will not be repeated here. It is understandable that, because the template text was itself selected after word segmentation processing, text feature vector extraction, classification, and so on, the word segmentation set corresponding to the template text can optionally be saved in association with the template text when the template text is saved to the preset template pool. In that case there is no need to perform word segmentation again, and the word segmentation set corresponding to the template text can be obtained directly.
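  • one possible way to hold the preset template pool so that step S20 (checking for template text) and the cached word segmentation set described above are both cheap. The class and field names (`TemplateText`, `TemplatePool`, `tokens`) are hypothetical and chosen for illustration; the source does not prescribe a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateText:
    category: int          # category id this template represents
    text: str              # the original alarm text
    tokens: list[str]      # word segmentation set saved with the template


@dataclass
class TemplatePool:
    templates: list[TemplateText] = field(default_factory=list)

    def has_templates(self) -> bool:
        """Step S20: is there any template text in the pool?"""
        return bool(self.templates)

    def add(self, text: str, tokens: list[str]) -> TemplateText:
        """Save a new template together with its segmentation set, as a new category."""
        template = TemplateText(category=len(self.templates), text=text, tokens=list(tokens))
        self.templates.append(template)
        return template
```

  • because `tokens` is stored with each template, step S31 can read the second word segmentation set from the pool instead of segmenting the template text again.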
  • Step S32 Calculate the first attribute value of each preset feature word based on the first word segmentation set and the second word segmentation set, and respectively calculate the second attribute value of each preset feature word in the alarm text to be classified and the third attribute value of each preset feature word in the template text;
  • in the calculation of the first attribute value of each preset feature word, the second attribute value of each preset feature word in the alarm text to be classified, and the third attribute value of each preset feature word in the template text, the symbols are defined as follows:
  • $F_{s1}$ is the first attribute value of the preset feature word $s$. It reflects how frequently the preset feature word $s$ occurs across all texts: if a word appears in many texts, its $F_{s1}$ value should be lower
  • $n_s$ is the total number of texts, i.e. the sum of the number of alarm texts to be classified and the number of template texts
  • $df(t, s)$ is the number of texts, among the alarm text to be classified and the template text, that contain the preset feature word $s$
  • the remaining parameter is a preset value (it can be set according to the actual situation and is not limited here)
  • $F_{s2}$ is the second attribute value of the preset feature word $s$ in the alarm text to be classified
  • $F_{s2}$ represents the frequency of the preset feature word $s$ in the current alarm text to be classified
  • $t_{s1}$ is the number of times the preset feature word $s$ appears in the current alarm text to be classified
  • $t_{total1}$ is the total number of word segments in the current alarm text to be classified
  • $F_{s3}$ is the third attribute value of the preset feature word $s$ in the template text
  • $t_{s2}$ is the number of times the preset feature word $s$ appears in the current template text
  • $t_{total2}$ is the total number of word segments in the current template text
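  • the attribute value formulas themselves are not preserved in this text, so the sketch below is an assumption built only from the symbol definitions: the first attribute value is taken to be an inverse-document-frequency style quantity, `log(n_s / (df + alpha))`, with `alpha` standing in for the preset value, and the second and third attribute values are taken to be plain term frequencies (`t_s / t_total`). The function names are hypothetical.

```python
import math


def first_attribute(feature_word, token_sets, alpha=1.0):
    """F_s1 (assumed form): lower when the feature word occurs in more texts.

    token_sets holds one token list per text (the alarm text plus the template texts);
    alpha is the preset value and must be positive to avoid division by zero.
    """
    n_s = len(token_sets)
    df = sum(1 for tokens in token_sets if feature_word in tokens)
    return math.log(n_s / (df + alpha))


def frequency(feature_word, tokens):
    """F_s2 or F_s3 (assumed form): occurrences of the word divided by the token count."""
    if not tokens:
        return 0.0
    return tokens.count(feature_word) / len(tokens)


alarm_tokens = ["cpu", "usage", "high"]
template_tokens = ["disk", "full"]
f_s1 = first_attribute("cpu", [alarm_tokens, template_tokens])
f_s2 = frequency("cpu", alarm_tokens)
f_s3 = frequency("cpu", template_tokens)
```

  • with this assumed form, a word found in every text gets a small (possibly negative) first attribute value, which matches the stated intent that words appearing in many texts should score lower; a different smoothing choice would change the numbers but not that ordering.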
  • Step S33 Calculate the first feature value of each preset feature word in the alarm text to be classified according to the first attribute value and the second attribute value, and splice the first feature values to obtain the first text feature vector;
  • Step S34 Calculate the second feature value of each preset feature word in the template text according to the first attribute value and the third attribute value, and splice the second feature values to obtain a second text feature vector.
  • the first feature value of each preset feature word in the alarm text to be classified is calculated according to the first attribute value and the second attribute value, and the first feature values are spliced to obtain the first text feature vector.
  • the second feature value of each preset feature word in the template text is calculated according to the first attribute value and the third attribute value, and the second feature values are spliced to obtain the second text feature vector. It should be noted that steps S33 and S34 may be executed in any order.
  • $V_{s1} = F_{s1} \times F_{s2}$
  • $V_{s2} = F_{s1} \times F_{s3}$
  • V s1 is the first feature value of the preset feature word s in the alarm text to be classified
  • V s2 is the second feature value of the preset feature word s in the template text
  • F s1 is the first attribute value of the preset feature word s
  • F s2 is the second attribute value of the preset feature word s in the alarm text to be classified
  • F s3 is the third attribute value of the preset feature word s in the template text.
  • the first text feature vector is obtained by splicing the first feature values of the preset feature words in the alarm text to be classified. For example, assuming that there are y preset feature words and that the first feature values calculated for preset feature words 1 to y are $V_{11}, V_{21}, \ldots, V_{s1}, \ldots, V_{y1}$, the spliced first text feature vector is $\{V_{11}, V_{21}, \ldots, V_{s1}, \ldots, V_{y1}\}$.
  • the second text feature vector and the first text feature vector are acquired in a similar manner, and will not be repeated here.
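  • the splicing in steps S33 and S34 can be sketched as follows. The function name `feature_vector` and the dictionary inputs are hypothetical; the sketch assumes the attribute values have already been computed and only shows the per-word product and the fixed ordering that makes two vectors comparable.

```python
def feature_vector(feature_words, first_attribute_values, local_attribute_values):
    """Splice V_s = F_s1 * F_local for every preset feature word, in a fixed order.

    first_attribute_values maps each preset feature word to F_s1;
    local_attribute_values maps words to F_s2 (alarm text) or F_s3 (template text);
    words missing from the local mapping are treated as having frequency 0.
    """
    return [
        first_attribute_values[word] * local_attribute_values.get(word, 0.0)
        for word in feature_words
    ]


feature_words = ["cpu", "disk", "<num>"]
f1 = {"cpu": 0.5, "disk": 2.0, "<num>": 1.0}
first_vector = feature_vector(feature_words, f1, {"cpu": 0.25, "<num>": 0.5})
second_vector = feature_vector(feature_words, f1, {"disk": 0.5, "<num>": 0.5})
```

  • both vectors use the same `feature_words` order, so position s in the first text feature vector and position s in the second text feature vector always refer to the same preset feature word.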
  • Step S40 Calculate the similarity between the first text feature vector and the second text feature vector, and classify the alarm text to be classified according to the calculation result.
  • step S40 may include:
  • Step a1 calculating the Euclidean distance between the first text feature vector and each of the second text feature vectors, and judging whether there is a Euclidean distance greater than a preset threshold according to the calculation result;
  • the similarity can be characterized by calculating the Euclidean distance. Specifically, the Euclidean distance between the first text feature vector and each second text feature vector is calculated, and according to the calculation result, it is determined whether there is a Euclidean distance greater than a preset threshold. The Euclidean distance, also called the Euclidean metric, is the straight-line distance between two points in Euclidean space.
  • the cosine similarity between the first text feature vector and each second text feature vector can also be calculated to characterize the similarity between the two.
  • Step a2 if there is an Euclidean distance greater than a preset threshold, classify the alarm text to be classified and the template text corresponding to the Euclidean distance greater than the preset threshold into the same category;
  • the preset threshold can be set according to actual needs and is not limited here.
  • Step a3 if there is no Euclidean distance greater than the preset threshold, the alarm text to be classified is divided into a new category, and the alarm text to be classified is saved in the preset template pool as a new template text.
  • in this case, the alarm text to be classified is not similar to any of the template texts.
  • the alarm text to be classified is therefore divided into a new category and saved in the preset template pool as a new template text for classifying subsequently received alarm texts.
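  • a minimal sketch of steps a1 to a3. One point is an assumption and is flagged in the code: the text above compares the Euclidean distance with "greater than a preset threshold", but a smaller Euclidean distance means the two vectors are closer, so the sketch treats a distance at or below the threshold as a match. If the threshold is instead meant for a similarity score, the comparison would be reversed. The function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def classify(first_vector, second_vectors, threshold):
    """Return the index of the closest matching template, or None for a new category.

    ASSUMPTION: a template matches when its distance is <= threshold
    (smaller distance = more similar); this reading is not stated in the source.
    """
    best_index, best_distance = None, None
    for index, second_vector in enumerate(second_vectors):
        distance = euclidean(first_vector, second_vector)
        if distance <= threshold and (best_distance is None or distance < best_distance):
            best_index, best_distance = index, distance
    return best_index


templates = [[0.0, 0.0], [1.0, 1.0]]
match = classify([0.9, 1.0], templates, threshold=0.5)     # close to template 1
no_match = classify([5.0, 5.0], templates, threshold=0.5)  # far from every template
```

  • when `classify` returns `None`, the caller would create a new category and save the alarm text to the preset template pool as a new template text, as step a3 describes.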
  • the embodiment of the application provides a text classification method.
  • the alarm text to be classified is received and subjected to word segmentation processing to obtain the first word segmentation set, and whether there is template text in the preset template pool is detected; if template text exists in the preset template pool, feature extraction is performed on the alarm text to be classified based on the first word segmentation set and the first preset rule to obtain the first text feature vector, and feature extraction is performed on the template text to obtain the second text feature vector; the similarity between the first text feature vector and the second text feature vector is calculated, and the alarm text to be classified is classified according to the calculation result.
  • the embodiment of this application extracts the first text feature vector and the second text feature vector based on the alarm text to be classified after word segmentation processing and the template text in the preset template pool, and calculates the similarity between the two, so that the alarm text to be classified can be classified accurately, which improves the accuracy of alarm text classification in operation and maintenance scenarios.
  • this application intelligently and accurately classifies the alarm text in the operation and maintenance scene, which can also improve the work efficiency of the operation and maintenance staff.
  • the text classification method may further include the following steps:
  • the step of "performing word segmentation processing on the alarm text to be classified to obtain the first word segmentation set" includes: performing word segmentation processing on the alarm text to be classified after vocabulary replacement processing to obtain the first word segmentation set.
  • because each word carries a relatively large weight in the entire alarm text, words that disturb the overall semantic judgment can affect the subsequent feature value calculation and the classification accuracy. To avoid this, non-standard vocabulary of the same nature needs to be preprocessed: vocabulary containing similar information is replaced with a corresponding vocabulary tag or class name, which reduces the disturbance of such vocabulary to semantic judgment and improves the accuracy of text classification.
  • vocabulary replacement processing is therefore performed on the alarm text to be classified to improve the accuracy of text classification.
  • the step of "performing vocabulary replacement processing on the alarm text to be classified" includes:
  • Step b1 receiving an alarm text to be classified, and detecting whether there is a preset target vocabulary in the alarm text to be classified;
  • Step b2 If there is a preset target vocabulary in the alarm text to be classified, replace the preset target vocabulary existing in the alarm text to be classified with a corresponding vocabulary tag.
  • the preset target vocabulary in the alarm text to be classified is replaced with a corresponding vocabulary tag.
  • the mapping relationship between the preset target vocabulary and the vocabulary tag may be constructed in advance, and when the preset target vocabulary is detected in the alarm text to be classified, the corresponding vocabulary tag is determined according to the mapping relationship and substituted. For example, consider alarm text 1, "The CPU occupancy rate of the XXX system reaches 98.7%", and alarm text 2, "The CPU occupancy rate of the XXX system reaches 90.1%".
  • the specific values in alarm texts 1 and 2 carry a relatively large weight in the entire alarm text, but they have no substantial effect on the classification of the alarm text.
  • the numerical vocabulary can therefore be replaced with the vocabulary tag <num>.
  • similarly, the same (type of) system may have different system names, so these names can be replaced with the vocabulary tag <SUBSYS>.
  • the preprocessing of the alarm text to be classified in this application replaces non-standard words of the same nature with the same vocabulary tag, which helps extract from the alarm text to be classified the information that has a substantial impact on subsequent classification and reduces the disturbance of similar words to semantic judgment, thereby improving the accuracy of text classification.
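  • steps b1 and b2 can be sketched with a small mapping from preset target vocabulary to vocabulary tags. The mapping below is hypothetical: the regular expressions, the system names `XXX` and `YYY`, and the function name `replace_vocabulary` are assumptions made for illustration, and only the tags <num> and <SUBSYS> come from the text.

```python
import re

# Hypothetical mapping: (pattern for preset target vocabulary, vocabulary tag).
# System names are replaced first so that digits inside a system name are not
# turned into <num> before the name is recognised.
REPLACEMENTS = [
    (re.compile(r"\b(?:XXX|YYY) system\b"), "<SUBSYS>"),
    (re.compile(r"\d+(?:\.\d+)?%?"), "<num>"),
]


def replace_vocabulary(text: str) -> str:
    """Replace every preset target vocabulary item found in the text with its tag."""
    for pattern, tag in REPLACEMENTS:
        text = pattern.sub(tag, text)
    return text


alarm_1 = replace_vocabulary("The CPU occupancy rate of the XXX system reaches 98.7%")
alarm_2 = replace_vocabulary("The CPU occupancy rate of the YYY system reaches 90.1%")
```

  • after replacement the two alarm texts are identical, so the differing values and system names no longer pull their feature vectors apart.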
  • word segmentation processing is then performed on the alarm text to be classified after vocabulary replacement processing to obtain the first word segmentation set, and the subsequent steps are performed. For the specific process, please refer to the first embodiment above, which will not be repeated here.
  • when the first attribute value of each preset feature word is subsequently calculated based on the first word segmentation set and the second word segmentation set, and the second attribute value of each preset feature word in the alarm text to be classified and the third attribute value of each preset feature word in the template text are respectively calculated, the preset feature words may include not only the above-mentioned vocabulary that has a substantial influence on the classification of the alarm text, but also the vocabulary tags introduced by the above vocabulary replacement.
  • vocabulary replacement processing is performed on the alarm text to be classified so that non-standard words of the same nature are replaced with the same vocabulary tag, which reduces the disturbance of similar words to semantic judgment, helps extract the feature information of the alarm text to be classified that has a substantial impact on classification, and can further improve the accuracy of alarm text classification.
  • FIG. 4 is a schematic flowchart of a second embodiment of a text classification method of this application.
  • the text classification method further includes:
  • if there is no template text in the preset template pool, step S50 is executed: perform feature extraction on the alarm text to be classified based on the first word segmentation set and the second preset rule to obtain a third text feature vector;
  • Step S50 includes:
  • Step c1 calculating the fourth attribute value and the fifth attribute value of each preset feature word based on the first word segmentation set;
  • Step c2 Calculate the third feature value of each preset feature word in the alarm text to be classified according to the fourth attribute value and the fifth attribute value, and splice the third feature values to obtain the third text feature vector.
  • the third feature value of each preset feature word in the alarm text to be classified is calculated according to the fourth attribute value and the fifth attribute value, and the third feature value is spliced to obtain a third text feature vector.
  • the third text feature vector and the first text feature vector are acquired in a similar manner, and reference may be made to the above-mentioned first embodiment, which will not be repeated here.
  • Step S60 clustering the third text feature vector, and classifying the alarm text to be classified according to the clustering result
  • clustering and classification methods may include but are not limited to: 1) calculating the Euclidean distance between the third text feature vectors, and classifying the alarm texts to be classified that correspond to third text feature vectors whose Euclidean distance is greater than a preset threshold into the same category; 2) calculating the cosine similarity (or the Jaccard distance or another value that can characterize similarity) between the third text feature vectors, and classifying the alarm texts to be classified according to the calculation results; 3) using a preset clustering algorithm (such as K-Means clustering or a hierarchical clustering algorithm) to cluster the third text feature vectors, and classifying the alarm texts to be classified according to the clustering result; for example, if the vectors are clustered into n clusters, each cluster is regarded as a category, and n categories of alarm texts are obtained.
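  • to make option 2) concrete, here is a sketch that groups third text feature vectors by cosine similarity. It is a greedy single-pass grouping against the first member of each cluster, not K-Means or hierarchical clustering, and the `min_similarity` value of 0.8 is an arbitrary assumption; the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(vectors, min_similarity=0.8):
    """Greedy grouping: each vector joins the first cluster whose seed is similar enough.

    Returns a list of clusters; each cluster is a list of indices into `vectors`,
    and the first index in a cluster is its seed.
    """
    clusters = []
    for index, vector in enumerate(vectors):
        for members in clusters:
            if cosine_similarity(vector, vectors[members[0]]) >= min_similarity:
                members.append(index)
                break
        else:
            # No existing cluster was similar enough: start a new category.
            clusters.append([index])
    return clusters


third_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
categories = cluster(third_vectors)
```

  • the result depends on input order because only the seed of each cluster is compared; a library implementation of K-Means or hierarchical clustering would be the usual choice when that matters.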
  • Step S70 randomly select an alarm text to be classified from each classification according to the classification result, as a template text, and save it in the preset template pool.
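  • step S70 amounts to one random draw per category. In the sketch below the function name `select_templates` and the dictionary layout (category id mapped to the alarm texts placed in it) are assumptions; the optional `rng` argument exists only so that the draw can be made reproducible.

```python
import random


def select_templates(categories, rng=None):
    """Pick one alarm text at random from each non-empty category.

    categories maps a category id to the list of alarm texts assigned to it;
    the result maps each category id to the text chosen as its template.
    """
    rng = rng or random.Random()
    return {
        category: rng.choice(texts)
        for category, texts in categories.items()
        if texts
    }


classified = {
    0: ["CPU occupancy of <SUBSYS> reaches <num>", "CPU load of <SUBSYS> reaches <num>"],
    1: ["disk of <SUBSYS> is full"],
}
templates = select_templates(classified, rng=random.Random(0))
```

  • each chosen text would then be saved to the preset template pool, so that later alarm texts take the template-matching path of step S30 instead of being clustered again.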
  • the embodiment of the present application introduces a classification method of the alarm text to be classified when there is no template text in the preset template pool.
  • by extracting features from the alarm text to be classified, clustering the extracted third text feature vectors, and classifying the alarm texts to be classified according to the clustering result, the accuracy of alarm text classification in operation and maintenance scenarios can be improved.
  • this application intelligently and accurately classifies the alarm text in the operation and maintenance scene, which can also improve the work efficiency of the operation and maintenance staff.
  • vocabulary replacement processing can also be performed on the alarm text to be classified, so that non-standard vocabulary of the same nature is replaced with the same vocabulary tag. This reduces the disturbance of similar vocabulary to semantic judgment and helps extract the feature information of the alarm text to be classified that has a substantial impact on classification, so as to further improve the accuracy of alarm text classification.
  • for the specific vocabulary replacement process, please refer to the above-mentioned embodiment, which is not repeated here.
  • the application also provides a text classification device.
  • FIG. 5 is a schematic diagram of the functional modules of the first embodiment of the text classification device of this application.
  • the text classification device includes:
  • the word segmentation processing module 10 is configured to receive the alarm text to be classified and perform word segmentation processing on the alarm text to be classified to obtain the first word segmentation set;
  • the template detection module 20 is used to detect whether template text exists in the preset template pool
  • the first extraction module 30 is configured to, if a template text exists in the preset template pool, perform feature extraction on the alarm text to be classified based on the first word segmentation set and the first preset rule to obtain a first text feature vector, and perform feature extraction on the template text to obtain a second text feature vector;
  • the text classification module 40 is configured to calculate the similarity between the first text feature vector and the second text feature vector, and classify the alarm text to be classified according to the calculation result.
  • the first extraction module 30 includes:
  • the word segmentation processing unit is configured to perform word segmentation processing on the template text to obtain a second word segmentation set
  • the first calculation unit is configured to calculate the first attribute value of each preset feature word based on the first word segmentation set and the second word segmentation set, and respectively calculate a second attribute value of each preset feature word in the alarm text to be classified and a third attribute value of each preset feature word in the template text;
  • the first splicing unit is configured to calculate the first feature value of each preset feature word in the alarm text to be classified according to the first attribute value and the second attribute value, and splice the first feature values to obtain the first text feature vector;
  • the second splicing unit is configured to calculate the second feature value of each preset feature word in the template text according to the first attribute value and the third attribute value, and splice the second feature values to obtain the second text feature vector.
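  • the first feature value and the second feature value are calculated as:
  • $V_{s1} = F_{s1} \times F_{s2}$;
  • $V_{s2} = F_{s1} \times F_{s3}$;
  • the symbols used in the calculation of the attribute values and feature values are defined as follows:
  • $F_{s1}$ is the first attribute value of the preset feature word $s$
  • $n_s$ is the total number of the alarm text to be classified and the template text
  • $df(t, s)$ is the number of texts, among the alarm text to be classified and the template text, that contain the preset feature word $s$
  • the remaining parameter is a preset value
  • $F_{s2}$ is the second attribute value of the preset feature word $s$ in the alarm text to be classified
  • $t_{s1}$ is the number of times the preset feature word $s$ appears in the current alarm text to be classified
  • $t_{total1}$ is the total number of word segments in the current alarm text to be classified
  • $F_{s3}$ is the third attribute value of the preset feature word $s$ in the template text
  • $t_{s2}$ is the number of times the preset feature word $s$ appears in the current template text
  • $t_{total2}$ is the total number of word segments in the current template text
  • $V_{s1}$ is the first feature value of the preset feature word $s$ in the alarm text to be classified
  • $V_{s2}$ is the second feature value of the preset feature word $s$ in the template text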
  • the text classification module 40 includes:
  • a second calculation unit configured to calculate the Euclidean distance between the first text feature vector and each of the second text feature vectors, and determine whether there is a Euclidean distance greater than a preset threshold according to the calculation result;
  • the first classification unit is configured to, if there is a Euclidean distance greater than a preset threshold, classify the alarm text to be classified and the template text corresponding to the Euclidean distance greater than the preset threshold into the same category;
  • the second classification unit is configured to, if there is no Euclidean distance greater than the preset threshold, divide the alarm text to be classified into a new category, and save the alarm text to be classified into the preset template pool as a new template text.
  • the text classification device further includes:
  • the word replacement module is used to perform word replacement processing on the alarm text to be classified
  • the word replacement module includes:
  • a text detection unit for detecting whether there is a preset target vocabulary in the alarm text to be classified
  • a vocabulary replacement unit configured to replace the preset target vocabulary existing in the alarm text to be classified with a corresponding vocabulary label if there is a preset target vocabulary in the alarm text to be classified;
  • the word segmentation processing module 10 is specifically configured to perform word segmentation processing on the alarm text to be classified after word replacement processing to obtain the first word segmentation set.
  • The text classification device further includes:
  • a second extraction module, configured to, if there is no template text in the preset template pool, perform feature extraction on the alarm text to be classified based on the first word segmentation set and the second preset rule, to obtain a third text feature vector;
  • a vector clustering module, configured to cluster the third text feature vectors and classify the alarm texts to be classified according to the clustering result;
  • a template selection module, configured to randomly select, according to the classification result, one alarm text from each category as the template text and save it in the preset template pool.
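
The empty-pool path (cluster the third text feature vectors, then draw one random template per category) might look like the sketch below. The passage does not name a clustering algorithm, so a simple leader clustering with a hypothetical `radius` parameter is swapped in purely as a stand-in; the function names and the returned structures are likewise assumptions.

```python
import math
import random


def cluster(vectors, radius):
    """Leader clustering (stand-in algorithm): a vector joins the first
    cluster whose leader lies within `radius`, else it starts a new one.
    Returns one integer cluster label per input vector."""
    leaders, labels = [], []
    for vector in vectors:
        for index, leader in enumerate(leaders):
            if math.dist(vector, leader) <= radius:
                labels.append(index)
                break
        else:
            leaders.append(vector)
            labels.append(len(leaders) - 1)
    return labels


def select_templates(alarm_texts, labels, rng=random):
    """Pick one alarm text at random from each cluster as its template."""
    groups = {}
    for text, label in zip(alarm_texts, labels):
        groups.setdefault(label, []).append(text)
    return {label: rng.choice(members) for label, members in groups.items()}
```

Passing a seeded `random.Random` instance as `rng` makes the template choice reproducible, which can help when comparing runs.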
  • The second extraction module includes:
  • a third calculation unit, configured to calculate the fourth attribute value and the fifth attribute value of each preset feature word based on the first word segmentation set;
  • a third splicing unit, configured to calculate the third feature value of each preset feature word in the alarm text to be classified according to the fourth attribute value and the fifth attribute value, and to splice the third feature values to obtain the third text feature vector.
  • Each module of the above text classification device corresponds to a step in the above embodiments of the text classification method; their functions and implementation processes are not repeated here.
  • The present application also provides a computer-readable storage medium storing a text classification program; when the text classification program is executed by a processor, the steps of the text classification method described in any of the above embodiments are implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a text classification method, apparatus and device, and a computer-readable storage medium. The text classification method comprises: receiving an alarm text to be classified and performing word segmentation processing on it to obtain a first word segmentation set (S10); detecting whether a template text exists in a preset template pool (S20); if a template text exists in the preset template pool, performing feature extraction on the alarm text to be classified based on the first word segmentation set and a first preset rule to obtain a first text feature vector, and performing feature extraction on the template text to obtain a second text feature vector (S30); and calculating the similarity between the first text feature vector and the second text feature vector and classifying the alarm text to be classified according to the calculation result (S40). The method can solve the problem that the accuracy of alarm text classification in operation and maintenance scenarios is relatively low.
PCT/CN2020/078389 2019-04-12 2020-03-09 Text classification method, apparatus and device, and computer-readable storage medium WO2020207167A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910297133.2A CN110008343B (zh) 2019-04-12 2019-04-12 Text classification method, apparatus, device and computer-readable storage medium
CN201910297133.2 2019-04-12

Publications (1)

Publication Number Publication Date
WO2020207167A1 true WO2020207167A1 (fr) 2020-10-15

Family

ID=67171668

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/078389 WO2020207167A1 (fr) 2019-04-12 2020-03-09 Text classification method, apparatus and device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN110008343B (fr)
WO (1) WO2020207167A1 (fr)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
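CN110008343B (zh) * 2019-04-12 2024-08-02 深圳前海微众银行股份有限公司 Text classification method, apparatus, device and computer-readable storage medium
JP7091295B2 (ja) * 2019-09-06 2022-06-27 株式会社東芝 Analysis device, analysis method, and program
CN113111895A (zh) * 2020-02-13 2021-07-13 北京明亿科技有限公司 Method and apparatus for determining police dispatch incident category based on support vector machine
CN111460180B (zh) * 2020-03-30 2024-03-15 维沃移动通信有限公司 Information display method, apparatus, electronic device and storage medium
CN112328799B (zh) * 2021-01-06 2021-04-02 腾讯科技(深圳)有限公司 Question classification method and apparatus
CN112989050B (zh) * 2021-03-31 2023-05-30 建信金融科技有限责任公司 Table classification method, apparatus, device and storage medium
CN112988954B (zh) * 2021-05-17 2021-09-21 腾讯科技(深圳)有限公司 Text classification method, apparatus, electronic device and computer-readable storage medium
CN113377911B (zh) * 2021-06-09 2022-10-14 广东电网有限责任公司广州供电局 Text information extraction method, apparatus, electronic device and storage medium
CN113254653B (zh) * 2021-07-05 2021-12-21 明品云(北京)数据科技有限公司 Text classification method, system, device and medium
CN113657445B (zh) * 2021-07-13 2022-06-07 珠海金智维信息科技有限公司 ResNet-based single-line text image comparison method and system
CN113704467B (zh) * 2021-07-29 2024-07-02 大箴(杭州)科技有限公司 Data-template-based massive text monitoring method and apparatus, medium, and device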

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103257957A (zh) * 2012-02-15 2013-08-21 深圳市腾讯计算机系统有限公司 Text similarity recognition method and apparatus based on Chinese word segmentation
US20140052728A1 (en) * 2011-04-27 2014-02-20 Nec Corporation Text clustering device, text clustering method, and computer-readable recording medium
CN104112026A (zh) * 2014-08-01 2014-10-22 中国联合网络通信集团有限公司 Short message text classification method and system
CN105045812A (zh) * 2015-06-18 2015-11-11 上海高欣计算机系统有限公司 Text topic classification method and system
CN110008343A (zh) * 2019-04-12 2019-07-12 深圳前海微众银行股份有限公司 Text classification method, apparatus, device and computer-readable storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
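CN102937960B (zh) * 2012-09-06 2015-06-17 北京邮电大学 Apparatus for identifying and evaluating hot topics of emergency events
CN102831246B (zh) * 2012-09-17 2014-09-24 中央民族大学 Tibetan web page classification method and apparatus
CN106919619B (zh) * 2015-12-28 2021-09-07 阿里巴巴集团控股有限公司 Commodity clustering method, apparatus and electronic device
CN105677873B (zh) * 2016-01-11 2019-03-26 中国电子科技集团公司第十研究所 Text intelligence association clustering and aggregation processing method based on domain knowledge model
CN107291723B (zh) * 2016-03-30 2021-04-30 阿里巴巴集团控股有限公司 Method and apparatus for web page text classification, and method and apparatus for web page text recognition
CN107315777A (zh) * 2017-05-31 2017-11-03 国家电网公司 Classification and compression method for power grid monitoring signals based on k-nearest neighbor algorithm
CN108563722B (zh) * 2018-04-03 2021-04-02 有米科技股份有限公司 Industry classification method and system for text information, computer device and storage medium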

Also Published As

Publication number Publication date
CN110008343A (zh) 2019-07-12
CN110008343B (zh) 2024-08-02

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20786829

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20786829

Country of ref document: EP

Kind code of ref document: A1