WO2018028065A1 - Short message classification method and device, and computer storage medium - Google Patents
Short message classification method and device, and computer storage medium
- Publication number
- WO2018028065A1 (PCT/CN2016/105378, CN2016105378W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- short message
- type
- short
- vector
- word
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 50
- 238000013145 classification model Methods 0.000 claims abstract description 102
- 238000012545 processing Methods 0.000 claims description 22
- 230000011218 segmentation Effects 0.000 claims description 8
- 238000004590 computer program Methods 0.000 description 4
- 238000010586 diagram Methods 0.000 description 4
- 230000006870 function Effects 0.000 description 3
- 238000012549 training Methods 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 238000012217 deletion Methods 0.000 description 2
- 230000037430 deletion Effects 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 239000002245 particle Substances 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/725—Cordless telephones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72403—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
- H04M1/7243—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
- H04M1/72436—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for text messaging, e.g. short messaging services [SMS] or e-mails
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Definitions
- a first determining module configured to determine a first classification model, where the short information type corresponding to the first classification model includes at least one first short information type and a non-first short information type;
- the first determining module is configured to determine, according to the first operation result, that the type of the short message is the first short information type or the non-first short information type.
- a second operation module configured to perform a weighting operation on the read symbol vector and the word vector according to the second classification model, to obtain a second operation result.
- Step 101: Identify a preset feature word in the received short message.
- Step 106: Determine, according to the first operation result, whether the type of the short message is the first short information type or the non-first short information type.
- the preset feature words may be an email address, a web address, a date, a time, a percentage, a quantifier, a currency, a phone number, a number, a foreign language, etc., or may be a customized vocabulary, including vocabulary from a professional application field, idioms, food, places, works, equipment, names of people, place names, institution names, etc.; these are not limited by the present invention.
- the short information type may be determined step by step in a cascading manner, that is, the first classification model, the second classification model, the third classification model, and the fourth classification model are used in sequence to achieve a finer classification.
- the standardized short message can facilitate subsequent semantic analysis.
- the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device.
- the apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
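The feature-word replacement of Step 101 and the cascading determination described in the definitions can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the regular expressions, the symbol names such as `<URL>`, the type labels, and the keyword-based stand-in models in `EXAMPLE_MODELS` are all hypothetical, since the record names the feature-word categories and the cascade but gives no patterns or model internals.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical patterns. The record lists categories (email address, web
# address, date, percentage, phone number, number) but not how to match them.
# Order matters: more specific patterns run before the generic number pattern.
FEATURE_PATTERNS = [
    ("<EMAIL>", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("<URL>", re.compile(r"https?://\S+")),
    ("<DATE>", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("<PERCENT>", re.compile(r"\d+(?:\.\d+)?%")),
    ("<PHONE>", re.compile(r"\+?\d{7,15}")),
    ("<NUM>", re.compile(r"\d+(?:\.\d+)?")),
]


def replace_feature_words(text: str) -> str:
    """Replace each recognized feature word with its feature symbol."""
    for symbol, pattern in FEATURE_PATTERNS:
        text = pattern.sub(symbol, text)
    return text


# A model here is a (type label, accept function) pair. Real models would be
# trained classifiers; these keyword rules are placeholders for illustration.
Model = Tuple[str, Callable[[str], bool]]

EXAMPLE_MODELS: List[Model] = [
    ("verification", lambda t: "code" in t.lower() and "<NUM>" in t),
    ("promotion", lambda t: "<PERCENT>" in t or "<URL>" in t),
]


def classify_cascade(text: str, models: List[Model], default: str = "other") -> str:
    """Apply the models in sequence; each one either claims the message
    for its own type or passes it on as a 'non-' type to the next model."""
    normalized = replace_feature_words(text)
    for label, accepts in models:
        if accepts(normalized):
            return label
    return default


if __name__ == "__main__":
    print(replace_feature_words("Pay 20% by 2016-08-11 or call 13800138000"))
    print(classify_cascade("Your code is 482913", EXAMPLE_MODELS))
```

Replacing concrete values with a shared symbol means the classifier sees one token per category instead of an unbounded set of distinct numbers, addresses, and dates, which is presumably why the replacement precedes the vector lookup.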
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Human Computer Interaction (AREA)
- General Business, Economics & Management (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to a short message classification method and device, and a computer storage medium. The short message classification method comprises: identifying a preset feature word in a received short message; replacing the preset feature word in the short message with a feature symbol corresponding to the preset feature word; determining a first classification model; reading, from a high-frequency word vector library of the first classification model, a symbol vector of the feature symbol and word vectors of the remaining words in the short message other than the preset feature word; performing a weighted operation on the read symbol vector and word vectors according to the first classification model to obtain a first operation result; and determining the type of the short message according to the first operation result. By using a preset classification model, the solution of the present invention can accurately determine the short message type to which a short message belongs, enabling intelligent management of short messages and making it easier for a user to query and organize them.
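The vector lookup and weighted operation described in the abstract might look roughly like the following Python sketch. It rests on several assumptions that the abstract does not confirm: the tiny `VECTOR_LIBRARY`, its two-dimensional vectors, the weights and bias, and the choice of a linear score over the mean vector with a zero threshold are all hypothetical stand-ins for the unspecified weighted operation.

```python
from typing import Dict, List, Sequence

# Hypothetical high-frequency word vector library: token -> vector.
# Feature symbols such as "<URL>" are stored alongside ordinary words.
VECTOR_LIBRARY: Dict[str, List[float]] = {
    "<URL>": [0.9, 0.1],
    "<PERCENT>": [0.8, 0.0],
    "sale": [0.7, 0.2],
    "lunch": [0.0, 0.9],
    "see": [0.1, 0.6],
}


def read_vectors(tokens: Sequence[str], library: Dict[str, List[float]]) -> List[List[float]]:
    """Read the symbol and word vectors; tokens missing from the library are skipped."""
    return [library[t] for t in tokens if t in library]


def weighted_score(vectors: Sequence[Sequence[float]], weights: Sequence[float], bias: float) -> float:
    """Assumed weighted operation: dot product of the weights with the mean vector, plus a bias."""
    if not vectors:
        return bias
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(len(weights))]
    return sum(w * m for w, m in zip(weights, mean)) + bias


def classify_first_model(tokens: Sequence[str], library: Dict[str, List[float]],
                         weights: Sequence[float], bias: float) -> str:
    """Map the operation result to the first type or the non-first type."""
    score = weighted_score(read_vectors(tokens, library), weights, bias)
    return "first type" if score > 0 else "non-first type"


if __name__ == "__main__":
    tokens = ["sale", "<PERCENT>", "off", "<URL>"]
    print(classify_first_model(tokens, VECTOR_LIBRARY, [1.0, -1.0], 0.0))
```

Skipping tokens that are absent from the library is one way to read "high-frequency": rare words contribute nothing to the score. Whether the patent handles them this way is not stated in this record.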
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610659527.4 | 2016-08-11 | ||
CN201610659527.4A CN107734131B (zh) | 2016-08-11 | 2016-08-11 | Short message classification method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018028065A1 true WO2018028065A1 (fr) | 2018-02-15 |
Family
ID=61161749
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/105378 WO2018028065A1 (fr) | 2016-08-11 | 2016-11-10 | Short message classification method and device, and computer storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107734131B (fr) |
WO (1) | WO2018028065A1 (fr) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
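CN111241269A (zh) * | 2018-11-09 | 2020-06-05 | 中移(杭州)信息技术有限公司 | Short message text classification method and apparatus, electronic device, and storage medium |
CN113657106A (zh) * | 2021-07-05 | 2021-11-16 | 西安理工大学 | Feature selection method based on normalized term-frequency weights |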
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
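CN110913354A (zh) * | 2018-09-17 | 2020-03-24 | 阿里巴巴集团控股有限公司 | Short message classification method and apparatus, and electronic device |
CN110929025B (zh) * | 2018-09-17 | 2023-04-25 | 阿里巴巴集团控股有限公司 | Spam text identification method and apparatus, computing device, and readable storage medium |
CN111209751B (zh) * | 2020-02-14 | 2023-07-28 | 全球能源互联网研究院有限公司 | Chinese word segmentation method and apparatus, and storage medium |
CN116468037A (zh) * | 2023-03-17 | 2023-07-21 | 北京深维智讯科技有限公司 | NLP-based data processing method and system |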
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
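JP2013061757A (ja) * | 2011-09-13 | 2013-04-04 | Hitachi Solutions Ltd | Document classification method |
JP2013120534A (ja) * | 2011-12-08 | 2013-06-17 | Mitsubishi Electric Corp | Related-word classification device, computer program, and related-word classification method |
CN103778226A (zh) * | 2014-01-23 | 2014-05-07 | 北京奇虎科技有限公司 | Method for constructing a language information recognition model, and language information recognition apparatus |
CN104834747A (zh) * | 2015-05-25 | 2015-08-12 | 中国科学院自动化研究所 | Short text classification method based on a convolutional neural network |
CN104978354A (zh) * | 2014-04-10 | 2015-10-14 | 中电长城网际系统应用有限公司 | Text classification method and apparatus |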
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
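CN103024746B (zh) * | 2012-12-30 | 2015-06-17 | 清华大学 | Spam short message processing system and processing method for telecom operators |
CN105447750B (zh) * | 2015-11-17 | 2022-06-03 | 小米科技有限责任公司 | Information identification method, apparatus, terminal, and server |
CN105488025B (zh) * | 2015-11-24 | 2019-02-12 | 小米科技有限责任公司 | Template construction method and apparatus, and information identification method and apparatus |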
2016
- 2016-08-11: CN application CN201610659527.4A, patent CN107734131B (zh), status Active
- 2016-11-10: WO application PCT/CN2016/105378, patent WO2018028065A1 (fr), status Application Filing
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
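CN111241269A (zh) * | 2018-11-09 | 2020-06-05 | 中移(杭州)信息技术有限公司 | Short message text classification method and apparatus, electronic device, and storage medium |
CN111241269B (zh) * | 2018-11-09 | 2024-02-23 | 中移(杭州)信息技术有限公司 | Short message text classification method and apparatus, electronic device, and storage medium |
CN113657106A (zh) * | 2021-07-05 | 2021-11-16 | 西安理工大学 | Feature selection method based on normalized term-frequency weights |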
Also Published As
Publication number | Publication date |
---|---|
CN107734131B (zh) | 2021-02-12 |
CN107734131A (zh) | 2018-02-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
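WO2018028065A1 (fr) | Short message classification method and device, and computer storage medium | |
US20230222366A1 (en) | Systems and methods for semantic analysis based on knowledge graph | |
CN111177319B (zh) | Risk event determination method and apparatus, electronic device, and storage medium | |
US10447635B2 (en) | Filtering electronic messages | |
US20160379281A1 (en) | Compliance violation early warning system | |
CN110765101B (zh) | Label generation method and apparatus, computer-readable storage medium, and server | |
WO2017173093A1 (fr) | Method and device for identifying spam email | |
CN112818692B (zh) | Named entity recognition and processing method, apparatus, device, and readable storage medium | |
CN111783432A (zh) | Method and apparatus for generating a checklist of key review points for letter-of-credit document examination | |
CN111259207A (zh) | Short message identification method, apparatus, and device | |
WO2018028164A1 (fr) | Method and device for extracting text information, and mobile terminal | |
US8620918B1 (en) | Contextual text interpretation | |
CN109101487A (zh) | Dialogue role distinguishing method, apparatus, terminal device, and storage medium | |
CN114741501A (zh) | Public opinion early-warning method and apparatus, readable storage medium, and electronic device | |
CN112699949B (zh) | Method and apparatus for identifying potential users based on social platform data | |
CN113901817A (zh) | Document classification method and apparatus, computer device, and storage medium | |
CN113051396A (zh) | Document classification and recognition method and apparatus, and electronic device | |
CN113472686A (zh) | Information identification method, apparatus, device, and storage medium | |
CN113157948A (zh) | Audit method for unstructured data, electronic device, and storage medium | |
CN110610213A (zh) | Email classification method, apparatus, device, and computer-readable storage medium | |
CN114091431B (zh) | Item information extraction method and apparatus, computer device, and storage medium | |
Minhas et al. | Linguistic correlates of deception in financial text a corpus linguistics based approach | |
KR102713581B1 (ko) | Artificial-intelligence-based method for determining investment indicators and providing stock information, and computing system performing the same | |
KR102451168B1 (ko) | Method and program for providing fraud damage information | |
CN116886817A (zh) | Business operation reminder method, apparatus, device, medium, and product |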
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16912519; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 16912519; Country of ref document: EP; Kind code of ref document: A1 |