WO2018028065A1 - Short message classification method and device, and computer storage medium - Google Patents

Short message classification method and device, and computer storage medium

Info

Publication number
WO2018028065A1
WO2018028065A1 PCT/CN2016/105378 CN2016105378W WO2018028065A1 WO 2018028065 A1 WO2018028065 A1 WO 2018028065A1 CN 2016105378 W CN2016105378 W CN 2016105378W WO 2018028065 A1 WO2018028065 A1 WO 2018028065A1
Authority
WO
WIPO (PCT)
Prior art keywords
short message
type
short
vector
word
Prior art date
Application number
PCT/CN2016/105378
Other languages
English (en)
Chinese (zh)
Inventor
陈军
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2018028065A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/725 Cordless telephones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72436 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for text messaging, e.g. short messaging services [SMS] or e-mails
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Definitions

  • a first determining module configured to determine a first classification model, where the short information types corresponding to the first classification model include at least one first short information type and a non-first short information type;
  • the first determining module is configured to determine, according to the first operation result, whether the type of the short message is the first short information type or the non-first short information type.
  • a second operation module configured to perform a weighting operation on the read symbol vector and the word vector according to the second classification model, to obtain a second operation result.
  • Step 101: Identify a preset feature word in the received short message.
  • Step 106: Determine, according to the first operation result, whether the type of the short message is the first short information type or the non-first short information type.
  • the preset feature words may be an email address, a web address, a date, a time, a percentage, a quantifier, a currency, a phone number, a number, a foreign language, etc., or may be a customized vocabulary, including the vocabulary of a professional application field, idioms, food, places, works, equipment, names of people, place names, institution names, etc.; the present invention does not limit them.
  • the type of the short message may be determined step by step in a cascading manner, that is, the first classification model, the second classification model, the third classification model, and the fourth classification model are applied in sequence to achieve a finer classification.
  • the standardized short message can facilitate subsequent semantic analysis.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture that includes an instruction device.
  • the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
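
The cascading determination mentioned in the excerpts above can be sketched as a chain of binary decisions, where each stage answers "is this message of my type or not?" and passes the message on if not. This is only an illustrative sketch: the type labels, the keyword tests standing in for trained classification models, and the names `classify_cascade` and `STAGES` are assumptions and do not come from the patent text.

```python
from typing import Callable, List, Tuple

# Each stage pairs a type label with a binary test. In the patent each test
# would be a trained classification model; here it is a hypothetical stand-in.
Stage = Tuple[str, Callable[[str], bool]]


def classify_cascade(message: str, stages: List[Stage], fallback: str = "other") -> str:
    """Apply the stages in order and return the label of the first one that accepts."""
    for label, is_type in stages:
        if is_type(message):
            return label
    # No stage accepted the message, so it stays in the residual class.
    return fallback


# Hypothetical four-stage cascade (labels and keywords are illustrative only).
STAGES: List[Stage] = [
    ("spam", lambda m: "prize" in m.lower()),
    ("bank", lambda m: "account" in m.lower()),
    ("verification", lambda m: "code" in m.lower()),
    ("travel", lambda m: "flight" in m.lower()),
]

if __name__ == "__main__":
    print(classify_cascade("Your flight departs at 9", STAGES))
```

Because earlier stages take priority, the order of the models matters: a message that would satisfy several stages is assigned to the first one that accepts it.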

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • General Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a short message classification method and device, and a computer storage medium. The short message classification method comprises: identifying a preset feature word in a received short message; replacing the preset feature word in the short message with a feature symbol corresponding to the preset feature word; determining a first classification model; reading, from a high-frequency word vector library of the first classification model, a symbol vector of the feature symbol and word vectors of the remaining words in the short message other than the preset feature word; performing a weighting operation on the read symbol vector and word vectors according to the first classification model to obtain a first operation result; and determining the type of the short message according to the first operation result. By means of a preset classification model, the solution of the present invention can accurately determine the type to which a short message belongs, achieve intelligent management of short messages, and make it easier for a user to query and organize short messages.
PCT/CN2016/105378 2016-08-11 2016-11-10 Short message classification method and device, and computer storage medium WO2018028065A1 (fr)
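
The steps in the abstract (replace feature words with symbols, look up vectors, apply a weighting operation, decide the type) might be sketched as follows. The abstract does not specify the patterns, the vector values, or the exact form of the weighting operation, so everything concrete here is an assumption: the regular expressions, the symbols `<URL>`, `<PHONE>` and `<NUM>`, the toy two-dimensional vectors, and the choice of a logistic score over the averaged vectors.

```python
import math
import re

# Hypothetical feature patterns and symbols. The patent lists categories such as
# web addresses, phone numbers and numbers, but gives no concrete patterns.
FEATURE_PATTERNS = [
    (re.compile(r"https?://\S+"), "<URL>"),
    (re.compile(r"\b\d{7,}\b"), "<PHONE>"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "<NUM>"),
]


def replace_feature_words(message):
    """Replace each recognized feature word with its feature symbol."""
    for pattern, symbol in FEATURE_PATTERNS:
        message = pattern.sub(symbol, message)
    return message


def tokenize(text):
    """Split normalized text into feature symbols and lowercase words."""
    return re.findall(r"<[A-Z]+>|[a-z']+", text)


def classify(message, model, threshold=0.5):
    """Return (is_first_type, probability) for one message.

    Assumed weighting operation: average the vectors found in the library,
    take a weighted sum plus bias, and squash it with a sigmoid.
    """
    tokens = tokenize(replace_feature_words(message.lower()))
    vectors = [model["vectors"][t] for t in tokens if t in model["vectors"]]
    count = max(len(vectors), 1)  # avoid dividing by zero when nothing is found
    dim = len(model["weights"])
    mean = [sum(v[i] for v in vectors) / count for i in range(dim)]
    score = sum(w * x for w, x in zip(model["weights"], mean)) + model["bias"]
    prob = 1.0 / (1.0 + math.exp(-score))
    return prob >= threshold, prob


# Toy stand-in for a trained first classification model (values are made up).
TOY_MODEL = {
    "vectors": {
        "<URL>": [1.0, 0.0], "<NUM>": [0.8, 0.1], "win": [0.9, 0.0],
        "lunch": [0.0, 1.0], "tomorrow": [0.1, 0.9],
    },
    "weights": [4.0, -4.0],
    "bias": 0.0,
}

if __name__ == "__main__":
    print(classify("Win 5000 now http://promo.example/1", TOY_MODEL))
    print(classify("Lunch tomorrow", TOY_MODEL))
```

Replacing feature words before the lookup keeps the vector library small: every web address maps to the single symbol `<URL>` instead of each address needing its own entry.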

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610659527.4 2016-08-11
CN201610659527.4A CN107734131B (zh) 2016-08-11 2016-08-11 一种短信息分类方法及装置

Publications (1)

Publication Number Publication Date
WO2018028065A1 (fr) 2018-02-15

Family

ID=61161749

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/105378 WO2018028065A1 (fr) 2016-08-11 2016-11-10 Short message classification method and device, and computer storage medium

Country Status (2)

Country Link
CN (1) CN107734131B (fr)
WO (1) WO2018028065A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241269A (zh) * 2018-11-09 2020-06-05 中移(杭州)信息技术有限公司 一种短信文本分类方法、装置、电子设备及存储介质
CN113657106A (zh) * 2021-07-05 2021-11-16 西安理工大学 基于归一化词频权重的特征选择方法

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110913354A (zh) * 2018-09-17 2020-03-24 阿里巴巴集团控股有限公司 短信分类方法、装置及电子设备
CN110929025B (zh) * 2018-09-17 2023-04-25 阿里巴巴集团控股有限公司 垃圾文本的识别方法、装置、计算设备及可读存储介质
CN111209751B (zh) * 2020-02-14 2023-07-28 全球能源互联网研究院有限公司 一种中文分词方法、装置及存储介质
CN116468037A (zh) * 2023-03-17 2023-07-21 北京深维智讯科技有限公司 一种基于nlp的数据处理方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013061757A (ja) * 2011-09-13 2013-04-04 Hitachi Solutions Ltd 文書分類方法
JP2013120534A (ja) * 2011-12-08 2013-06-17 Mitsubishi Electric Corp 関連語分類装置及びコンピュータプログラム及び関連語分類方法
CN103778226A (zh) * 2014-01-23 2014-05-07 北京奇虎科技有限公司 构建语言信息识别模型的方法及语言信息识别装置
CN104834747A (zh) * 2015-05-25 2015-08-12 中国科学院自动化研究所 基于卷积神经网络的短文本分类方法
CN104978354A (zh) * 2014-04-10 2015-10-14 中电长城网际系统应用有限公司 文本分类方法和装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103024746B (zh) * 2012-12-30 2015-06-17 清华大学 一种电信运营商垃圾短信处理系统及处理方法
CN105447750B (zh) * 2015-11-17 2022-06-03 小米科技有限责任公司 信息识别方法、装置、终端及服务器
CN105488025B (zh) * 2015-11-24 2019-02-12 小米科技有限责任公司 模板构建方法和装置、信息识别方法和装置

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241269A (zh) * 2018-11-09 2020-06-05 中移(杭州)信息技术有限公司 一种短信文本分类方法、装置、电子设备及存储介质
CN111241269B (zh) * 2018-11-09 2024-02-23 中移(杭州)信息技术有限公司 一种短信文本分类方法、装置、电子设备及存储介质
CN113657106A (zh) * 2021-07-05 2021-11-16 西安理工大学 基于归一化词频权重的特征选择方法

Also Published As

Publication number Publication date
CN107734131B (zh) 2021-02-12
CN107734131A (zh) 2018-02-23

Similar Documents

Publication Publication Date Title
WO2018028065A1 (fr) Short message classification method and device, and computer storage medium
US20230222366A1 (en) Systems and methods for semantic analysis based on knowledge graph
CN111177319B (zh) 风险事件的确定方法、装置、电子设备和存储介质
US10447635B2 (en) Filtering electronic messages
US20160379281A1 (en) Compliance violation early warning system
CN110765101B (zh) 标签的生成方法、装置、计算机可读存储介质及服务器
WO2017173093A1 (fr) Method and device for identifying spam mail
CN112818692B (zh) 命名实体识别和处理方法、装置、设备及可读存储介质
CN111783432A (zh) 信用证审单检查要点清单的生成方法及装置
CN111259207A (zh) 短信的识别方法、装置及设备
WO2018028164A1 (fr) Method and device for extracting text information, and mobile terminal
US8620918B1 (en) Contextual text interpretation
CN109101487A (zh) 对话角色区分方法、装置、终端设备及存储介质
CN114741501A (zh) 舆情预警方法、装置、可读存储介质及电子设备
CN112699949B (zh) 一种基于社交平台数据的潜在用户识别方法及装置
CN113901817A (zh) 文档分类方法、装置、计算机设备和存储介质
CN113051396A (zh) 文档的分类识别方法、装置和电子设备
CN113472686A (zh) 信息识别方法、装置、设备及存储介质
CN113157948A (zh) 非结构化数据的审计方法、电子设备及存储介质
CN110610213A (zh) 一种邮件分类方法、装置、设备及计算机可读存储介质
CN114091431B (zh) 事项信息提取方法、装置、计算机设备及存储介质
Minhas et al. Linguistic correlates of deception in financial text a corpus linguistics based approach
KR102713581B1 (ko) 인공지능 기반의 투자 지표 결정 및 종목 정보 제공 방법 및 이를 수행하는 컴퓨팅 시스템
KR102451168B1 (ko) 사기피해 정보 제공 방법 및 프로그램
CN116886817A (zh) 业务操作提醒方法、装置、设备、介质和产品

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 16912519; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 16912519; Country of ref document: EP; Kind code of ref document: A1