CN107239440B - Junk text recognition method and device - Google Patents
Junk text recognition method and device
- Publication number
- CN107239440B (application number CN201710273503.XA)
- Authority
- CN
- China
- Prior art keywords
- text
- probability
- features
- semantic
- junk
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F40/279 Recognition of textual entities (G Physics > G06 Computing; calculating or counting > G06F Electric digital data processing > G06F40/00 Handling natural language data > G06F40/20 Natural language analysis)
- G06F16/951 Indexing; Web crawling techniques (G Physics > G06 Computing; calculating or counting > G06F Electric digital data processing > G06F16/00 Information retrieval; Database structures therefor; File system structures therefor > G06F16/90 Details of database functions independent of the retrieved data types > G06F16/95 Retrieval from the web)
- G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking (G Physics > G06 Computing; calculating or counting > G06F Electric digital data processing > G06F40/00 Handling natural language data > G06F40/20 Natural language analysis > G06F40/279 Recognition of textual entities)
- G06F40/30 Semantic analysis (G Physics > G06 Computing; calculating or counting > G06F Electric digital data processing > G06F40/00 Handling natural language data)
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Transfer Between Computers (AREA)
- Machine Translation (AREA)
Abstract
Description
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710273503.XA CN107239440B (zh) | 2017-04-21 | 2017-04-21 | Junk text recognition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710273503.XA CN107239440B (zh) | 2017-04-21 | 2017-04-21 | Junk text recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107239440A (zh) | 2017-10-10 |
CN107239440B (zh) | 2021-05-25 |
Family
ID=59984086
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710273503.XA Active CN107239440B (zh) | Junk text recognition method and device | 2017-04-21 | 2017-04-21 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107239440B (zh) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
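CN108228704B (zh) * | 2017-11-03 | 2021-07-13 | Advanced New Technologies Co., Ltd. | Method, apparatus, and device for identifying risky content |
CN108804413B (zh) * | 2018-04-28 | 2022-03-22 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for identifying text cheating |
CN108650546B (zh) * | 2018-05-11 | 2021-07-23 | Wuhan Douyu Network Technology Co., Ltd. | Bullet-screen comment processing method, computer-readable storage medium, and electronic device |
CN109036570B (zh) * | 2018-05-31 | 2021-08-31 | Unisound Intelligent Technology Co., Ltd. | Method and system for filtering non-medical-record content in the ultrasound department |
CN109255069A (zh) * | 2018-07-31 | 2019-01-22 | Alibaba Group Holding Ltd. | Method and system for identifying risk in discrete text content |
CN110875959B (zh) * | 2018-08-13 | 2022-10-18 | Alibaba Group Holding Ltd. | Method for identifying data, method for identifying spam mailboxes, and method for identifying files |
CN110929530B (zh) * | 2018-09-17 | 2023-04-25 | Alibaba Group Holding Ltd. | Method, apparatus, and computing device for identifying multilingual junk text |
CN109766435A (zh) * | 2018-11-06 | 2019-05-17 | Wuhan Douyu Network Technology Co., Ltd. | Bullet-screen comment category identification method, apparatus, device, and storage medium |
CN109582788A (zh) * | 2018-11-09 | 2019-04-05 | Beijing Jingdong Financial Technology Holding Co., Ltd. | Spam comment training and identification method, apparatus, device, and readable storage medium |
CN109783804B (zh) * | 2018-12-17 | 2023-07-07 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Low-quality speech identification method, apparatus, device, and computer-readable storage medium |
CN109873755B (zh) * | 2019-03-02 | 2021-01-01 | Beijing Yahong Century Technology Development Co., Ltd. | Spam SMS classification engine based on variant-word recognition technology |
CN110717328B (zh) * | 2019-07-04 | 2021-06-18 | Beijing Dajia Internet Information Technology Co., Ltd. | Text recognition method, apparatus, electronic device, and storage medium |
CN112287100A (zh) * | 2019-07-12 | 2021-01-29 | Alibaba Group Holding Ltd. | Text recognition method, spelling correction method, and speech recognition method |
CN110543632B (zh) * | 2019-08-23 | 2024-04-16 | Beijing Fenbi Lantian Technology Co., Ltd. | Text information recognition method, apparatus, storage medium, and electronic device |
CN113626561A (zh) * | 2021-08-16 | 2021-11-09 | Shenzhen Yuncai Network Technology Co., Ltd. | Method, apparatus, medium, and device for identifying component model numbers |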
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
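CN101184259B (zh) * | 2007-11-01 | 2010-06-23 | Zhejiang University | Method for automatically learning and updating keywords in spam SMS messages |
CN101477544B (zh) * | 2009-01-12 | 2011-09-21 | Tencent Technology (Shenzhen) Co., Ltd. | Method and system for identifying junk text |
CN103678373B (zh) * | 2012-09-17 | 2017-11-17 | Tencent Technology (Shenzhen) Co., Ltd. | Method and device for identifying junk template articles |
CN104702492B (zh) * | 2015-03-19 | 2019-10-18 | Baidu Online Network Technology (Beijing) Co., Ltd. | Spam message model training method, spam message identification method, and apparatus therefor |
CN104731772B (zh) * | 2015-04-14 | 2017-05-24 | Liaoning University | Bayesian spam email filtering method based on an improved feature evaluation function |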
2017
- 2017-04-21: CN application CN201710273503.XA (patent CN107239440B), status: Active
Also Published As
Publication number | Publication date |
---|---|
CN107239440A (zh) | 2017-10-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
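CN107239440B (zh) | | Junk text recognition method and device |
Gupta et al. | | Study of Twitter sentiment analysis using machine learning algorithms on Python |
CN108829822B (zh) | | Media content recommendation method and apparatus, storage medium, and electronic apparatus |
CN106649818B (zh) | | Method and apparatus for identifying application search intent, application search method, and server |
CN104408093B (zh) | | Method and apparatus for extracting news event elements |
CN109446404B (zh) | | Method and apparatus for sentiment polarity analysis of online public opinion |
US8386240B2 (en) | | Domain dictionary creation by detection of new topic words using divergence value comparison |
WO2009026850A1 (en) | | Domain dictionary creation |
CN105183717A (zh) | | OSN user sentiment analysis method based on random forest and user relationships |
Susanti et al. | | Twitter’s sentiment analysis on GSM services using Multinomial Naïve Bayes |
CN108009297B (zh) | | Text sentiment analysis method and system based on natural language processing |
CN110298041B (zh) | | Junk text filtering method, apparatus, electronic device, and storage medium |
CN109829151B (zh) | | Text segmentation method based on a hierarchical Dirichlet model |
Swanson et al. | | Extracting the native language signal for second language acquisition |
CN111782793A (zh) | | Intelligent customer service processing method, system, and device |
CN112883734A (zh) | | Method and system for monitoring public opinion on blockchain security events |
CN113282754A (zh) | | Public opinion detection method, apparatus, device, and storage medium for news events |
CN112270191A (zh) | | Method and apparatus for extracting topics from work order text |
CN114756675A (zh) | | Text classification method, related device, and readable storage medium |
CN107797981B (zh) | | Target text recognition method and apparatus |
Andriotis et al. | | Smartphone message sentiment analysis |
CN112395881B (zh) | | Material label construction method, apparatus, readable storage medium, and electronic device |
CN110019763B (zh) | | Text filtering method, system, device, and computer-readable storage medium |
KR20200064490A (ko) | | Automatic profile generation server and method |
Hussain et al. | | A technique for perceiving abusive Bangla comments |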
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 2018-05-24. Address after: 310000 704, room 18, 998 West Wen Yi Road, Wuchang Street, Yuhang District, Hangzhou, Zhejiang. Applicant after: Tongdun Holdings Co., Ltd. Address before: 311100 18 Yuhang 207, Wen Yi Xi Road, Yuhang District, Hangzhou, Zhejiang. Applicant before: Tongdun Technology Co., Ltd. |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2021-09-08. Address after: 311121 room 210, building 18, No. 998, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province. Patentee after: Hangzhou Bodun Xiyan Technology Co.,Ltd. Address before: 310000 704, room 18, 998 West Wen Yi Road, Wuchang Street, Yuhang District, Hangzhou, Zhejiang. Patentee before: Tongdun Holdings Co., Ltd. |