WO2012030049A3 - Apparatus and method for classifying similar documents by applying a dynamic threshold - Google Patents
Apparatus and method for classifying similar documents by applying a dynamic threshold
- Publication number
- WO2012030049A3 (application PCT/KR2011/003590)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- threshold value
- document
- applying
- dynamic threshold
- documents
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
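The present invention relates to a document browsing apparatus and method to which a dynamic threshold is applied. It comprises a document management module that stores, for input or stored documents, the similarity between each pair of documents; a similar-document search module that searches the document management module for documents whose similarity to a reference document is at or above a set threshold; and a similar-document classification module that groups the retrieved documents into a single cluster.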
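The three modules named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract does not specify the similarity measure or how the threshold is adjusted, so the word-set Jaccard similarity and the names `jaccard`, `DocumentManager`, `find_similar`, and `cluster` are all assumptions made for this example. The threshold is simply passed in as a parameter that the caller may change between calls.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity (an assumed stand-in measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


class DocumentManager:
    """Document management module: stores documents and pairwise similarities."""

    def __init__(self):
        self.docs = {}        # doc_id -> text
        self.similarity = {}  # frozenset({id_a, id_b}) -> similarity score

    def add(self, doc_id, text):
        # Compute and store the similarity to every document already held.
        for other_id, other_text in self.docs.items():
            self.similarity[frozenset((doc_id, other_id))] = jaccard(text, other_text)
        self.docs[doc_id] = text

    def get_similarity(self, a, b):
        return self.similarity[frozenset((a, b))]


def find_similar(manager, reference_id, threshold):
    """Similar-document search module: documents at or above the threshold."""
    return [
        doc_id
        for doc_id in manager.docs
        if doc_id != reference_id
        and manager.get_similarity(reference_id, doc_id) >= threshold
    ]


def cluster(manager, reference_id, threshold):
    """Similar-document classification module: one cluster around the reference."""
    return {reference_id, *find_similar(manager, reference_id, threshold)}


manager = DocumentManager()
manager.add("a", "dynamic threshold document clustering")
manager.add("b", "dynamic threshold document search")
manager.add("c", "weather report for tomorrow")

# Changing the threshold changes which documents join the cluster.
print(cluster(manager, "a", 0.5))
print(cluster(manager, "a", 0.7))
```

Storing similarities at insertion time means a later change of threshold only re-filters the stored scores and does not recompute them, which is one plausible reason to keep the storage and search modules separate.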
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2010-0085384 | 2010-09-01 | ||
KR1020100085384A KR101035037B1 (ko) | 2010-09-01 | 2010-09-01 | Apparatus and method for classifying similar documents by applying a dynamic threshold |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2012030049A2 (ko) | 2012-03-08 |
WO2012030049A3 (ko) | 2012-04-26 |
Family
ID=44366141
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/KR2011/003590 WO2012030049A2 (ko) | 2010-09-01 | 2011-05-16 | Apparatus and method for classifying similar documents by applying a dynamic threshold |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR101035037B1 (ko) |
WO (1) | WO2012030049A2 (ko) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101847847B1 | 2016-11-15 | 2018-04-12 | Wisenut Co., Ltd. | Document clustering method for unstructured text data using deep learning |
US11176179B2 (en) | 2019-09-24 | 2021-11-16 | International Business Machines Corporation | Assigning a new problem record based on a similarity to previous problem records |
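KR102376489B1 * | 2019-11-22 | 2022-03-18 | Wisenut Co., Ltd. | Apparatus and method for clustering text documents and generating topics based on word ranking |
KR102373146B1 * | 2020-03-24 | 2022-03-14 | Kyungpook National University Industry-Academic Cooperation Foundation | Cluster-based duplicate document removal apparatus and method |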
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4828091B2 * | 2003-03-05 | 2011-11-30 | Hewlett-Packard Company | Clustering method, program, and apparatus |
- 2010-09-01: KR application KR1020100085384A, patent KR101035037B1 (ko), status: not active, IP Right Cessation
- 2011-05-16: WO application PCT/KR2011/003590, publication WO2012030049A2 (ko), status: active, Application Filing
Non-Patent Citations (3)
Title |
---|
HEWLETT PACKARD CO, HP, PUBLICATION NO. 2004-78896, 13 September 2004 (2004-09-13) * |
LG ELECTRONICS INC., PUBLICATION NO. 2007-0102036, 18 October 2007 (2007-10-18) * |
SUNG HO JANG: "Design and Implementation of Keyword-based Document Clustering System", GRADUATE SCHOOL OF KOOKMIN UNIVERSITY MASTER'S DEGREE THESIS, 31 July 2003 (2003-07-31), pages 21 - 39 * |
Also Published As
Publication number | Publication date |
---|---|
WO2012030049A2 (ko) | 2012-03-08 |
KR101035037B1 (ko) | 2011-05-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2012070840A3 (ko) | Consensus search apparatus and method | |
EP3748629A4 (en) | IDENTIFICATION METHOD AND DEVICE FOR LANGUAGE KEYWORDS, COMPUTER READABLE STORAGE MEDIUM AND COMPUTER DEVICE | |
MX2019001112A (es) | System and method for implementing containers that extract and apply semantic page knowledge | |
WO2014183956A3 (en) | Social media content analysis and output | |
GB2482630A (en) | A data retrieval and indexing method and apparatus | |
WO2013173826A3 (en) | Populating and searching a drug informatics database | |
WO2012129149A3 (en) | Aggregating search results based on associating data instances with knowledge base entities | |
EP3051432A4 (en) | Semantic information acquisition method, keyword expansion method thereof, and search method and system | |
WO2011097066A3 (en) | Semantic table of contents for search results | |
WO2015170191A3 (en) | Method and apparatus for screening promotion keywords | |
WO2011112744A3 (en) | User role based customizable semantic search | |
WO2009140272A3 (en) | Search results with most clicked next objects | |
WO2014085776A3 (en) | Web search ranking | |
WO2010141799A3 (en) | Feature engineering and user behavior analysis | |
WO2014025705A3 (en) | Search result ranking and presentation | |
WO2013163644A3 (en) | Updating a search index used to facilitate application searches | |
WO2010008800A3 (en) | Query identification and association | |
WO2011149961A3 (en) | Systems and methods for identifying intersections using content metadata | |
WO2011159516A3 (en) | Semantic content searching | |
GB201209093D0 (en) | Method of searching for document data files based on keywords, and computer system and computer program thereof | |
WO2014043200A3 (en) | Dynamic data acquisition method and system | |
CA2879417A1 (en) | Structured search queries based on social-graph information | |
WO2007089289A3 (en) | Method for ranking and sorting electronic documents in a search result list based on relevance | |
GB2490070A (en) | Systems and methods for ranking documents | |
WO2009029675A3 (en) | Method and system for data context service |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 11822024; Country of ref document: EP; Kind code of ref document: A2 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 11822024; Country of ref document: EP; Kind code of ref document: A2 |