CN108241745B - Sample set processing method and apparatus, and sample querying method and apparatus - Google Patents
- Publication number
- CN108241745B (application CN201810014815.3A)
- Authority
- CN
- China
- Prior art keywords
- vector
- sample
- segment
- index
- distance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2237—Vectors, bitmaps or matrices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/231—Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Sample ID | First index field | Second index field |
---|---|---|
Sample Y1 | C2 | S1-3,S2-1,S3-10,…,S50-16 |
Sample Y2 | C5 | S1-16,S2-13,S3-1,…,S50-5 |
… | … | … |
Sample R | C9 | S1-3,S2-11,S3-8,…,S50-5 |
Sample Ym | Im | IIm-1,IIm-2,IIm-3,…,IIm-j,…,IIm-50 |
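The table above pairs each sample with a first index field (the C… column) and a second index field (the S… column). The abstract and description are not present in this extract, so the following is only a minimal sketch under an assumed reading of the table: the first field names the coarse cluster whose center is nearest to the whole sample vector, and each `Sj-k` entry names the cluster `k` whose center is nearest to the j-th segment of that vector. The function names, the Euclidean distance, the equal-length segmentation, and the toy centers are all hypothetical and chosen for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest(vector, centers):
    """Return the 1-based position of the center closest to `vector`."""
    return min(range(len(centers)), key=lambda i: dist(vector, centers[i])) + 1


def split_segments(vector, num_segments):
    """Split `vector` into `num_segments` contiguous, equal-length segments."""
    if len(vector) % num_segments != 0:
        raise ValueError("vector length must be divisible by num_segments")
    size = len(vector) // num_segments
    return [vector[i * size:(i + 1) * size] for i in range(num_segments)]


def index_fields(vector, coarse_centers, segment_centers):
    """Compute (first_field, second_field) for one sample vector.

    `segment_centers[j]` holds the (assumed) cluster centers for segment j.
    """
    first = f"C{nearest(vector, coarse_centers)}"
    segments = split_segments(vector, len(segment_centers))
    second = ",".join(
        f"S{j + 1}-{nearest(segment, segment_centers[j])}"
        for j, segment in enumerate(segments)
    )
    return first, second


# Toy data: 4-dimensional vectors, 2 coarse clusters, 2 segments of 2 dimensions.
coarse_centers = [[0.0, 0.0, 0.0, 0.0], [10.0, 10.0, 10.0, 10.0]]
segment_centers = [
    [[0.0, 0.0], [10.0, 10.0]],  # centers for segment 1
    [[0.0, 10.0], [10.0, 0.0]],  # centers for segment 2
]
sample = [9.0, 9.5, 1.0, 9.0]

first_field, second_field = index_fields(sample, coarse_centers, segment_centers)
print(first_field, second_field)  # C2 S1-2,S2-1
```

The table itself uses 50 segments per sample (`S1` through `S50`); the sketch uses two only to keep the toy data readable.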
Coarse cluster | Corresponding samples |
---|---|
C1 | Y1,Y21,Y23,Y61,… |
C2 | Y3,Y8,Y9,Y34,… |
C3 | Y2,Y5,Y11,Y24,… |
… | … |
C20 | Y4,Y10,Y13,Y52,… |
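The table above maps each coarse cluster to the samples assigned to it, which has the shape of an inverted table keyed by cluster ID. As a hedged illustration only (the surrounding description is missing from this extract), the sketch below builds such a table from per-sample cluster assignments and uses it to fetch candidate samples for one cluster. The function name and the toy assignments are hypothetical.

```python
from collections import defaultdict


def build_inverted_table(first_fields):
    """Group sample IDs by coarse cluster ID.

    `first_fields` maps sample ID -> coarse cluster ID, e.g. {"Y1": "C1"}.
    """
    table = defaultdict(list)
    for sample_id, cluster_id in first_fields.items():
        table[cluster_id].append(sample_id)
    return dict(table)


# Toy assignments, loosely following the first rows of the table.
first_fields = {"Y1": "C1", "Y3": "C2", "Y21": "C1", "Y8": "C2", "Y2": "C3"}

inverted = build_inverted_table(first_fields)

# Candidate samples for a query that falls into coarse cluster C2.
candidates = inverted.get("C2", [])
print(candidates)  # ['Y3', 'Y8']
```

Restricting a query to the samples of one or a few coarse clusters is one common way to avoid comparing it against the whole sample set; whether the patent does exactly this cannot be confirmed from the extract.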
Claims (27)
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810014815.3A CN108241745B (zh) | 2018-01-08 | 2018-01-08 | Sample set processing method and apparatus, and sample querying method and apparatus |
TW107143437A TWI696081B (zh) | 2018-12-04 | 2018-12-04 | Sample set processing method and apparatus, and sample querying method and apparatus |
EP18898858.8A EP3709184B1 (en) | 2018-01-08 | 2018-12-26 | Sample set processing method and apparatus, and sample querying method and apparatus |
PCT/CN2018/123855 WO2019134567A1 (zh) | 2018-01-08 | 2018-12-26 | Sample set processing method and apparatus, and sample querying method and apparatus |
US16/878,482 US10896164B2 (en) | 2018-01-08 | 2020-05-19 | Sample set processing method and apparatus, and sample querying method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810014815.3A CN108241745B (zh) | 2018-01-08 | 2018-01-08 | Sample set processing method and apparatus, and sample querying method and apparatus |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108241745A (zh) | 2018-07-03 |
CN108241745B (zh) | 2020-04-28 |
Family
ID=62699465
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810014815.3A (Active) CN108241745B (zh) | 2018-01-08 | 2018-01-08 | Sample set processing method and apparatus, and sample querying method and apparatus |
Country Status (5)
Country | Link |
---|---|
US (1) | US10896164B2 (zh) |
EP (1) | EP3709184B1 (zh) |
CN (1) | CN108241745B (zh) |
TW (1) | TWI696081B (zh) |
WO (1) | WO2019134567A1 (zh) |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108241745B (zh) * | 2018-01-08 | 2020-04-28 | 阿里巴巴集团控股有限公司 | Sample set processing method and apparatus, and sample querying method and apparatus |
CN110889424B (zh) * | 2018-09-11 | 2023-06-30 | 阿里巴巴集团控股有限公司 | Vector index building method and apparatus, and vector retrieval method and apparatus |
CN111177438B (zh) * | 2018-11-12 | 2023-05-12 | 深圳云天励飞技术有限公司 | Image feature value search method and apparatus, electronic device, and storage medium |
CN110209895B (zh) * | 2019-06-06 | 2023-09-05 | 创新先进技术有限公司 | Vector retrieval method, apparatus, and device |
CN110443229A (zh) * | 2019-08-22 | 2019-11-12 | 国网四川省电力公司信息通信公司 | Artificial-intelligence-based method for recognizing device display content |
CN110633379B (zh) * | 2019-08-29 | 2023-04-28 | 北京睿企信息科技有限公司 | Search-by-image system and method based on GPU parallel computing |
CN110825894A (zh) * | 2019-09-18 | 2020-02-21 | 平安科技(深圳)有限公司 | Data index building and data retrieval methods, apparatus, device, and storage medium |
CN110674328A (zh) * | 2019-09-27 | 2020-01-10 | 长城计算机软件与系统有限公司 | Trademark image retrieval method, system, medium, and device |
CN112819018A (zh) * | 2019-10-31 | 2021-05-18 | 北京沃东天骏信息技术有限公司 | Sample generation method and apparatus, electronic device, and storage medium |
CN111026922B (zh) * | 2019-12-26 | 2024-05-28 | 新长城科技有限公司 | Distributed vector indexing method, system, plug-in, and electronic device |
CN111309984B (zh) * | 2020-03-10 | 2023-09-05 | 支付宝(杭州)信息技术有限公司 | Method and apparatus for retrieving node vectors from a database using an index |
CN111368133B (zh) * | 2020-04-16 | 2021-09-14 | 腾讯科技(深圳)有限公司 | Method and apparatus for building an index table of a video library, server, and storage medium |
CN113626471B (zh) * | 2021-08-05 | 2024-02-23 | 北京达佳互联信息技术有限公司 | Data retrieval method and apparatus, electronic device, and storage medium |
EP4160434A4 (en) * | 2021-08-16 | 2023-12-13 | Baidu Online Network Technology (Beijing) Co., Ltd | Method and device for constructing a search database, and device and storage medium |
CN113449132B (zh) * | 2021-08-26 | 2022-02-25 | 阿里云计算有限公司 | Vector retrieval method and apparatus |
CN114049508B (zh) * | 2022-01-12 | 2022-04-01 | 成都无糖信息技术有限公司 | Method and system for identifying fraudulent websites based on image clustering and manual review |
CN114399058B (zh) * | 2022-03-25 | 2022-06-10 | 腾讯科技(深圳)有限公司 | Model updating method, related apparatus, device, and storage medium |
US20230418883A1 (en) * | 2022-06-28 | 2023-12-28 | Open Text Holdings, Inc. | Systems and methods for document analysis to produce, consume and analyze content-by-example logs for documents |
CN116028500B (zh) * | 2023-01-17 | 2023-07-14 | 黑龙江大学 | Range query indexing method based on high-dimensional data |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
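JP4041081B2 (ja) * | 2004-03-23 | 2008-01-30 | 東芝ソリューション株式会社 | Split clustering apparatus and method for determining the number of split data |
CN102663100A (zh) * | 2012-04-13 | 2012-09-12 | 西安电子科技大学 | Two-stage hybrid particle swarm optimization clustering method |
CN102831225A (zh) * | 2012-08-27 | 2012-12-19 | 南京邮电大学 | Multi-dimensional index structure in a cloud environment, construction method thereof, and similarity query method |
CN103136337A (zh) * | 2013-02-01 | 2013-06-05 | 北京邮电大学 | Distributed knowledge data mining apparatus and mining method for complex networks |
CN103699771A (zh) * | 2013-09-27 | 2014-04-02 | 广东工业大学 | Scenario-clustering method for cooling load prediction |
CN107451200A (zh) * | 2017-07-06 | 2017-12-08 | 西安交通大学 | Retrieval method using a randomized quantized vocabulary tree, and image retrieval method based thereon |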
Family Cites Families (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS57153754U (zh) | 1981-03-23 | 1982-09-27 | ||
US5577135A (en) * | 1994-03-01 | 1996-11-19 | Apple Computer, Inc. | Handwriting signal processing front-end for handwriting recognizers |
US6684186B2 (en) * | 1999-01-26 | 2004-01-27 | International Business Machines Corporation | Speaker recognition using a hierarchical speaker model tree |
CN101283353B (zh) * | 2005-08-03 | 2015-11-25 | 搜索引擎科技有限责任公司 | System and method for finding relevant documents by analyzing tags |
JP4550074B2 (ja) * | 2007-01-23 | 2010-09-22 | インターナショナル・ビジネス・マシーンズ・コーポレーション | System, method, and computer-executable program for tracking information from heterogeneous information sources |
US9589056B2 (en) * | 2011-04-05 | 2017-03-07 | Microsoft Technology Licensing Llc | User information needs based data selection |
JP4976578B1 (ja) * | 2011-09-16 | 2012-07-18 | 楽天株式会社 | Image search apparatus and program |
KR101348904B1 (ko) * | 2012-01-20 | 2014-01-09 | 한국과학기술원 | Image segmentation method using higher-order correlation clustering, system for processing the same, and recording medium |
US10176246B2 (en) * | 2013-06-14 | 2019-01-08 | Microsoft Technology Licensing, Llc | Fast grouping of time series |
US10140353B2 (en) * | 2013-12-23 | 2018-11-27 | Teradata US, Inc. | Techniques for query processing using high dimension histograms |
US20160328654A1 (en) * | 2015-05-04 | 2016-11-10 | Agt International Gmbh | Anomaly detection for context-dependent data |
CN106557521B (zh) * | 2015-09-29 | 2020-07-14 | 佳能株式会社 | Object indexing method, object search method, and object indexing system |
US10599731B2 (en) * | 2016-04-26 | 2020-03-24 | Baidu Usa Llc | Method and system of determining categories associated with keywords using a trained model |
US10719509B2 (en) * | 2016-10-11 | 2020-07-21 | Google Llc | Hierarchical quantization for fast inner product search |
US10789298B2 (en) * | 2016-11-16 | 2020-09-29 | International Business Machines Corporation | Specialist keywords recommendations in semantic space |
CN106650806B (zh) * | 2016-12-16 | 2019-07-26 | 北京大学深圳研究生院 | Collaborative deep network model method for pedestrian detection |
US20190065833A1 (en) * | 2017-08-30 | 2019-02-28 | Qualcomm Incorporated | Detecting false positives in face recognition |
CN108241745B (zh) * | 2018-01-08 | 2020-04-28 | 阿里巴巴集团控股有限公司 | Sample set processing method and apparatus, and sample querying method and apparatus |
2018
- 2018-01-08 CN CN201810014815.3A patent/CN108241745B/zh Active
- 2018-12-04 TW TW107143437A patent/TWI696081B/zh Active
- 2018-12-26 EP EP18898858.8A patent/EP3709184B1/en Active
- 2018-12-26 WO PCT/CN2018/123855 patent/WO2019134567A1/zh unknown
2020
- 2020-05-19 US US16/878,482 patent/US10896164B2/en Active
Also Published As
Publication number | Publication date |
---|---|
TW201931169A (zh) | 2019-08-01 |
WO2019134567A1 (zh) | 2019-07-11 |
EP3709184A1 (en) | 2020-09-16 |
EP3709184A4 (en) | 2020-12-09 |
CN108241745A (zh) | 2018-07-03 |
US10896164B2 (en) | 2021-01-19 |
EP3709184B1 (en) | 2022-11-09 |
US20200278953A1 (en) | 2020-09-03 |
TWI696081B (zh) | 2020-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
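CN108241745B (zh) | Sample set processing method and apparatus, and sample querying method and apparatus | |
US8676725B1 (en) | Method and system for entropy-based semantic hashing | |
JP2014505313A (ja) | Method and apparatus for identifying similar images | |
CN113850281B (zh) | Data processing method and apparatus based on mean-shift optimization | |
CN112395487B (zh) | Information recommendation method and apparatus, computer-readable storage medium, and electronic device | |
CN110598061A (zh) | Heterogeneous information network embedding method with multi-graph fusion | |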
Carbonera | An efficient approach for instance selection | |
Zhang et al. | Dynamic time warping under product quantization, with applications to time-series data similarity search | |
US10671663B2 (en) | Generation device, generation method, and non-transitory computer-readable recording medium | |
Tiwari et al. | Entropy weighting genetic k-means algorithm for subspace clustering | |
CN110209895B (zh) | Vector retrieval method, apparatus, and device | |
Haripriya et al. | Multi label prediction using association rule generation and simple k-means | |
TW201243627A (en) | Multi-label text categorization based on fuzzy similarity and k nearest neighbors | |
Yu et al. | A classifier chain algorithm with k-means for multi-label classification on clouds | |
US20230267175A1 (en) | Systems and methods for sample efficient training of machine learning models | |
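CN115146103A (zh) | Image retrieval method and apparatus, computer device, storage medium, and program product | |
Chen et al. | Effective and efficient content redundancy detection of web videos | |
CN114896514A (zh) | Web API tag recommendation method based on graph neural networks | |
CN113901278A (zh) | Data search method and apparatus based on global multi-probe and adaptive termination | |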
da Silva et al. | Audio plugin recommendation systems for music production | |
US10311084B2 (en) | Method and system for constructing a classifier | |
Alshaibanee et al. | A proposed class labeling approach: From unsupervised to supervised learning | |
Kumar et al. | ARSkNN-A k-NN classifier using mass based similarity measure | |
Wang et al. | Efficient sampling of training set in large and noisy multimedia data | |
Athira et al. | An efficient solution for multi-label classification problem using apriori algorithm (MLC-A) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 1256518; Country of ref document: HK |
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20201020; Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands; Patentee after: Innovative advanced technology Co.,Ltd.; Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands; Patentee before: Advanced innovation technology Co.,Ltd. |
TR01 | Transfer of patent right | Effective date of registration: 20201020; Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands; Patentee after: Advanced innovation technology Co.,Ltd.; Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands; Patentee before: Alibaba Group Holding Ltd. |