HK1259448A1 - Data clustering, segmentation, and parallelization - Google Patents

Data clustering, segmentation, and parallelization

Info

Publication number
HK1259448A1
Authority
HK
Hong Kong
Prior art keywords
parallelization
segmentation
data clustering
clustering
data
Prior art date
Application number
HK19101853.3A
Other languages
English (en)
Inventor
Arlen Anderson (阿倫‧安德森)
Original Assignee
Ab Initio Technology LLC (起元科技有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ab Initio Technology LLC (起元科技有限公司)
Publication of HK1259448A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278 Data partitioning, e.g. horizontal or vertical partitioning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
HK19101853.3A 2011-11-15 2015-02-10 Data clustering, segmentation, and parallelization HK1259448A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161560257P 2011-11-15 2011-11-15
US201261660259P 2012-06-15 2012-06-15

Publications (1)

Publication Number Publication Date
HK1259448A1 (zh) 2019-11-29

Family

ID=47258118

Family Applications (4)

Application Number Title Priority Date Filing Date
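HK15101462.0A HK1200942A1 (zh) 2011-11-15 2015-02-10 Data clustering, segmentation, and parallelization
HK15101463.9A HK1200943A1 (zh) 2011-11-15 2015-02-10 Data clustering based on variant token networks
HK19101853.3A HK1259448A1 (zh) 2011-11-15 2015-02-10 Data clustering, segmentation, and parallelization
HK15101522.8A HK1201096A1 (zh) 2011-11-15 2015-02-11 Data clustering based on candidate queries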

Family Applications Before (2)

Application Number Title Priority Date Filing Date
HK15101462.0A HK1200942A1 (zh) 2011-11-15 2015-02-10 Data clustering, segmentation, and parallelization
HK15101463.9A HK1200943A1 (zh) 2011-11-15 2015-02-10 Data clustering based on variant token networks

Family Applications After (1)

Application Number Title Priority Date Filing Date
HK15101522.8A HK1201096A1 (zh) 2011-11-15 2015-02-11 Data clustering based on candidate queries

Country Status (9)

Country Link
US (6) US9037589B2 (zh)
EP (6) EP3591538B1 (zh)
JP (3) JP6113740B2 (zh)
KR (3) KR102029514B1 (zh)
CN (4) CN104054073B (zh)
AU (3) AU2012340418C1 (zh)
CA (4) CA2855710C (zh)
HK (4) HK1200942A1 (zh)
WO (3) WO2013074770A1 (zh)

Families Citing this family (178)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8775441B2 (en) 2008-01-16 2014-07-08 Ab Initio Technology Llc Managing an archive for approximate string matching
CA3014839C (en) 2008-10-23 2019-01-08 Arlen Anderson Fuzzy data operations
US20110153737A1 (en) * 2009-12-17 2011-06-23 Chu Thomas P Method and apparatus for decomposing a peer-to-peer network and using a decomposed peer-to-peer network
US10084856B2 (en) * 2009-12-17 2018-09-25 Wsou Investments, Llc Method and apparatus for locating services within peer-to-peer networks
US8468119B2 (en) * 2010-07-14 2013-06-18 Business Objects Software Ltd. Matching data from disparate sources
EP2727247B1 (en) * 2011-06-30 2017-04-05 Openwave Mobility, Inc. Database compression system and method
WO2013074770A1 (en) 2011-11-15 2013-05-23 Ab Initio Technology Llc Data clustering, segmentation, and parallelization
US8949199B2 (en) * 2011-12-29 2015-02-03 Dell Products L.P. Systems and methods for de-duplication in storage systems
WO2013123097A1 (en) * 2012-02-13 2013-08-22 SkyKick, Inc. Migration project automation, e.g., automated selling, planning, migration and configuration of email systems
US10467322B1 (en) * 2012-03-28 2019-11-05 Amazon Technologies, Inc. System and method for highly scalable data clustering
US20130268526A1 (en) * 2012-04-06 2013-10-10 Mark E. Johns Discovery engine
US9684395B2 (en) * 2012-06-02 2017-06-20 Tara Chand Singhal System and method for context driven voice interface in handheld wireless mobile devices
EP3654200A1 (en) * 2012-08-17 2020-05-20 Twitter, Inc. Search infrastructure
US10223697B2 (en) 2012-08-30 2019-03-05 Oracle International Corporation Method and system for implementing a CRM quote and order capture context service
US9251133B2 (en) 2012-12-12 2016-02-02 International Business Machines Corporation Approximate named-entity extraction
US10949752B1 (en) * 2013-01-30 2021-03-16 Applied Predictive Technologies, Inc. System and method of portfolio matching
US9830353B1 (en) * 2013-02-27 2017-11-28 Google Inc. Determining match type for query tokens
US20140282396A1 (en) * 2013-03-14 2014-09-18 Syntel, Inc. Computerized system and method for extracting business rules from source code
US20140280239A1 (en) * 2013-03-15 2014-09-18 Sas Institute Inc. Similarity determination between anonymized data items
US8844050B1 (en) 2013-03-15 2014-09-23 Athoc, Inc. Personnel crisis communications management and personnel status tracking system
US10803102B1 (en) * 2013-04-30 2020-10-13 Walmart Apollo, Llc Methods and systems for comparing customer records
US9411632B2 (en) * 2013-05-30 2016-08-09 Qualcomm Incorporated Parallel method for agglomerative clustering of non-stationary data
US11093521B2 (en) * 2013-06-27 2021-08-17 Sap Se Just-in-time data quality assessment for best record creation
KR20150020385A (ko) * 2013-08-13 2015-02-26 에스케이하이닉스 주식회사 Data storage device, operating method thereof, and data processing system including the same
CA2921245C (en) 2013-09-20 2023-08-22 Fulcrum Management Solutions Ltd. Processing qualitative responses
CN103455641B (zh) * 2013-09-29 2017-02-22 北大医疗信息技术有限公司 System and method for multiple cross retrieval
US8831969B1 (en) * 2013-10-02 2014-09-09 Linkedin Corporation System and method for determining users working for the same employers in a social network
US10043182B1 (en) * 2013-10-22 2018-08-07 Ondot System, Inc. System and method for using cardholder context and preferences in transaction authorization
US10423890B1 (en) 2013-12-12 2019-09-24 Cigna Intellectual Property, Inc. System and method for synthesizing data
US10685037B2 (en) 2013-12-18 2020-06-16 Amazon Technologies, Inc. Volume cohorts in object-redundant storage systems
CA2934041C (en) * 2013-12-18 2021-04-13 Amazon Technologies, Inc. Reconciling volumelets in volume cohorts
US10620830B2 (en) 2013-12-18 2020-04-14 Amazon Technologies, Inc. Reconciling volumelets in volume cohorts
US10026114B2 (en) * 2014-01-10 2018-07-17 Betterdoctor, Inc. System for clustering and aggregating data from multiple sources
US10055747B1 (en) * 2014-01-20 2018-08-21 Acxiom Corporation Consumer Portal
US9690844B2 (en) * 2014-01-24 2017-06-27 Samsung Electronics Co., Ltd. Methods and systems for customizable clustering of sub-networks for bioinformatics and health care applications
US9779146B2 (en) * 2014-02-07 2017-10-03 Sap Se Graphical user interface for a data record matching application
US20150269700A1 (en) 2014-03-24 2015-09-24 Athoc, Inc. Exchange of crisis-related information amongst multiple individuals and multiple organizations
US9268597B2 (en) * 2014-04-01 2016-02-23 Google Inc. Incremental parallel processing of data
US10482490B2 (en) 2014-04-09 2019-11-19 Sailthru, Inc. Behavioral tracking system and method in support of high-engagement communications
US20150348052A1 (en) * 2014-05-30 2015-12-03 Sachin Rekhi Crm-based discovery of contacts and accounts
WO2015192106A1 (en) * 2014-06-12 2015-12-17 Shpanya Arie Real-time dynamic pricing system
US20150379033A1 (en) * 2014-06-27 2015-12-31 International Business Machines Corporation Parallel matching of hierarchical records
US10318983B2 (en) * 2014-07-18 2019-06-11 Facebook, Inc. Expansion of targeting criteria based on advertisement performance
US10528981B2 (en) 2014-07-18 2020-01-07 Facebook, Inc. Expansion of targeting criteria using an advertisement performance metric to maintain revenue
US20160019284A1 (en) * 2014-07-18 2016-01-21 LinkedIn Corporation Search engine using name clustering
US10296616B2 (en) 2014-07-31 2019-05-21 Splunk Inc. Generation of a search query to approximate replication of a cluster of events
US9922290B2 (en) * 2014-08-12 2018-03-20 Microsoft Technology Licensing, Llc Entity resolution incorporating data from various data sources which uses tokens and normalizes records
US10614912B2 (en) * 2014-08-17 2020-04-07 Hyperfine, Llc Systems and methods for comparing networks, determining underlying forces between the networks, and forming new metaclusters when saturation is met
US20160062979A1 (en) * 2014-08-27 2016-03-03 Google Inc. Word classification based on phonetic features
WO2016048295A1 (en) * 2014-09-24 2016-03-31 Hewlett Packard Enterprise Development Lp Assigning a document to partial membership in communities
US11461319B2 (en) * 2014-10-06 2022-10-04 Business Objects Software, Ltd. Dynamic database query efficiency improvement
US9600548B2 (en) * 2014-10-10 2017-03-21 Salesforce.Com Row level security integration of analytical data store with cloud architecture
JP6050800B2 (ja) * 2014-10-28 2016-12-21 Necパーソナルコンピュータ株式会社 Information processing device, method, and program
CN105701118B (zh) 2014-11-28 2019-05-28 国际商业机器公司 Method and apparatus for normalizing non-numeric features of files
US9483546B2 (en) * 2014-12-15 2016-11-01 Palantir Technologies Inc. System and method for associating related records to common entities across multiple lists
US9727906B1 (en) * 2014-12-15 2017-08-08 Amazon Technologies, Inc. Generating item clusters based on aggregated search history data
JP6129815B2 (ja) * 2014-12-24 2017-05-17 Necパーソナルコンピュータ株式会社 Information processing device, method, and program
US20160239499A1 (en) * 2015-02-12 2016-08-18 Red Hat, Inc. Object Creation Based on Copying Objects Corresponding to Similar Entities
US10339502B2 (en) * 2015-04-06 2019-07-02 Adp, Llc Skill analyzer
US10742731B2 (en) 2015-06-10 2020-08-11 International Business Machines Corporation Maintaining service configuration consistency across nodes of a clustered file system
US9940213B2 (en) 2015-06-10 2018-04-10 International Business Machines Corporation Integrating external services with a clustered file system
WO2017015751A1 (en) * 2015-07-24 2017-02-02 Fulcrum Management Solutions Ltd. Processing qualitative responses and visualization generation
US10140327B2 (en) 2015-08-24 2018-11-27 Palantir Technologies Inc. Feature clustering of users, user correlation database access, and user interface generation system
US10417337B2 (en) 2015-09-02 2019-09-17 Canon Kabushiki Kaisha Devices, systems, and methods for resolving named entities
US11392582B2 (en) * 2015-10-15 2022-07-19 Sumo Logic, Inc. Automatic partitioning
US10783268B2 (en) 2015-11-10 2020-09-22 Hewlett Packard Enterprise Development Lp Data allocation based on secure information retrieval
US10242021B2 (en) * 2016-01-12 2019-03-26 International Business Machines Corporation Storing data deduplication metadata in a grid of processors
US10261946B2 (en) 2016-01-12 2019-04-16 International Business Machines Corporation Rebalancing distributed metadata
US10255288B2 (en) * 2016-01-12 2019-04-09 International Business Machines Corporation Distributed data deduplication in a grid of processors
WO2017197526A1 (en) 2016-05-20 2017-11-23 Roman Czeslaw Kordasiewicz Systems and methods for graphical exploration of forensic data
US10740409B2 (en) 2016-05-20 2020-08-11 Magnet Forensics Inc. Systems and methods for graphical exploration of forensic data
JP6072334B1 (ja) * 2016-06-09 2017-02-01 株式会社Cygames Information processing system and method, and program
US20180025093A1 (en) * 2016-07-21 2018-01-25 Ayasdi, Inc. Query capabilities of topological data analysis graphs
US11023475B2 (en) * 2016-07-22 2021-06-01 International Business Machines Corporation Testing pairings to determine whether they are publically known
US10558669B2 (en) * 2016-07-22 2020-02-11 National Student Clearinghouse Record matching system
US11106692B1 (en) * 2016-08-04 2021-08-31 Palantir Technologies Inc. Data record resolution and correlation system
CN106875167B (zh) * 2016-08-18 2020-08-04 阿里巴巴集团控股有限公司 Method and apparatus for detecting fund transaction paths in an electronic payment process
US10650008B2 (en) * 2016-08-26 2020-05-12 International Business Machines Corporation Parallel scoring of an ensemble model
US10817540B2 (en) 2016-09-02 2020-10-27 Snowflake Inc. Incremental clustering maintenance of a table
US11080301B2 (en) * 2016-09-28 2021-08-03 Hewlett Packard Enterprise Development Lp Storage allocation based on secure data comparisons via multiple intermediaries
US20180096018A1 (en) * 2016-09-30 2018-04-05 Microsoft Technology Licensing, Llc Reducing processing for comparing large metadata sets
WO2018067467A1 (en) 2016-10-03 2018-04-12 Ocient Llc Infrastructure improvements for use in a massively parallel database management system
US10127268B2 (en) * 2016-10-07 2018-11-13 Microsoft Technology Licensing, Llc Repairing data through domain knowledge
US10713316B2 (en) 2016-10-20 2020-07-14 Microsoft Technology Licensing, Llc Search engine using name clustering
US10585864B2 (en) 2016-11-11 2020-03-10 International Business Machines Corporation Computing the need for standardization of a set of values
US10353928B2 (en) * 2016-11-30 2019-07-16 International Business Machines Corporation Real-time clustering using multiple representatives from a cluster
US11205103B2 (en) 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis
EP3336691B1 (en) 2016-12-13 2022-04-06 ARM Limited Replicate elements instruction
EP3336692B1 (en) 2016-12-13 2020-04-29 Arm Ltd Replicate partition instruction
US10902070B2 (en) 2016-12-15 2021-01-26 Microsoft Technology Licensing, Llc Job search based on member transitions from educational institution to company
US10671757B1 (en) * 2016-12-22 2020-06-02 Allscripts Software, Llc Converting an alphanumerical character string into a signature
US20180181646A1 (en) * 2016-12-26 2018-06-28 Infosys Limited System and method for determining identity relationships among enterprise data entities
US20180203917A1 (en) * 2017-01-19 2018-07-19 Acquire Media Ventures Inc. Discovering data similarity groups in linear time for data science applications
US10679187B2 (en) 2017-01-30 2020-06-09 Microsoft Technology Licensing, Llc Job search with categorized results
US10783497B2 (en) 2017-02-21 2020-09-22 Microsoft Technology Licensing, Llc Job posting data search based on intercompany worker migration
US11138269B1 (en) 2017-03-14 2021-10-05 Wells Fargo Bank, N.A. Optimizing database query processes with supervised independent autonomy through a dynamically scaling matching and priority engine
US10803064B1 (en) * 2017-03-14 2020-10-13 Wells Fargo Bank, N.A. System and method for dynamic scaling and modification of a rule-based matching and prioritization engine
US11010675B1 (en) 2017-03-14 2021-05-18 Wells Fargo Bank, N.A. Machine learning integration for a dynamically scaling matching and prioritization engine
KR102594625B1 (ko) * 2017-03-19 2023-10-25 오펙-에슈콜롯 리서치 앤드 디벨롭먼트 엘티디 System and method for generating filters for K-mismatch search
US10607189B2 (en) 2017-04-04 2020-03-31 Microsoft Technology Licensing, Llc Ranking job offerings based on growth potential within a company
US20180315019A1 (en) * 2017-04-27 2018-11-01 Linkedin Corporation Multinodal job-search control system
US11640436B2 (en) * 2017-05-15 2023-05-02 Ebay Inc. Methods and systems for query segmentation
US10740338B2 (en) * 2017-07-23 2020-08-11 International Business Machines Corporation Systems and methods for query performance prediction using reference lists
US9934287B1 (en) * 2017-07-25 2018-04-03 Capital One Services, Llc Systems and methods for expedited large file processing
US20190034475A1 (en) * 2017-07-28 2019-01-31 Enigma Technologies, Inc. System and method for detecting duplicate data records
EP3460808A1 (en) * 2017-09-21 2019-03-27 Koninklijke Philips N.V. Determining patient status based on measurable medical characteristics
US11475209B2 (en) 2017-10-17 2022-10-18 Handycontract Llc Device, system, and method for extracting named entities from sectioned documents
WO2019077405A1 (en) * 2017-10-17 2019-04-25 Handycontract, LLC METHOD, DEVICE AND SYSTEM FOR IDENTIFYING DATA ELEMENTS IN DATA STRUCTURES
US11250040B2 (en) * 2017-10-19 2022-02-15 Capital One Services, Llc Systems and methods for extracting information from a text string generated in a distributed computing operation
US11429642B2 (en) 2017-11-01 2022-08-30 Walmart Apollo, Llc Systems and methods for dynamic hierarchical metadata storage and retrieval
US10839018B2 (en) * 2017-11-15 2020-11-17 International Business Machines Corporation Evaluation of plural expressions corresponding to input data
US10910112B2 (en) 2017-12-04 2021-02-02 Koninklijke Philips N.V. Apparatus for patient record identification
US11061811B2 (en) * 2017-12-15 2021-07-13 International Business Machines Corporation Optimizing software testing via group testing
CN110019274B (zh) 2017-12-29 2023-09-26 阿里巴巴集团控股有限公司 Database system, and method and apparatus for querying a database
US10579707B2 (en) * 2017-12-29 2020-03-03 Konica Minolta Laboratory U.S.A., Inc. Method for inferring blocks of text in electronic documents
US10817542B2 (en) 2018-02-28 2020-10-27 Acronis International Gmbh User clustering based on metadata analysis
US10956610B2 (en) * 2018-03-06 2021-03-23 Micro Focus Llc Cycle walking-based tokenization
US10719375B2 (en) * 2018-03-13 2020-07-21 Servicenow, Inc. Systems and method for event parsing
US11182395B2 (en) * 2018-05-15 2021-11-23 International Business Machines Corporation Similarity matching systems and methods for record linkage
US11244013B2 (en) * 2018-06-01 2022-02-08 International Business Machines Corporation Tracking the evolution of topic rankings from contextual data
US11106675B2 (en) * 2018-06-12 2021-08-31 Atos Syntel Inc. System and method for identifying optimal test cases for software development
US11263202B2 (en) 2018-11-30 2022-03-01 Microsoft Technology Licensing, Llc Scalable implementations of exact distinct counts and multiple exact distinct counts in distributed query processing systems
US11321359B2 (en) * 2019-02-20 2022-05-03 Tamr, Inc. Review and curation of record clustering changes at large scale
US10740347B1 (en) * 2019-03-04 2020-08-11 Capital One Services, Llc Methods and systems for determining sets and subsets of parametric data
US10922337B2 (en) * 2019-04-30 2021-02-16 Amperity, Inc. Clustering of data records with hierarchical cluster IDs
US11003643B2 (en) * 2019-04-30 2021-05-11 Amperity, Inc. Multi-level conflict-free entity clusterings
US11586659B2 (en) * 2019-05-03 2023-02-21 Servicenow, Inc. Clustering and dynamic re-clustering of similar textual documents
US11651032B2 (en) 2019-05-03 2023-05-16 Servicenow, Inc. Determining semantic content of textual clusters
CN110162672B (zh) * 2019-05-10 2021-07-27 上海赜睿信息科技有限公司 Data processing method and apparatus, electronic device, and readable storage medium
US11321771B1 (en) * 2019-06-03 2022-05-03 Intuit Inc. System and method for detecting unseen overdraft transaction events
US11042555B1 (en) * 2019-06-28 2021-06-22 Bottomline Technologies, Inc. Two step algorithm for non-exact matching of large datasets
WO2021079230A1 (ja) * 2019-10-25 2021-04-29 株式会社半導体エネルギー研究所 Document search system
EP4057585A4 (en) * 2019-12-06 2022-12-28 Huawei Cloud Computing Technologies Co., Ltd. EDGE SYSTEM AND PROCEDURES FOR PROCESSING DATA OPERATION REQUESTS
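JP2021097353A (ja) * 2019-12-18 2021-06-24 キヤノン株式会社 Data transmission device, control method for data transmission device, and program
CN111064796B (zh) * 2019-12-19 2023-03-24 北京明略软件系统有限公司 Method and apparatus for analyzing accompanying relationships, and method for training an analysis model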
US11405482B2 (en) * 2020-02-15 2022-08-02 Near Intelligence Holdings, Inc. Method for linking identifiers to generate a unique entity identifier for deduplicating high-speed data streams in real time
US11176137B2 (en) * 2020-02-19 2021-11-16 Bank Of America Corporation Query processing platform for performing dynamic cluster compaction and expansion
US11768824B2 (en) 2020-03-31 2023-09-26 Wipro Limited Method and system for performing real-time data validation
TWI722859B (zh) * 2020-04-07 2021-03-21 中華誠信資產管理顧問股份有限公司 Method and system for screening comparable cases for real estate appraisal
US11442990B2 (en) 2020-04-08 2022-09-13 Liveramp, Inc. Asserted relationship data structure
EP4088217A4 (en) * 2020-05-18 2023-09-06 Google LLC INFERENCE PROCESSES FOR SEGMENTATION INTO WORDS OR PARTS OF WORDS
US11201737B1 (en) * 2020-05-19 2021-12-14 Acronis International Gmbh Systems and methods for generating tokens using secure multiparty computation engines
US20230230707A1 (en) * 2020-06-10 2023-07-20 Koninklijke Philips N.V. Methods and systems for searching an ecg database
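KR102199704B1 (ko) * 2020-06-26 2021-01-08 주식회사 이스트시큐리티 Apparatus for selecting a representative token from the detection names of multiple antivirus engines, method therefor, and computer-readable recording medium storing a program for performing the method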
WO2022006151A1 (en) * 2020-06-29 2022-01-06 6Sense Insights, Inc. Aggregation of noisy datasets into master firmographic database
US11720601B2 (en) * 2020-07-02 2023-08-08 Sap Se Active entity resolution model recommendation system
US11615094B2 (en) 2020-08-12 2023-03-28 Hcl Technologies Limited System and method for joining skewed datasets in a distributed computing environment
EP4204979A4 (en) * 2020-09-30 2024-10-02 Liveramp Inc SYSTEM AND METHOD FOR MATCHING IN A COMPLEX DATA SET
US20220114624A1 (en) * 2020-10-09 2022-04-14 Adobe Inc. Digital Content Text Processing and Review Techniques
CN112990654B (zh) * 2021-02-03 2021-11-02 北京大学 Collaborative planning method for urban-rural infrastructure systems based on population mobility data
US11783269B1 (en) 2021-02-05 2023-10-10 Palantir Technologies Inc. Systems and methods for rule management
EP4054145B1 (en) * 2021-03-05 2024-01-10 Cédric Iggiotti Document-based access control system
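CN112948943B (zh) * 2021-03-22 2022-11-18 西南交通大学 Pre- and post-processing method for OpenSees software for lattice-type diaphragm wall foundations
CN113064870B (zh) * 2021-03-22 2021-11-30 中国人民大学 Big data processing method based on direct computation on compressed data
KR20220134328A (ko) 2021-03-26 2022-10-05 주식회사 팬스컴스 Signage device for generating content copyright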
US20220335075A1 (en) * 2021-04-14 2022-10-20 International Business Machines Corporation Finding expressions in texts
US20220342909A1 (en) * 2021-04-22 2022-10-27 Salesforce.Com, Inc. Evaluating clustering in case of data stewardship actions
US12020170B2 (en) * 2021-05-24 2024-06-25 Liveperson, Inc. Systems and methods for intent discovery and process execution
US11687559B1 (en) * 2021-06-09 2023-06-27 Morgan Stanley Services Group, Inc. Computer systems and methods for reconciling data across data sources
US20220414171A1 (en) * 2021-06-28 2022-12-29 Flipkart Internet Private Limited System and method for generating a user query based on a target context aware token
US11693821B2 (en) * 2021-07-07 2023-07-04 Collibra Belgium Bv Systems and methods for performant data matching
US11848824B2 (en) * 2021-07-23 2023-12-19 Vmware, Inc. Distributed auto discovery service
US20230034741A1 (en) * 2021-07-28 2023-02-02 Palo Alto Networks, Inc. Token frequency based data matching
US11630855B2 (en) * 2021-08-04 2023-04-18 Capital One Services, Llc Variable density-based clustering on data streams
US20230052619A1 (en) * 2021-08-10 2023-02-16 Intuit Inc. Real-time error prevention during invoice creation
US11841965B2 (en) * 2021-08-12 2023-12-12 EMC IP Holding Company LLC Automatically assigning data protection policies using anonymized analytics
US20240070321A1 (en) * 2021-08-12 2024-02-29 EMC IP Holding Company LLC Automatically creating data protection roles using anonymized analytics
US11841769B2 (en) * 2021-08-12 2023-12-12 EMC IP Holding Company LLC Leveraging asset metadata for policy assignment
US11704312B2 (en) * 2021-08-19 2023-07-18 Microsoft Technology Licensing, Llc Conjunctive filtering with embedding models
US11934468B2 (en) 2021-09-16 2024-03-19 Microsoft Technology Licensing, LLC Content distribution control
US11803569B2 (en) * 2021-10-05 2023-10-31 Procore Technologies, Inc. Computer system and method for accessing user data that is distributed within a multi-zone computing platform
AU2022396138A1 (en) * 2021-11-24 2024-06-06 Visa International Service Association Method, system, and computer program product for community detection
JP2023086507A (ja) * 2021-12-10 2023-06-22 キオクシア株式会社 Information processing device and method
US20230297623A1 (en) * 2022-03-17 2023-09-21 Yext, Inc. Multi-record projection search platform
USD1032628S1 (en) * 2022-03-18 2024-06-25 Ab Initio Technology Llc Display panel portion with an animated computer icon
US11983162B2 (en) 2022-04-26 2024-05-14 Truist Bank Change management process for identifying potential regulatory violations for improved processing efficiency
US20240121154A1 (en) * 2022-09-30 2024-04-11 Intuit Inc. Modeling and managing affinity networks
US12026140B1 (en) 2023-02-21 2024-07-02 Snowflake Inc. Performance indexing of production databases

Family Cites Families (98)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02129756A (ja) 1988-11-10 1990-05-17 Nippon Telegr & Teleph Corp <Ntt> Word matching device
US5179643A (en) 1988-12-23 1993-01-12 Hitachi, Ltd. Method of multi-dimensional analysis and display for a large volume of record information items and a system therefor
US5388259A (en) 1992-05-15 1995-02-07 Bell Communications Research, Inc. System for accessing a database with an iterated fuzzy query notified by retrieval response
JPH0644309A (ja) 1992-07-01 1994-02-18 Nec Corp Database management method
JPH0944518A (ja) 1995-08-02 1997-02-14 Adoin Kenkyusho:Kk Method for constructing an image database, and method and device for searching an image database
US5832182A (en) 1996-04-24 1998-11-03 Wisconsin Alumni Research Foundation Method and system for data clustering for very large databases
JPH10275159A (ja) 1997-03-31 1998-10-13 Nippon Telegr & Teleph Corp <Ntt> Information retrieval method and device
US6026398A (en) 1997-10-16 2000-02-15 Imarket, Incorporated System and methods for searching and matching databases
JPH11184884A (ja) 1997-12-24 1999-07-09 Ntt Data Corp Same-person determination system and method
US6581058B1 (en) 1998-05-22 2003-06-17 Microsoft Corporation Scalable system for clustering of large databases having mixed data attributes
US6285995B1 (en) * 1998-06-22 2001-09-04 U.S. Philips Corporation Image retrieval system using a query image
US6742003B2 (en) 2001-04-30 2004-05-25 Microsoft Corporation Apparatus and accompanying methods for visualizing clusters of data and hierarchical cluster classifications
JP2000029899A (ja) 1998-07-14 2000-01-28 Hitachi Software Eng Co Ltd Method for matching buildings with maps, and recording medium
US6493709B1 (en) 1998-07-31 2002-12-10 The Regents Of The University Of California Method and apparatus for digitally shredding similar documents within large document sets in a data processing environment
US6658626B1 (en) 1998-07-31 2003-12-02 The Regents Of The University Of California User interface for displaying document comparison information
US7356462B2 (en) 2001-07-26 2008-04-08 At&T Corp. Automatic clustering of tokens from a corpus for grammar acquisition
US6317707B1 (en) * 1998-12-07 2001-11-13 At&T Corp. Automatic clustering of tokens from a corpus for grammar acquisition
US6456995B1 (en) 1998-12-31 2002-09-24 International Business Machines Corporation System, method and computer program products for ordering objects corresponding to database operations that are performed on a relational database upon completion of a transaction by an object-oriented transaction system
AU780926B2 (en) 1999-08-03 2005-04-28 Bally Technologies, Inc. Method and system for matching data sets
AU1051101A (en) 1999-10-27 2001-05-08 Zapper Technologies Inc. Context-driven information retrieval
US7328211B2 (en) 2000-09-21 2008-02-05 Jpmorgan Chase Bank, N.A. System and methods for improved linguistic pattern matching
DE10048478C2 (de) 2000-09-29 2003-05-28 Siemens Ag Method for accessing a storage unit when searching for substrings
US6931390B1 (en) 2001-02-27 2005-08-16 Oracle International Corporation Method and mechanism for database partitioning
JP3605052B2 (ja) 2001-06-20 2004-12-22 本田技研工業株式会社 Drawing management system with fuzzy search function
US20030033138A1 (en) 2001-07-26 2003-02-13 Srinivas Bangalore Method for partitioning a data set into frequency vectors for clustering
US20030041047A1 (en) 2001-08-09 2003-02-27 International Business Machines Corporation Concept-based system for representing and processing multimedia objects with arbitrary constraints
US7043647B2 (en) * 2001-09-28 2006-05-09 Hewlett-Packard Development Company, L.P. Intelligent power management for a rack of servers
US7213025B2 (en) 2001-10-16 2007-05-01 Ncr Corporation Partitioned database system
US20030120630A1 (en) * 2001-12-20 2003-06-26 Daniel Tunkelang Method and system for similarity search and clustering
AU2003210795A1 (en) * 2002-02-01 2003-09-02 John Fairweather System and method for analyzing data
CA2475319A1 (en) 2002-02-04 2003-08-14 Cataphora, Inc. A method and apparatus to visually present discussions for data mining purposes
WO2003107321A1 (en) 2002-06-12 2003-12-24 Jena Jordahl Data storage, retrieval, manipulation and display tools enabling multiple hierarchical points of view
US6961721B2 (en) * 2002-06-28 2005-11-01 Microsoft Corporation Detecting duplicate records in database
US20050226511A1 (en) * 2002-08-26 2005-10-13 Short Gordon K Apparatus and method for organizing and presenting content
US7043476B2 (en) 2002-10-11 2006-05-09 International Business Machines Corporation Method and apparatus for data mining to discover associations and covariances associated with data
US20040139072A1 (en) 2003-01-13 2004-07-15 Broder Andrei Z. System and method for locating similar records in a database
US7912842B1 (en) 2003-02-04 2011-03-22 Lexisnexis Risk Data Management Inc. Method and system for processing and linking data records
US7287019B2 (en) * 2003-06-04 2007-10-23 Microsoft Corporation Duplicate data elimination system
US20050120011A1 (en) 2003-11-26 2005-06-02 Word Data Corp. Code, method, and system for manipulating texts
US7526464B2 (en) 2003-11-28 2009-04-28 Manyworlds, Inc. Adaptive fuzzy network system and method
US7283999B1 (en) 2003-12-19 2007-10-16 Ncr Corp. Similarity string filtering
US7472113B1 (en) * 2004-01-26 2008-12-30 Microsoft Corporation Query preprocessing and pipelining
GB0413743D0 (en) * 2004-06-19 2004-07-21 Ibm Method and system for approximate string matching
US8407239B2 (en) 2004-08-13 2013-03-26 Google Inc. Multi-stage query processing system and method for use with tokenspace repository
US7917480B2 (en) * 2004-08-13 2011-03-29 Google Inc. Document compression system and method for use with tokenspace repository
US20080040342A1 (en) * 2004-09-07 2008-02-14 Hust Robert M Data processing apparatus and methods
US7523098B2 (en) 2004-09-15 2009-04-21 International Business Machines Corporation Systems and methods for efficient data searching, storage and reduction
US8725705B2 (en) 2004-09-15 2014-05-13 International Business Machines Corporation Systems and methods for searching of storage data with reduced bandwidth requirements
US8224830B2 (en) 2005-03-19 2012-07-17 Activeprime, Inc. Systems and methods for manipulation of inexact semi-structured data
US9110985B2 (en) 2005-05-10 2015-08-18 Netseer, Inc. Generating a conceptual association graph from large-scale loosely-grouped content
JP2007012039A (ja) * 2005-05-31 2007-01-18 Itochu Techno-Science Corp Search system and computer program
US7584205B2 (en) 2005-06-27 2009-09-01 Ab Initio Technology Llc Aggregating data with complex operations
US7672833B2 (en) 2005-09-22 2010-03-02 Fair Isaac Corporation Method and apparatus for automatic entity disambiguation
US7454449B2 (en) * 2005-12-20 2008-11-18 International Business Machines Corporation Method for reorganizing a set of database partitions
US20070162506A1 (en) * 2006-01-12 2007-07-12 International Business Machines Corporation Method and system for performing a redistribute transparently in a multi-node system
US7516279B2 (en) * 2006-02-28 2009-04-07 International Business Machines Corporation Method using stream prefetching history to improve data prefetching performance.
US20070244925A1 (en) 2006-04-12 2007-10-18 Jean-Francois Albouze Intelligent image searching
US7890533B2 (en) 2006-05-17 2011-02-15 Noblis, Inc. Method and system for information extraction and modeling
US7809769B2 (en) * 2006-05-18 2010-10-05 Google Inc. Database partitioning by virtual partitions
US8175875B1 (en) 2006-05-19 2012-05-08 Google Inc. Efficient indexing of documents with similar content
US7634464B2 (en) 2006-06-14 2009-12-15 Microsoft Corporation Designing record matching queries utilizing examples
US20080140653A1 (en) 2006-12-08 2008-06-12 Matzke Douglas J Identifying Relationships Among Database Records
US7630972B2 (en) 2007-01-05 2009-12-08 Yahoo! Inc. Clustered search processing
US7739247B2 (en) * 2006-12-28 2010-06-15 Ebay Inc. Multi-pass data organization and automatic naming
WO2008083504A1 (en) * 2007-01-10 2008-07-17 Nick Koudas Method and system for information discovery and text analysis
US8694472B2 (en) * 2007-03-14 2014-04-08 Ca, Inc. System and method for rebuilding indices for partitioned databases
US7711747B2 (en) * 2007-04-06 2010-05-04 Xerox Corporation Interactive cleaning for automatic document clustering and categorization
US8069129B2 (en) 2007-04-10 2011-11-29 Ab Initio Technology Llc Editing and compiling business rules
WO2008146456A1 (ja) 2007-05-28 2008-12-04 Panasonic Corporation Information search support method and information search support device
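CN101079896B (zh) * 2007-06-22 2010-05-19 Xi'an Jiaotong University Method for constructing an architecture in which multiple availability mechanisms coexist in a parallel storage system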
US7769778B2 (en) 2007-06-29 2010-08-03 United States Postal Service Systems and methods for validating an address
US7788276B2 (en) 2007-08-22 2010-08-31 Yahoo! Inc. Predictive stemming for web search with statistical machine translation models
US7925652B2 (en) 2007-12-31 2011-04-12 Mastercard International Incorporated Methods and systems for implementing approximate string matching within a database
US8775441B2 (en) * 2008-01-16 2014-07-08 Ab Initio Technology Llc Managing an archive for approximate string matching
US8032546B2 (en) 2008-02-15 2011-10-04 Microsoft Corp. Transformation-based framework for record matching
US8266168B2 (en) * 2008-04-24 2012-09-11 Lexisnexis Risk & Information Analytics Group Inc. Database systems and methods for linking records and entity representations with sufficiently high confidence
US7958125B2 (en) * 2008-06-26 2011-06-07 Microsoft Corporation Clustering aggregator for RSS feeds
US20120191973A1 (en) 2008-09-10 2012-07-26 National Ict Australia Limited Online presence of users
US8150169B2 (en) * 2008-09-16 2012-04-03 Viewdle Inc. System and method for object clustering and identification in video
CA3014839C (en) 2008-10-23 2019-01-08 Arlen Anderson Fuzzy data operations
CN101751400A (zh) * 2008-12-09 2010-06-23 Industrial Technology Research Institute System and method for technical data analysis, and system for patent analysis
US20100169311A1 (en) 2008-12-30 2010-07-01 Ashwin Tengli Approaches for the unsupervised creation of structural templates for electronic documents
JP5173898B2 (ja) 2009-03-11 2013-04-03 Canon Inc. Image processing method, image processing apparatus, and program
US8161048B2 (en) 2009-04-24 2012-04-17 At&T Intellectual Property I, L.P. Database analysis using clusters
US20100274770A1 (en) 2009-04-24 2010-10-28 Yahoo! Inc. Transductive approach to category-specific record attribute extraction
CN102067128A (zh) * 2009-04-27 2011-05-18 Panasonic Corporation Data processing device, data processing method, program, and integrated circuit
US8195626B1 (en) * 2009-06-18 2012-06-05 Amazon Technologies, Inc. Compressing token-based files for transfer and reconstruction
US8285681B2 (en) * 2009-06-30 2012-10-09 Commvault Systems, Inc. Data object store and server for a cloud storage environment, including data deduplication and data management across multiple cloud storage sites
US8572084B2 (en) * 2009-07-28 2013-10-29 Fti Consulting, Inc. System and method for displaying relationships between electronically stored information to provide classification suggestions via nearest neighbor
US8429179B1 (en) * 2009-12-16 2013-04-23 Board Of Regents, The University Of Texas System Method and system for ontology driven data collection and processing
CN101727502A (zh) * 2010-01-25 2010-06-09 ZTE Corporation Data query method, device, and system
US8375061B2 (en) 2010-06-08 2013-02-12 International Business Machines Corporation Graphical models for representing text documents for computer analysis
US8346772B2 (en) * 2010-09-16 2013-01-01 International Business Machines Corporation Systems and methods for interactive clustering
US8463742B1 (en) * 2010-09-17 2013-06-11 Permabit Technology Corp. Managing deduplication of stored data
US8606771B2 (en) 2010-12-21 2013-12-10 Microsoft Corporation Efficient indexing of error tolerant set containment
US9535954B2 (en) 2011-02-02 2017-01-03 Nec Corporation Join processing device, data management device, and string similarity join system
US8612386B2 (en) 2011-02-11 2013-12-17 Alcatel Lucent Method and apparatus for peer-to-peer database synchronization in dynamic networks
WO2013074770A1 (en) * 2011-11-15 2013-05-23 Ab Initio Technology Llc Data clustering, segmentation, and parallelization

Also Published As

Publication number Publication date
US20130124474A1 (en) 2013-05-16
CN108388632A (zh) 2018-08-10
WO2013074770A1 (en) 2013-05-23
EP3855321A1 (en) 2021-07-28
CA2855710A1 (en) 2013-05-23
CN104054074B (zh) 2019-03-08
AU2012340418B2 (en) 2017-06-01
AU2012340429A1 (en) 2014-05-29
CN104040544A (zh) 2014-09-10
AU2012340418C1 (en) 2017-11-16
CA3098038C (en) 2022-11-29
EP3432169B1 (en) 2021-02-24
HK1200942A1 (zh) 2015-08-14
EP2780836A1 (en) 2014-09-24
US20200356579A1 (en) 2020-11-12
EP2780833A1 (en) 2014-09-24
AU2012340429B2 (en) 2016-12-01
JP2014533417A (ja) 2014-12-11
EP3591538A1 (en) 2020-01-08
WO2013074781A1 (en) 2013-05-23
US20160283574A1 (en) 2016-09-29
KR102031392B1 (ko) 2019-11-08
KR20140094003A (ko) 2014-07-29
US20200320102A1 (en) 2020-10-08
CN108388632B (zh) 2021-11-19
CA3098038A1 (en) 2013-05-23
KR20140096127A (ko) 2014-08-04
US10503755B2 (en) 2019-12-10
US9037589B2 (en) 2015-05-19
KR20140094002A (ko) 2014-07-29
JP6190817B2 (ja) 2017-08-30
CA2855715C (en) 2019-02-19
AU2012340423B2 (en) 2017-02-09
EP2780835B1 (en) 2019-08-07
JP6113740B2 (ja) 2017-04-12
HK1200943A1 (zh) 2015-08-14
CA2855715A1 (en) 2013-05-23
JP2014533408A (ja) 2014-12-11
JP2014533409A (ja) 2014-12-11
AU2012340418A1 (en) 2014-05-29
EP3432169A1 (en) 2019-01-23
WO2013074774A4 (en) 2013-08-29
KR102029514B1 (ko) 2019-10-07
US9361355B2 (en) 2016-06-07
EP3591538B1 (en) 2021-01-20
US20130124525A1 (en) 2013-05-16
US10572511B2 (en) 2020-02-25
KR102048597B1 (ko) 2019-11-25
HK1201096A1 (zh) 2015-08-21
CN104054073A (zh) 2014-09-17
CA2855710C (en) 2020-03-10
CA2855701C (en) 2021-01-12
CA2855701A1 (en) 2013-05-23
CN104054073B (zh) 2018-10-30
CN104054074A (zh) 2014-09-17
EP2780835A1 (en) 2014-09-24
JP6125520B2 (ja) 2017-05-10
WO2013074774A1 (en) 2013-05-23
CN104040544B (zh) 2018-06-26
AU2012340423A1 (en) 2014-05-29
US20130124524A1 (en) 2013-05-16

Similar Documents

Publication Publication Date Title
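HK1259448A1 (zh) Data clustering, segmentation, and parallelization
PL2778983T3 (pl) Data clustering
HK1204112A1 (zh) Document classification system, document classification method, and document classification program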
GB201401147D0 (en) Information identification method, program and system
EP2786284A4 (en) GROUPING EVENT DATA BY MULTIPLE TIME DIMENSIONS
GB201404457D0 (en) Data clustering
EP2750110A4 (en) INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING PROCESS AND PROGRAM
EP2813963A4 (en) INFORMATION PROCESSING SYSTEM
EP2701117A4 (en) INFORMATION PROCESSING DEVICE AND METHOD AND PROGRAM
EP2690401A4 (en) INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM
EP2901144A4 (en) METHOD OF PROCESSING INFORMATION
EP2786221A4 (en) CLASSIFICATION OF ATTRIBUTE DATA INTERVALS
EP2911388A4 (en) INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND PROGRAM
EP2731024A4 (en) INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM
HK1213352A1 (zh) Configurable order entry, matching, coordination, and market data intervals
GB201409109D0 (en) Device, program and method for clustering documents
EP2873009A4 (en) MULTILINGUAL DOCUMENT CLUSTERING
EP2749066A4 (en) PROCESSING STATUS INFORMATION
EP2722733A4 (en) INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM
EP2725755A4 (en) DATA CARD AND ACTIVATION METHOD THEREFOR
GB201114418D0 (en) Data processing
GB201403292D0 (en) Information search system, method and program
EP2722735A4 (en) INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM
EP2717223A4 (en) INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND INFORMATION PROCESSING PROGRAM
EP2691863A4 (en) INFORMATION PROCESSING DEVICE AND METHOD, AND CORRESPONDING PROGRAM