CN106909679B - Asymptotic entity identification method based on multi-path block division - Google Patents


Info

Publication number
CN106909679B
CN106909679B CN201710122912.XA CN201710122912A CN106909679B CN 106909679 B CN106909679 B CN 106909679B CN 201710122912 A CN201710122912 A CN 201710122912A CN 106909679 B CN106909679 B CN 106909679B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710122912.XA
Other languages
Chinese (zh)
Other versions
CN106909679A (en)
Inventor
申德荣
孙琛琛
寇月
聂铁铮
于戈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date
2017-03-03
Filing date
2017-03-03
Publication date
2020-02-07

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to an asymptotic entity identification method based on multi-path blocking, which comprises the following steps: generating intersecting blocks by multi-path blocking; eliminating block redundancy by constructing a block graph; initializing block credits and candidate pair credits; sorting the candidate pairs by credit; and inserting them in that order into a candidate queue. The following three steps are then carried out iteratively: (1) the candidate pair at the head of the candidate queue is processed; (2) the credits of the affected candidate pairs are updated according to the identification result; and (3) the candidate queue is reordered according to the updated credits. The identified duplicate data object pairs are output progressively, and the three steps are repeated until the candidate queue is empty. With this method, more duplicate data objects can be identified within a short time budget; the candidate pair credits are updated by dynamically estimating the redundancy of the blocks, and the candidate pair most likely to match is selected in real time for identification, which ensures good asymptotic behavior.

Description

Asymptotic entity identification method based on multi-path block division
Technical Field
The invention belongs to the field of data quality and data integration, and mainly relates to an asymptotic (progressive) entity identification method based on multi-path blocking.
Background
In the big data era, an important characteristic of data is diversity. Data objects describing the same real-world entity may appear repeatedly, in different forms, in one or more data sources. This lowers data quality, reduces the usability and value of the data, and becomes a bottleneck in big data integration, processing, analysis, and mining. Entity identification is an important aspect of data quality: by analyzing a dirty data set, it places duplicate data objects that describe the same entity into the same group, thereby improving data quality. Entity identification typically deals with structured data objects, including records in relational databases, CSV files, XML files, and the like. Entity identification is also known as entity resolution, entity matching, record linkage, duplicate detection, record deduplication, reference disambiguation, merge/purge, and so on. It is widely needed in many fields, including customer relationship management, census, healthcare, online price comparison, national security, citation databases, spam detection, linked data, and machine reading. Data redundancy is the direct reason entity identification is needed. Data redundancy falls into two categories: (1) data objects describing the same real-world entity may be added to the same data source several times, which is called redundancy within a single data source; (2) when data from multiple data sources are integrated, data objects from different sources may correspond to the same entity, which is called redundancy across data sources.
Entity identification mainly comprises three steps: data blocking, data object similarity calculation, and data object pair match decision. First, data blocking, also called data indexing, reduces the search space and avoids useless data object comparisons, which speeds up identification; data blocking is an optional step. Second, computing the similarity between data objects is a key part of entity identification: the higher the similarity of a data object pair, the more likely the pair is a match; a similarity function is used for this computation. Finally, once the similarities are obtained, they are used to decide whether the data objects match (are duplicates); a variety of match decision methods currently exist.
As a necessary preprocessing step for data mining and data analysis, the traditional entity identification method takes the whole dirty data set as input and outputs the identification result only after processing is complete. However, many applications now require (near) real-time data analysis, which traditional entity identification techniques cannot provide. Asymptotic entity identification optimizes the identification result as far as possible within a short time and thereby addresses this need. For example, financial news streaming applications generally emphasize timeliness so that the audience can carry out the corresponding financial transactions in time. Stock market data changes very quickly, and new data is generated at short intervals; financial data may involve the names of a large number of companies and individuals, and it is not possible to identify all of this data in a short period. Streaming applications nevertheless need to identify a large number of company and individual names shortly before financial news is released. To meet such requirements, a new entity identification method should process as many matching data object pairs as possible within a given short period of time.
Asymptotic entity identification. Compared with traditional entity identification, asymptotic entity identification must additionally satisfy the following two conditions: (1) Better early results. Given any short time $t$, the asymptotic entity identification method identifies more duplicate data object pairs than the traditional entity identification method, where $t$ is much less than the full entity identification run time. (2) The same final result. If both the traditional method and the asymptotic method run until they terminate naturally, they should produce the same identification result.
Disclosure of Invention
To address the shortcomings of existing asymptotic entity identification methods, the invention provides an efficient asymptotic entity identification method based on multi-path blocking.
The technical scheme adopted by the invention is as follows:
An asymptotic entity identification method based on multi-path blocking comprises the following steps:
Step 1. Multi-path blocking. The main purpose of this step is to generate a multi-path blocking result using a set of blocking keys $K = \{k_i \mid 0 \le i \le |K|\}$. It is divided into the following two substeps.
Substep 1. Single-path blocking. Given a dirty data set $R = \{r\}$ and a blocking key $k$, each data object $r$ is assigned to a block $b = d(r.k)$ according to its key value $r.k$, where $d(\cdot)$ is an allocation function. The blocks in the blocking result set $B$ are pairwise disjoint: $b_1 \cap b_2 = \emptyset$ for all distinct $b_1, b_2 \in B$.
Substep 2. Multi-path blocking. Multi-path blocking is performed using the result of substep 1. Given a dirty data set $R$ and a set of blocking keys $K = \{k_i \mid 0 \le i \le |K|\}$, let $B_i$ be the single-path blocking result set generated with key $k_i$; then the multi-path blocking result set generated by $K$ is $B_m = B_1 \cup B_2 \cup \dots \cup B_{|K|}$. $B_m$ is a set of intersecting blocks, and each data object appears in at most $|K|$ different blocks.
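The two substeps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names, the record layout (a dict of attribute dicts), the use of the raw key value as the allocation function, and the decision to skip records whose key value is missing are all assumptions made for the example.

```python
from collections import defaultdict

def single_path_blocking(records, key):
    """Partition records into disjoint blocks by one blocking key.

    The allocation function d is assumed to be the identity on the key value.
    Records with a missing key value (None) are skipped (assumption).
    """
    blocks = defaultdict(set)
    for record_id, record in records.items():
        value = record.get(key)
        if value is None:
            continue
        blocks[(key, value)].add(record_id)
    return dict(blocks)

def multi_path_blocking(records, keys):
    """Union of the single-path results; blocks from different keys may intersect."""
    result = {}
    for key in keys:
        result.update(single_path_blocking(records, key))
    return result

# Hypothetical toy records, used only for illustration.
records = {
    "r1": {"city": "Boston", "age": 29},
    "r2": {"city": "Boston", "age": 31},
    "r3": {"city": "Austin", "age": 29},
}
blocks = multi_path_blocking(records, ["city", "age"])
```

Block identifiers are `(key, value)` tuples so that equal values under different keys never collide. With two keys, every record lands in at most two blocks.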
Step 2. Candidate queue generation. The main purpose of this step is to remove block redundancy and generate a candidate queue. It is divided into the following four substeps.
Substep 1. Block credit initialization. A pair of data objects in a block is called a candidate data object pair, or candidate pair, and is denoted $\langle r_i, r_j \rangle$, $r_i, r_j \in b$. A block is a set of data objects, and the potential of block $b$ is the total number of pairs of distinct data objects in $b$, written
$$\|b\| = \frac{|b|\,(|b| - 1)}{2}$$
During entity identification, all identified data object pairs in a block $b$ form the identified set, denoted $\Xi(b)$. All matching (duplicate) data object pairs in $\Xi(b)$ form the matched set, denoted $\Xi^{+}(b)$. Given a block $b$, its credit is positively correlated with the current size of its matched set and negatively correlated with its potential:
$$\sigma_d(b) = \frac{|\Xi^{+}(b)| + 1}{\|b\| + 1} \qquad (1)$$
The block credit of each block in the multi-path blocking result set $B_m$ is calculated using the above formula.
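As a small sketch of Equation (1), the block credit can be computed with exact fractions. The function names and the representation of matched pairs as a set of `frozenset` objects are assumptions made for this illustration; the potential is taken to be the number of unordered pairs in the block, as the text defines it.

```python
from fractions import Fraction

def potential(block):
    """Number of unordered pairs of distinct data objects in the block."""
    n = len(block)
    return n * (n - 1) // 2

def block_credit(block, matched_pairs):
    """Equation (1): (|matched set of b| + 1) / (potential of b + 1).

    matched_pairs holds every duplicate pair identified so far, each as a
    frozenset of two ids; only pairs lying inside the block are counted.
    """
    matches_in_block = sum(1 for pair in matched_pairs if pair <= block)
    return Fraction(matches_in_block + 1, potential(block) + 1)

three = {"r1", "r3", "r4"}
five = {"r1", "r2", "r4", "r6", "r7"}
initial_small = block_credit(three, set())
initial_large = block_credit(five, set())
after_one_match = block_credit(three, {frozenset(("r1", "r4"))})
```

Before any comparison a three-object block starts at 1/4 and a five-object block at 1/11, so small blocks are trusted more; each confirmed duplicate inside a block raises its credit.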
Substep 2. Block redundancy elimination. The redundancy introduced by multi-path blocking is eliminated by constructing a block graph. Given a multi-path blocking result set $B_m$, there is an undirected graph $G = (V, E)$, called the block graph. $V$ is the node set, and each node $v \in V$ corresponds to a data object in $B_m$. $E$ is the edge set; for any edge $e(v_i, v_j) \in E$ (denoted $e_{ij}$), the data objects $v_i$ and $v_j$ co-occur in at least one block of $B_m$. Both $r$ and $v$ denote data objects, and both $R$ and $V$ denote data sets. The candidate pair set $P$ is generated by traversing the edges of the block graph; the two data objects joined by each edge form a unique candidate pair.
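One way to picture the block graph is as a dictionary keyed by edge. The sketch below is an assumption about representation, not the patent's data structure: each edge is a sorted tuple of two object ids and maps to the list of blocks in which the two objects co-occur, so a pair shared by several blocks is stored once.

```python
from itertools import combinations

def build_block_graph(blocks):
    """Return {edge: [block ids where both endpoints co-occur]}.

    Each edge is a sorted 2-tuple of object ids, so the same pair coming
    from different blocks collapses into a single candidate pair.
    """
    edges = {}
    for block_id, members in blocks.items():
        for a, b in combinations(sorted(members), 2):
            edges.setdefault((a, b), []).append(block_id)
    return edges

# Hypothetical blocks for illustration.
blocks = {
    "b1": {"r1", "r2", "r3"},
    "b2": {"r2", "r3", "r4"},
}
edges = build_block_graph(blocks)
candidate_pairs = set(edges)
```

Keeping the co-occurrence list on each edge is convenient later, because the candidate pair credit is aggregated over exactly those blocks.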
Substep 3. Candidate pair credit initialization. Given a candidate pair $\langle r_i, r_j \rangle$, its credit estimates the likelihood that the pair matches. The credit aggregates the matching likelihoods provided by the blocks in which the pair co-occurs and is normalized by the total number of keys:
$$\sigma_d(\langle r_i, r_j \rangle) = \frac{1}{|K|} \sum_{b \in B_m,\; r_i, r_j \in b} \sigma_d(b) \qquad (2)$$
The credit of each candidate pair in the candidate pair set $P$ is calculated using the above formula.
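A minimal sketch of the candidate pair credit follows. It assumes that Equation (2) sums the credits of the blocks in which the pair co-occurs and divides by the number of blocking keys; this reading is an assumption, although it agrees with the credits listed in the worked example of the detailed description. The function name and the toy blocks are hypothetical.

```python
from fractions import Fraction

def pair_credit(pair, blocks, block_credits, num_keys):
    """Sum the credits of all blocks containing both objects, divided by |K|."""
    a, b = pair
    total = sum(
        block_credits[block_id]
        for block_id, members in blocks.items()
        if a in members and b in members
    )
    return total / num_keys

# Hypothetical blocks and credits, for illustration only.
blocks = {"x": {"a", "b", "c"}, "y": {"a", "b"}}
block_credits = {"x": Fraction(1, 4), "y": Fraction(1, 2)}
credit_ab = pair_credit(("a", "b"), blocks, block_credits, num_keys=2)
credit_ac = pair_credit(("a", "c"), blocks, block_credits, num_keys=2)
```

A pair that co-occurs under more keys, or in blocks with higher credit, ends up earlier in the candidate queue.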
Substep 4. Candidate pair sorting. The candidate pairs in the candidate pair set $P$ are sorted in descending order of credit and inserted in that order into the candidate queue $Q$.
Step 3. Iterative processing of candidate pairs. The main purpose of this step is to process candidate pairs iteratively and to output newly identified duplicate data object pairs progressively.
Substep 1. Candidate pair comparison. A candidate pair $\langle v_i, v_j \rangle$ is taken from the head of the candidate queue $Q$ and compared using the entity matching function. If $\langle v_i, v_j \rangle$ is judged to be a duplicate, the Look-around function is called to identify further candidate pairs directly, and the identified duplicate data object pairs are output.
Look-around function: once the data object pairs $\langle v_i, v_j \rangle$ and $\langle v_j, v_k \rangle$ have been identified as duplicates, $\langle v_i, v_k \rangle$ is directly judged to be a duplicate pair as well, which saves one data object pair comparison.
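The Look-around rule is a transitivity step, and one possible way to realize it is to keep a cluster of known duplicates per object and merge clusters whenever a comparison succeeds. The sketch below is an assumption about implementation; the function name `look_around` and the `clusters` dictionary are hypothetical and are not taken from the patent.

```python
def look_around(clusters, a, b):
    """Record that a and b are duplicates and return the pairs this implies.

    clusters maps each object id to the (shared) set of objects already
    known to be its duplicates, itself included. The returned pairs follow
    by transitivity and need no comparison of their own.
    """
    cluster_a = clusters.setdefault(a, {a})
    cluster_b = clusters.setdefault(b, {b})
    if cluster_a is cluster_b:
        return set()  # already known to be duplicates
    implied = {frozenset((x, y)) for x in cluster_a for y in cluster_b}
    implied.discard(frozenset((a, b)))  # this pair was actually compared
    merged = cluster_a | cluster_b
    for member in merged:
        clusters[member] = merged
    return implied

clusters = {}
first = look_around(clusters, "r1", "r4")
second = look_around(clusters, "r3", "r4")
third = look_around(clusters, "r1", "r3")
```

After (r1, r4) and (r3, r4) are confirmed, the second call reports (r1, r3) as implied, so that pair can be output without running the matching function.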
Substep 2. Candidate pair credit update. Based on the identification result, the blocks containing the most recently identified duplicate data object pairs are found; these are called affected blocks. Because the proportion of identified duplicates in the affected blocks has increased, their dynamic block credits are updated according to Equation (1). The unidentified candidate pairs in the affected blocks are called affected candidate pairs, and their credits are updated from the new block credits according to Equation (2).
Substep 3. Candidate queue adjustment. The candidate queue is re-sorted in descending order of the updated candidate pair credits.
The above three substeps are repeated until the time budget is exhausted or the candidate queue is empty.
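The loop formed by the three substeps can be sketched as below. Several simplifications are assumptions made for brevity: the time budget is modeled as a maximum number of comparisons, the Look-around function is left out, the pair credit is taken to be the sum of co-occurring block credits divided by the number of keys, and instead of re-sorting a queue the code recomputes credits and picks the maximum, which selects the same head pair. The name `progressive_identify` and the oracle-style `is_match` callback are hypothetical.

```python
from itertools import combinations

def progressive_identify(blocks, num_keys, is_match, budget):
    """Process candidate pairs in credit order until the budget is used up."""
    # Candidate pairs are the edges of the block graph, with their blocks.
    pair_blocks = {}
    for block_id, members in blocks.items():
        for pair in combinations(sorted(members), 2):
            pair_blocks.setdefault(pair, []).append(block_id)

    matched = {block_id: 0 for block_id in blocks}  # matched-set size per block

    def block_credit(block_id):
        n = len(blocks[block_id])
        return (matched[block_id] + 1) / (n * (n - 1) // 2 + 1)  # Equation (1)

    def pair_credit(pair):
        return sum(block_credit(b) for b in pair_blocks[pair]) / num_keys

    queue = set(pair_blocks)
    duplicates = []
    comparisons = 0
    while queue and comparisons < budget:
        # Taking the maximum over fresh credits stands in for
        # "update affected pairs, then re-sort the queue".
        head = max(queue, key=lambda p: (pair_credit(p), p))
        queue.remove(head)
        comparisons += 1
        if is_match(*head):
            duplicates.append(head)
            for block_id in pair_blocks[head]:  # affected blocks
                matched[block_id] += 1
    return duplicates

# Blocks with more than one member from the embodiment's Table 1.
blocks = {
    "bs1": {"r1", "r3", "r4"},
    "ba1": {"r1", "r2", "r4", "r6", "r7"},
    "bj1": {"r1", "r2", "r3", "r4", "r5"},
    "bc1": {"r2", "r3", "r4", "r6", "r7"},
}
truth = {"r1", "r2", "r3", "r4"}  # ground-truth duplicate cluster
found = progressive_identify(
    blocks, 4, lambda a, b: a in truth and b in truth, budget=6
)
```

Recomputing every credit on each round is quadratic and only suitable for a demonstration; a production version would update just the affected pairs and keep them in a priority queue.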
The invention has the following advantages: with this asymptotic entity identification method, more duplicate data objects can be identified within a short time budget (far less than the total entity identification time); the candidate pair credits are updated by dynamically estimating the redundancy of the blocks, and the candidate pair most likely to match is selected in real time for identification, which ensures good asymptotic behavior.
Drawings
FIG. 1 is a general flow diagram of the present invention.
FIG. 2 is the block graph $G_B$ corresponding to the block set in step 2 of the detailed description.
FIG. 3 is a graph comparing the real-time recall rate of the present invention with two other methods known in the art.
FIG. 4 is a graph comparing the asymptotic behavior of the present invention with that of two other methods.
Detailed Description
The following is an example of one embodiment of the present invention.
As shown in Table 1, there is a sample data set containing 7 records. This is a dirty data set, and the corresponding true identification result is $\{\{r_1, r_2, r_3, r_4\}, \{r_5\}, \{r_6\}, \{r_7\}\}$. The goal is to identify this dirty data set asymptotically, that is, to identify as many duplicate record pairs as possible within a short run time.
Table 1. Sample dirty data set containing 7 person records with the attributes name, age, job, and city.

| ID | Name | Age | Job | City |
|----|------------|-----|---------|--------|
| r1 | John Young | 29 | Waiter | Poston |
| r2 | John Joung | 29 | Waiter | Boston |
| r3 | Jon Young | - | Waiter | Boston |
| r4 | John Young | 29 | Waiter | Boston |
| r5 | Bob Brown | 27 | Waiter | Austin |
| r6 | Jeff Allen | 29 | - | Boston |
| r7 | Will Green | 29 | Teacher | Boston |

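1. First, multi-path blocking is performed. For the dirty data set in Table 1, surname, age, job, and city are each used as a blocking key, which gives the result set
$$B_m = B_{surname} \cup B_{age} \cup B_{job} \cup B_{city}$$
$$B_{surname} = \{b_{s1} = \{r_1, r_3, r_4\},\ b_{s2} = \{r_2\},\ b_{s3} = \{r_5\},\ b_{s4} = \{r_6\},\ b_{s5} = \{r_7\}\}$$
$$B_{age} = \{b_{a1} = \{r_1, r_2, r_4, r_6, r_7\},\ b_{a2} = \{r_5\}\}$$
$$B_{job} = \{b_{j1} = \{r_1, r_2, r_3, r_4, r_5\},\ b_{j2} = \{r_7\}\}$$
$$B_{city} = \{b_{c1} = \{r_2, r_3, r_4, r_6, r_7\},\ b_{c2} = \{r_1\},\ b_{c3} = \{r_5\}\}$$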
2. Redundancy is then eliminated by building the block graph. The block set $B_m$ above contains 33 candidate pairs in total, some of them redundant. For example, the candidate pair $\langle r_1, r_4 \rangle$ appears in blocks $b_{s1}$, $b_{a1}$, and $b_{j1}$ at the same time. Building the block graph removes this redundancy from $B_m$. FIG. 2 shows the block graph $G_B$ corresponding to $B_m$; each edge of $G_B$ corresponds to a unique candidate pair. As FIG. 2 shows, removing the blocking redundancy reduces the number of candidate pairs from 33 to 19.
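The two counts in this step can be checked with a few lines of Python. The snippet only restates the four multi-member blocks of the example (single-member blocks contribute no pairs); the variable names are chosen for the illustration.

```python
from itertools import combinations

# Multi-member blocks of the example; singleton blocks yield no pairs.
blocks = {
    "bs1": {"r1", "r3", "r4"},
    "ba1": {"r1", "r2", "r4", "r6", "r7"},
    "bj1": {"r1", "r2", "r3", "r4", "r5"},
    "bc1": {"r2", "r3", "r4", "r6", "r7"},
}

# Every pair produced inside every block, duplicates included.
all_pairs = [
    pair
    for members in blocks.values()
    for pair in combinations(sorted(members), 2)
]
# Edges of the block graph: each pair kept once.
unique_pairs = set(all_pairs)

print(len(all_pairs), len(unique_pairs))
```

The list holds 3 + 10 + 10 + 10 = 33 pairs and the set holds 19, so 14 comparisons would be repeated without the block graph.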
3. Then the block credits and candidate pair credits are initialized, and the candidate queue is generated. The initial block credits and candidate pair credits are computed as follows.
Block credits: $\sigma_d(b_{s1}) = 1/4$, $\sigma_d(b_{a1}) = 1/11$, $\sigma_d(b_{j1}) = 1/11$, $\sigma_d(b_{c1}) = 1/11$.
Candidate pair credits (in descending order): $\sigma_d(\langle r_1, r_4 \rangle) = 0.108$, $\sigma_d(\langle r_3, r_4 \rangle) = 0.108$, $\sigma_d(\langle r_1, r_3 \rangle) = 0.085$, $\sigma_d(\langle r_2, r_4 \rangle) = 0.068$, $\sigma_d(\langle r_6, r_7 \rangle) = 0.045$, $\sigma_d(\langle r_2, r_3 \rangle) = 0.045$, $\sigma_d(\langle r_1, r_2 \rangle) = 0.045$, …
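These initial values can be reproduced with the short sketch below. It assumes that a candidate pair's credit is the sum of the credits of its co-occurring blocks divided by the number of keys (4 here); that reading of Equation (2) is an assumption, and the names `NUM_KEYS`, `block_credit`, and `pair_credits` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

blocks = {
    "bs1": {"r1", "r3", "r4"},
    "ba1": {"r1", "r2", "r4", "r6", "r7"},
    "bj1": {"r1", "r2", "r3", "r4", "r5"},
    "bc1": {"r2", "r3", "r4", "r6", "r7"},
}
NUM_KEYS = 4  # surname, age, job, city

def block_credit(members, matches=0):
    """Equation (1) with the potential n(n-1)/2."""
    n = len(members)
    return Fraction(matches + 1, n * (n - 1) // 2 + 1)

credits = {block_id: block_credit(m) for block_id, m in blocks.items()}

# Accumulate each block's contribution onto the pairs it contains.
pair_credits = {}
for block_id, members in blocks.items():
    for pair in combinations(sorted(members), 2):
        pair_credits[pair] = pair_credits.get(pair, 0) + credits[block_id] / NUM_KEYS

ranked = sorted(pair_credits.items(), key=lambda item: item[1], reverse=True)
for pair, credit in ranked[:4]:
    print(pair, round(float(credit), 3))
```

Exact fractions avoid rounding surprises when ties are compared; the values are converted to floats only for display.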
According to the candidate pair credits, $\langle r_1, r_4 \rangle$ or $\langle r_3, r_4 \rangle$ should be processed first. The true identification result $\{\{r_1, r_2, r_3, r_4\}, \{r_5\}, \{r_6\}, \{r_7\}\}$ shows that both candidate pairs are duplicates (matches), so the initial ordering of the candidate pairs is effective.
The candidate pairs are sorted in descending order of credit and inserted in that order into the candidate queue.
4. The iterative asymptotic processing stage begins, and the iterations are observed round by round. Table 2 shows the first 6 iterations of the method on the dirty data set of Table 1. According to the true identification result $\{\{r_1, r_2, r_3, r_4\}, \{r_5\}, \{r_6\}, \{r_7\}\}$, each of the first 6 rounds identifies a duplicate data object pair. Therefore, if the entity identification budget is set to 6 data object pair comparisons, the method identifies all duplicate data object pairs within the budget, which shows that its asymptotic performance is very high.
Table 2. Block credits and the candidate pair at the head of the candidate queue in each iteration.

| Iteration round | $\sigma_d(b_{s1})$ | $\sigma_d(b_{a1})$ | $\sigma_d(b_{j1})$ | $\sigma_d(b_{c1})$ | Head candidate pair |
|---|------|------|------|------|------|
| 1 | 1/4 | 1/11 | 1/11 | 1/11 | $\langle r_1, r_4 \rangle$ |
| 2 | 2/4 | 2/11 | 2/11 | 1/11 | $\langle r_3, r_4 \rangle$ |
| 3 | 3/4 | 2/11 | 3/11 | 2/11 | $\langle r_1, r_3 \rangle$ |
| 4 | 1 | 2/11 | 4/11 | 2/11 | $\langle r_2, r_4 \rangle$ |
| 5 | 1 | 3/11 | 5/11 | 3/11 | $\langle r_2, r_3 \rangle$ |
| 6 | 1 | 3/11 | 6/11 | 4/11 | $\langle r_1, r_2 \rangle$ |
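The block credit columns of Table 2 follow from Equation (1) once the order of head pairs is fixed. The sketch below replays that order and records the credits seen at the start of each round; it takes the sequence of head pairs from the table as given rather than deriving it, and the variable names are hypothetical.

```python
from fractions import Fraction

blocks = {
    "bs1": {"r1", "r3", "r4"},
    "ba1": {"r1", "r2", "r4", "r6", "r7"},
    "bj1": {"r1", "r2", "r3", "r4", "r5"},
    "bc1": {"r2", "r3", "r4", "r6", "r7"},
}
# Head-of-queue pairs in the order listed in Table 2.
heads = [("r1", "r4"), ("r3", "r4"), ("r1", "r3"),
         ("r2", "r4"), ("r2", "r3"), ("r1", "r2")]

def potential(members):
    n = len(members)
    return n * (n - 1) // 2

matched = {block_id: 0 for block_id in blocks}
rows = []
for a, b in heads:
    # Credits as they stand when this pair reaches the head of the queue.
    rows.append({
        block_id: Fraction(matched[block_id] + 1, potential(members) + 1)
        for block_id, members in blocks.items()
    })
    # Each head pair is a true duplicate, so every block containing
    # both objects gains one matched pair.
    for block_id, members in blocks.items():
        if a in members and b in members:
            matched[block_id] += 1

for round_no, row in enumerate(rows, start=1):
    print(round_no, {block_id: str(value) for block_id, value in row.items()})
```

`Fraction` reduces 2/4 to 1/2 and 4/4 to 1, so the printed values are the reduced forms of the table entries.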

Claims (2)

1. An asymptotic entity identification method based on multi-path blocking, characterized in that the method comprises the following steps:
Step 1. Multi-path blocking: a multi-path blocking result is generated using a set of blocking keys $K = \{k_i \mid 0 \le i \le |K|\}$, specifically as follows:
Step 1-1. Given a dirty data set $R = \{r\}$ and a blocking key $k$, each data object $r$ is assigned to a block $b = d(r.k)$ according to its key value $r.k$; $d(\cdot)$ is an allocation function, and the blocks in the blocking result set $B$ are pairwise disjoint,
$$b_1 \cap b_2 = \emptyset, \quad b_1, b_2 \in B,\ b_1 \ne b_2;$$
Step 1-2. Multi-path blocking is performed using the result of step 1-1: given a dirty data set $R = \{r\}$ and a set of blocking keys $K = \{k_i \mid 0 \le i \le |K|\}$, let $B_i$ be the single-path blocking result set generated with key $k_i$; then the multi-path blocking result set generated by $K$ is $B_m = B_1 \cup B_2 \cup \dots \cup B_{|K|}$; $B_m$ is a set of intersecting blocks, and each data object appears in at most $|K|$ different blocks;
Step 2. Candidate queue generation: block redundancy is removed and the candidate queue is generated, as follows:
Step 2-1. Block credit initialization: a pair of data objects in a block is called a candidate data object pair, or candidate pair, and is denoted $\langle r_i, r_j \rangle$, $r_i, r_j \in b$; a block is a set of data objects, and the potential of block $b$ is the total number of pairs of distinct data objects in $b$, written
$$\|b\| = \frac{|b|\,(|b| - 1)}{2}$$
During entity identification, all identified data object pairs in a block $b$ form the identified set, denoted $\Xi(b)$, and all matching data object pairs in $\Xi(b)$ form the matched set, denoted $\Xi^{+}(b)$; given a block $b$, its credit is positively correlated with the current size of its matched set and negatively correlated with its potential;
$$\sigma_d(b) = \frac{|\Xi^{+}(b)| + 1}{\|b\| + 1} \qquad (1)$$
The block credit of each block in the multi-path blocking result set $B_m$ is calculated using the above formula;
Step 2-2. Block redundancy elimination: the redundancy introduced by multi-path blocking is eliminated by constructing a block graph; given a multi-path blocking result set $B_m$, there is an undirected graph $G = (V, E)$, called the block graph, where $V$ is the node set and each node $v \in V$ corresponds to a data object in $B_m$; $E$ is the edge set, and for any edge $e(v_i, v_j) \in E$ the data objects $v_i$ and $v_j$ co-occur in at least one block of $B_m$; both $r$ and $v$ denote data objects, and both $R$ and $V$ denote data sets; the candidate pair set $P$ is generated by traversing the edges of the block graph, and the two data objects joined by each edge form a unique candidate pair;
Step 2-3. Candidate pair credit initialization: given a candidate pair $\langle r_i, r_j \rangle$, its credit estimates the likelihood that the pair matches; the credit aggregates the matching likelihoods provided by the blocks in which the pair co-occurs and is normalized by the total number of keys;
$$\sigma_d(\langle r_i, r_j \rangle) = \frac{1}{|K|} \sum_{b \in B_m,\; r_i, r_j \in b} \sigma_d(b) \qquad (2)$$
The credit of each candidate pair in the candidate pair set $P$ is calculated using the above formula;
Step 2-4. Candidate pair sorting: the candidate pairs in the candidate pair set $P$ are sorted in descending order of credit and inserted in that order into the candidate queue $Q$;
Step 3. The candidate data object pairs are processed iteratively, and the newly identified duplicate data object pairs are output progressively, as follows:
Step 3-1. Candidate pair comparison: a candidate pair $\langle v_i, v_j \rangle$ is taken from the head of the candidate queue $Q$ and compared using the entity matching function; if $\langle v_i, v_j \rangle$ is judged to be a duplicate, the following operations are performed: the Look-around function is called to identify further candidate pairs directly, and the identified duplicate data object pairs are output;
Look-around function: once the data object pairs $\langle v_i, v_j \rangle$ and $\langle v_j, v_k \rangle$ have been identified as duplicates, $\langle v_i, v_k \rangle$ is directly judged to be a duplicate pair as well, which saves one data object pair comparison;
Step 3-2. Candidate pair credit update: based on the identification result, the blocks containing the newly identified duplicate data object pairs, called affected blocks, are found; because the proportion of identified duplicates in the affected blocks has increased, their dynamic block credits are updated according to Equation (1); the unidentified candidate pairs in the affected blocks are called affected candidate pairs, and their credits are updated from the new block credits according to Equation (2);
Step 3-3. Candidate queue adjustment: the candidate queue is re-sorted in descending order of the updated candidate pair credits;
The above three substeps are repeated until the time budget is exhausted or the candidate queue is empty.
2. The asymptotic entity identification method based on multi-path blocking according to claim 1, characterized in that: in step 3, the candidate pairs in the candidate queue are identified iteratively, and the order of the candidate queue is adjusted dynamically according to the identification results, so that the candidate pair most likely to match is selected in real time for identification.
CN201710122912.XA 2017-03-03 2017-03-03 Asymptotic entity identification method based on multi-path block division Active CN106909679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710122912.XA CN106909679B (en) 2017-03-03 2017-03-03 Asymptotic entity identification method based on multi-path block division

Publications (2)

Publication Number Publication Date
CN106909679A (en) 2017-06-30
CN106909679B (en) 2020-02-07

Family

ID=59186709

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20090053104A (en) * 2007-11-22 2009-05-27 한양대학교 산학협력단 Information extracting apparatus using block grouping and method of extracting information using block grouping
CN105608067A (en) * 2014-11-07 2016-05-25 华东师范大学 Automatic knowledge extraction method and apparatus for network teaching system
CN106097043A (en) * 2016-06-01 2016-11-09 腾讯科技(深圳)有限公司 The processing method of a kind of credit data and server

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8953684B2 (en) * 2007-05-16 2015-02-10 Microsoft Corporation Multiview coding with geometry-based disparity prediction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Clustering algorithms for entity resolution; Sun Chenchen et al.; Journal of Software (软件学报); 2016-09-15; pp. 2303-2319 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant