CN106372668A - Data matching method and device - Google Patents

Publication number
CN106372668A
Authority
CN
China
Prior art keywords
data
acquisition system
matched
undetermined
data acquisition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610797496.9A
Other languages
Chinese (zh)
Inventor
苗泽民
方庆安
崔世起
范羽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sina Technology China Co Ltd
Original Assignee
Sina Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sina Technology China Co Ltd
Priority to CN201610797496.9A
Publication of CN106372668A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a data matching method that aims to solve the prior-art problem that matching data according to an IP address yields a matching result of relatively low accuracy. The method comprises: acquiring a first data set and a second data set, where the first data set and the second data set each comprise at least one data group and each data group comprises at least two data items; matching the data groups in the first data set against the data groups in the second data set to obtain to-be-matched data group pairs; determining a matching accuracy for each to-be-matched data group pair; and determining at least one matched data group pair from the to-be-matched data group pairs according to the matching accuracy of each to-be-matched data group pair. The invention further discloses a data matching device.

Description

Data matching method and device
Technical field
The present application relates to the field of computer technology, and in particular to a data matching method and device.
Background
With the continuous development of Internet information technology, recommending information to users through Internet channels has become increasingly common; for example, advertisements can be pushed to users through Internet channels.
When information is recommended over the Internet, the recommending party usually cares about the recommendation effect. The recommendation effect here refers to whether the recommended information influences the user who receives it; for example, after receiving the recommended information, the user may click it to view the recommended object, or may visit the website of the party that sent the information.
At present, to achieve a better recommendation effect, the recommending party often matches users' click data for the recommended information (for example, the model of the device used when the user clicked the information, or the network address of that device) against the recommendation effect produced by the information (for example, a visit to the recommending party's website or a download of the recommending party's application). The matching determines which user clicked which recommended information and what effect resulted, and analysis of the matching result shows for which kinds of users the recommended information works well.
Take advertisement recommendation through Internet channels as an example. After a user receives an advertisement pushed by the advertiser, the following advertising effect may result: the user clicks the received advertisement and is redirected to an application market (app store) to download the application recommended in the advertisement.
For this kind of advertising effect, in the prior art the advertisement sender usually takes the network address (Internet Protocol address, IP address) used when the user clicked the advertisement and the IP address used when the application was downloaded as the basis for determining the correspondence between the user's click data and the advertising effect.
In practice, however, many companies or schools may share one public IP address, so a correspondence determined from the IP address alone yields a matching result of poor accuracy, and the advertisement sender cannot obtain the desired result by analyzing the matching result.
It can be seen that the low accuracy of the matching result obtained when user click data and recommendation effect data are matched by IP address is a problem in the prior art that urgently needs to be solved.
Summary of the invention
An embodiment of the present application provides a data matching method to solve the prior-art problem that matching data by IP address alone yields a matching result of low accuracy.
An embodiment of the present application also provides a data matching device to solve the same problem.
The embodiments of the present application adopt the following technical solutions:
A data matching method, comprising:
acquiring a first data set and a second data set, where the first data set and the second data set each contain at least one data group, and each data group contains at least two data items;
matching the data groups contained in the first data set against the data groups contained in the second data set to obtain to-be-matched data group pairs;
determining the matching accuracy of each to-be-matched data group pair;
determining at least one matched data group pair from the to-be-matched data group pairs according to the matching accuracy of each to-be-matched data group pair.
A data matching device, comprising:
a data set acquiring unit, configured to acquire a first data set and a second data set, where the first data set and the second data set each contain at least one data group, and each data group contains at least two data items;
a data matching unit, configured to match the data groups contained in the first data set against the data groups contained in the second data set to obtain to-be-matched data group pairs;
an accuracy determining unit, configured to determine the matching accuracy of each to-be-matched data group pair;
a matched data determining unit, configured to determine at least one matched data group pair from the to-be-matched data group pairs according to the matching accuracy of each to-be-matched data group pair.
At least one of the above technical solutions adopted by the embodiments of the present application can achieve the following beneficial effects:
In the data matching method provided by the embodiments of the present application, every data group in the first data set and the second data set contains at least two data items. The data groups in the first data set are matched against the data groups in the second data set to obtain to-be-matched data group pairs, a matching accuracy is determined for each to-be-matched data group pair, and at least one matched data group pair is determined from the to-be-matched data group pairs according to that accuracy. The prior art decides whether two data groups match only by checking whether a single data item is identical in both. By comparison, the solution of the present application relies on more, and more reasonable, matching criteria when determining matched data groups, so the matching result is more accurate.
Brief description of the drawings
The drawings described here provide a further understanding of the present application and form a part of it. The exemplary embodiments of the present application and their descriptions serve to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a schematic flowchart of a data matching method provided by an embodiment of the present application;
Fig. 2 is a schematic flowchart of matching click data against effect data, provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a data matching device provided by an embodiment of the present application.
Detailed description
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions are described clearly and completely below with reference to specific embodiments of the application and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the drawings.
An embodiment of the present application provides a data matching method to solve the prior-art problem that matching data by IP address alone yields a matching result of low accuracy.
The method may be executed by, but is not limited to, at least one terminal device such as a mobile phone, a tablet computer, a personal computer (PC) or a smart television. The method may also be executed by a server, for example the server of a shopping website, of an advertiser website, or of an application download site.
For ease of description, the implementation of the method is introduced below taking the server of an advertiser website as the executing entity. It should be understood that this choice is merely illustrative and does not limit the method.
A schematic flowchart of the method is shown in Fig. 1. The method mainly comprises the following steps:
Step 11: acquire a first data set and a second data set.
The first data set and the second data set each contain at least one data group, and each data group contains at least two data items.
Take the server of an advertiser website as an example. To understand the advertising effect, an advertiser may need the advertiser website to match the data of users' clicks on an advertisement against the effect data of that advertisement, and to feed back to the advertiser the advertising effect obtained by analyzing the matching result. To this end, the advertiser website first needs to collect the click data and the effect data of the advertiser's advertisement.
The click data of an advertisement may include, for example, the model of the terminal used when the user clicked the advertisement, the terminal's operating system, the operating system version, the IP address and the click time. When a user visits the advertiser website on a terminal device and clicks an advertisement displayed there, the advertiser website can record information about the terminal, the time of the click and the user's IP address, and use the recorded information as the user's click data.
The advertising effect generally refers to whether the advertisement influences the user who receives it, for example whether, after seeing an advertisement displayed on the advertiser website, the user clicks it to visit the advertised product or the advertiser's own website. Advertisement effect data refers to the data produced when the user, after clicking the advertisement, accesses the product recommended by the advertiser. It may likewise include information about the terminal used, time information (for example, the time at which the user visited the advertiser's own website or bought the advertised product) and the IP address of the terminal device.
Suppose a user clicks an advertisement and the resulting advertising effect is a redirect to the app store to download the application recommended in the advertisement. The effect data then typically comprises the time at which the user downloaded the app, information about the terminal that downloaded it, the IP address, and so on. In this case, information-collection code can be embedded in the installation package of the advertised app. After the user downloads the package and installs the app on a terminal, the embedded code collects effect data such as the model of the terminal on which the app is installed, the terminal's operating system, the operating system version number and the app download time, and sends the collected effect data to the server of the advertiser website.
Continuing the example, the advertiser website can store the collected click data and the collected effect data separately in two independent data sets. Each click or each advertising effect is one data group in its data set, and each data group can contain the terminal information, time, IP address and other data described above.
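As a rough illustration of the two data sets produced by step 11, the sketch below models one data group as a small record. The class name `DataGroup`, its field names and the choice of minutes since midnight as the time unit are assumptions made for this example only; the sample values are the ones used in the worked example of step 12.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataGroup:
    """One click record or one effect record (hypothetical layout)."""
    model: str        # terminal model
    os: str           # terminal operating system
    os_version: str   # operating system version
    ip: str           # IP address recorded for the terminal
    time: int         # click or download time, assumed to be minutes since midnight


# Two independent data sets: one for clicks, one for effects.
click_set = [DataGroup("aaa", "a", "a1", "192.168.1.122", 15 * 60 + 20)]
effect_set = [DataGroup("aaa", "a", "a1", "192.168.1.122", 15 * 60 + 23)]
```

Each record holds more than two data items, which satisfies the requirement that every data group contain at least two.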
Step 12: match the data groups contained in the first data set against the data groups contained in the second data set to obtain to-be-matched data group pairs.
In one embodiment, each data group in the first data set is matched against each data group in the second data set to obtain the to-be-matched data group pairs.
Because every data group in the data sets acquired in step 11 contains at least two data items, a data group in the first data set can be matched against a data group in the second data set by matching their data items against one another correspondingly. In one embodiment, step 12 may be implemented as follows: for any data group in the first data set and any data group in the second data set, match the data items of the same type in the two data groups; when at least one pair of data items in the two data groups matches successfully, the two data groups form a to-be-matched data group pair.
For example, suppose the first data set is the users' click data collected by the server of the advertiser website and the second data set is the advertisement effect data collected by that server. The advertiser website can match the click data in the click data set against the effect data in the effect data set to determine the effect corresponding to a given click. Suppose one click record in the click data set is: phone model aaa, phone operating system a, operating system version a1, IP address 192.168.1.122, ad click time 15:20; and one effect record in the effect data set is: phone model aaa, phone operating system a, operating system version a1, IP address 192.168.1.122, app download time 15:23. Matching the phone information, IP address and click time in the click record against the phone information, IP address and download time in the effect record shows that the phone model, operating system, operating system version and IP address are identical in both records, and that the interval between the click time and the app download time lies within the preset duration. The click record and the effect record can therefore be determined to be a to-be-matched data group pair.
Note that in this scenario, where clicking an advertisement redirects the user to the app store to download an application, the user must click the advertisement first and only then download the advertised app. When the click and the download are triggered by the same user, the click time necessarily precedes the download time, and the interval between them will not be long. Whether the click time precedes the app download time, and whether the interval between them is shorter than the preset duration, can therefore serve as a criterion for judging whether a click record matches an effect record.
With the above method, matching a data group in the first data set against a data group in the second data set does not rely on a single matching criterion (such as the IP address). Instead, every data item in the one data group is matched against the corresponding data item in the other, and the two data groups are determined to match when all of these matches succeed. Because the matched data is determined from several matching criteria rather than a single one, the to-be-matched data group pairs determined by step 12 are more accurate than those of the prior art.
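The pairing in step 12 might be sketched as follows. This is only an illustration under stated assumptions: the field names, the dictionary layout, the function names and the 30-minute value of `MAX_GAP` are hypothetical (the text does not give a preset duration), and times are taken as minutes since midnight. The `require_all` flag switches between the two criteria mentioned in the text: at least one same-type item matching, or every item matching.

```python
from itertools import product

FIELDS = ("model", "os", "os_version", "ip")  # hypothetical field names
MAX_GAP = 30  # assumed preset duration in minutes; the text gives no value


def time_ok(click, effect, max_gap=MAX_GAP):
    """The click must precede the download, within the preset duration."""
    gap = effect["time"] - click["time"]
    return 0 < gap <= max_gap


def matched_fields(click, effect):
    """Same-type fields whose values are identical in both data groups."""
    return [f for f in FIELDS if click[f] == effect[f]]


def candidate_pairs(clicks, effects, require_all=False):
    """Compare every click group with every effect group; keep plausible pairs."""
    pairs = []
    for click, effect in product(clicks, effects):
        same = matched_fields(click, effect)
        enough = len(same) == len(FIELDS) if require_all else bool(same)
        if enough and time_ok(click, effect):
            pairs.append((click, effect, same))
    return pairs
```

With the click and effect records of the example above (15:20 and 15:23), the pair passes both the field check and the time check, whereas a download recorded before the click is rejected regardless of how many fields agree.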
Step 13: determine the matching accuracy of each to-be-matched data group pair.
Note that each to-be-matched data group pair obtained in step 12 contains at least two matched data items, so the matching accuracy of the pair can be determined from the matching accuracy of each pair of matched data items it contains. In one embodiment, step 13 may be implemented as follows: determine the data matching accuracy of each pair of matched data items in the to-be-matched data group pair; then determine the matching accuracy of the to-be-matched data group pair from those data matching accuracies.
The data items contained in the data groups of the first and second data sets are not unique to a single data group. Taking the click data set and the effect data set as an example, several click records in the click data set may have the device model huawei, and several effect records in the effect data set may also have the device model huawei. In one embodiment, the number of times a matched data item of a to-be-matched data group pair occurs in the first data set and in the second data set is used to determine all the ways in which that data item could be matched between the two data sets, and the data matching accuracy of each pair of matched data items is determined from that number of possible matchings.
In one embodiment, all the possible matchings of a matched data item between the first data set and the second data set are counted with the permutation formula, and the data matching accuracy p of each pair of matched data items in the to-be-matched data group pair is then determined by, for example, formula [1]:
$$p = \frac{1}{A_m^n} \qquad [1]$$
Here m is the number of times the matched data item of the to-be-matched data group pair occurs in the first data set, and n is the number of times it occurs in the second data set.
Suppose the click records in the click data set contain the model huawei 10 times in total and the effect records in the effect data set contain the model huawei 2 times in total. If click data and effect data are matched on the model alone, then by formula [1] the huawei click records and the huawei effect records can be matched in $A_{10}^{2} = 90$ ways in total, so a to-be-matched data pair containing the model huawei can be taken to be a true match with an accuracy of $1/90$.
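The per-item accuracy can be checked numerically with a short sketch. It assumes formula [1] reads p = 1 / A(m, n), the reciprocal of the number of ordered arrangements of n items drawn from m. The function name and the decision to order the two counts so that the larger one comes first (which keeps the permutation count non-zero when m < n) are assumptions of this example, not something the text specifies.

```python
from fractions import Fraction
from math import perm


def data_match_accuracy(m, n):
    """Accuracy of one matched data item, assuming p = 1 / A(m, n).

    m: occurrences of the value in the first data set
    n: occurrences of the value in the second data set
    The larger count is used as the pool size (an assumption for m < n).
    """
    larger, smaller = max(m, n), min(m, n)
    return Fraction(1, perm(larger, smaller))


# The huawei example: 10 occurrences among clicks, 2 among effects.
p_model = data_match_accuracy(10, 2)
```

A value that occurs exactly once in each data set gets an accuracy of 1, which fits the intuition that a unique value identifies its counterpart unambiguously.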
Likewise, for the click time and the download time in the data sets, the data matching accuracy of the two times can be determined on the basis that the click time precedes the download time and that the interval between them lies within the preset duration. In one embodiment, this data matching accuracy is calculated by a formula in t_effect and t_click, where t_effect denotes the application download time and t_click denotes the click time.
In one embodiment, after the data matching accuracy of each pair of matched data items in the to-be-matched data group pair has been determined by the above steps, these data matching accuracies are combined in a weighted sum, and the result of the weighted sum is taken as the matching accuracy of the to-be-matched data group pair.
For example, suppose the first data set is the click data set, the second data set is the effect data set, and the matched data items of a to-be-matched data group pair are the model (denoted model below), the operating system (os), the operating system version (osversion) and the IP address (ip). If the data matching accuracies of these matched data items are p(model), p(os), p(osversion) and p(ip) respectively, the matching accuracy of the to-be-matched data group pair is p = p(model) + p(os) + p(osversion) + p(ip).
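The weighted sum can be written in a few lines. The sketch below is illustrative only: the function name and the `weights` parameter are hypothetical, and defaulting every weight to 1 simply reproduces the unweighted sum given in the example above.

```python
def pair_accuracy(field_accuracies, weights=None):
    """Weighted sum of per-item accuracies for one to-be-matched pair.

    field_accuracies: mapping such as {"model": p_model, "os": p_os, ...}
    weights: optional mapping with the same keys; defaults to 1 per item,
             which gives p = p(model) + p(os) + p(osversion) + p(ip).
    """
    if weights is None:
        weights = {name: 1.0 for name in field_accuracies}
    return sum(weights[name] * p for name, p in field_accuracies.items())
```

Choosing weights that sum to 1 would keep the combined accuracy within the range of the individual accuracies; whether the method intends that is not stated in the text.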
Step 13 yields the matching accuracy of every to-be-matched data group pair obtained in step 12. At least one matched data group pair is then determined from the to-be-matched data group pairs according to these accuracies, as described in step 14 below.
Step 14: according to the matching accuracy of each to-be-matched data group pair, determine at least one matched data group pair from the to-be-matched data group pairs.
The to-be-matched data group pairs are sorted by the matching accuracy determined in step 13, from high to low, and the several pairs with the highest accuracy are selected as the matched data group pairs.
In one embodiment, to prevent the matching accuracy obtained in step 13 from exceeding 1, the accuracy can be post-processed; for example, the base-2 logarithm of the matching accuracy, log2 p, can be taken.
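The selection in step 14 amounts to a sort followed by a cut-off. In the sketch below the function names, the `(pair_id, accuracy)` tuple layout and the parameter `k` are assumptions for illustration. Because the base-2 logarithm is monotonically increasing, applying it to the accuracies changes their values but not their ranking.

```python
from math import log2


def top_pairs(scored, k=1):
    """Return the k pairs with the highest matching accuracy.

    scored: list of (pair_id, accuracy) tuples.
    """
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return ranked[:k]


def log_scaled(scored):
    """Replace each accuracy p (p > 0) by log2(p), as the text suggests."""
    return [(pair_id, log2(p)) for pair_id, p in scored]


scored = [("a", 0.25), ("b", 2.0), ("c", 0.5)]
best = top_pairs(scored, k=2)
best_after_log = top_pairs(log_scaled(scored), k=2)
```

Both calls select pairs "b" and "c" in that order; only the attached scores differ.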
Note that once a matched data group pair has been determined in step 14, the data it contains could distort the matching accuracy of the other to-be-matched data group pairs in the data sets. To avoid this, in one embodiment, after a matched data group pair is determined, the data groups it contains are deleted from the first data set and the second data set, and steps 11 to 14 are executed again on the reduced data sets. Specifically, the method provided by the embodiments may include: removing the data groups contained in the matched data group pair from the first data set and the second data set to obtain an updated first data set and an updated second data set; and determining at least one matched data group pair from the updated first data set and the updated second data set, until a preset condition is met. The preset condition includes: the first data set or the second data set contains no more than one data group.
The preset condition can be set flexibly as needed. For example, processing may stop once a specified number of matched data group pairs has been determined; it is not necessary to repeat the above steps until the first data set or the second data set contains no more than one data group.
Taking the matching of advertisement click data against effect data as an example, a data matching method provided by an embodiment of the present application is described in detail below. A schematic flowchart of the method is shown in Fig. 2; the method mainly comprises the following steps:
Step 21: collect click data and effect data;
Step 22: save the collected click data and effect data in a click data set and an effect data set respectively;
Step 23: count the number of occurrences of each data item in the click data set and in the effect data set;
Step 24: match the data contained in the click data set against the data contained in the effect data set to obtain to-be-matched data pairs;
Step 25: calculate the matching accuracy of each to-be-matched data pair;
Step 26: sort the calculated matching accuracies of the to-be-matched data pairs from high to low and judge whether any matching accuracy equals 1; if so, go to step 27; if not, go to step 28;
Step 27: determine the to-be-matched data pairs whose matching accuracy equals 1 to be matched data pairs, remove the data contained in those matched data pairs from the click data set and the effect data set, and return to step 23;
Step 28: select the several to-be-matched data pairs with the highest matching accuracy as matched data pairs.
The data matching method being provided using the embodiment of the present application, due to every in the first data acquisition system and the second data acquisition system Individual data set all comprises at least two data, by the data set comprising in described first data acquisition system and described second data The data set comprising in set is mated, and obtains matched data group pair undetermined, and determine each matched data group pair undetermined Registering exactness, according to the described matching accuracy determining, determines at least one coupling number from each matched data group centering undetermined Right according to organizing, therefore with respect to only passing through in prior art to judge in two data sets, whether some data is identical, thus judging The mode whether two data sets mate is compared, the scheme being provided due to the application when determining the data set of coupling, need The coupling determining is according to more, also more reasonable, thus when determining the data set of coupling, the accuracy of matching result is higher.
The embodiments of the present application further provide a data matching device, to solve the prior-art problem that matching data according to IP address alone yields matching results of low accuracy. A schematic structural diagram of the device is shown in Fig. 2. The device includes: a data collection acquiring unit 31, a data matching unit 32, an accuracy determining unit 33, and a matched data determining unit 34.
The data collection acquiring unit 31 is configured to acquire a first data collection and a second data collection, where the first data collection and the second data collection each contain at least one data group, and each data group contains at least two data;
the data matching unit 32 is configured to match the data groups contained in the first data collection against the data groups contained in the second data collection to obtain pending matched data group pairs;
the accuracy determining unit 33 is configured to determine the matching accuracy of each pending matched data group pair;
the matched data determining unit 34 is configured to determine at least one matched data group pair from the pending matched data group pairs according to the matching accuracy of each pending matched data group pair.
In one embodiment, the device further includes a data collection updating unit 35, configured to remove the data groups contained in the matched data group pairs from the first data collection and the second data collection, to obtain an updated first data collection and an updated second data collection;
the data matching unit 32 is further configured to determine at least one matched data group pair from the updated first data collection and the updated second data collection, until a preset condition is met, where the preset condition includes: the number of data groups contained in the first data collection or the second data collection is not greater than one.
In one embodiment, the data matching unit 32 is specifically configured to match each data group contained in the first data collection against each data group contained in the second data collection, to obtain pending matched data group pairs.
In one embodiment, the data matching unit 32 is specifically configured to: for any data group contained in the first data collection and any data group contained in the second data collection, match the data of the same type in the two data groups; and, when at least one pair of data in the two data groups matches successfully, obtain a pending matched data group pair composed of the two data groups.
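The pairing rule of this embodiment can be sketched as follows. The sketch assumes that each data group is represented as a dictionary keyed by data type (for example `"ip"`, `"os"`) and that two data match when their values are equal; the representation and the function name `pending_pairs` are illustrative assumptions, not part of the original text.

```python
from itertools import product


def pending_pairs(first, second):
    """Return (i, j, matched_types) for every cross pair of data groups
    that shares at least one equal datum of the same type."""
    pairs = []
    for (i, g1), (j, g2) in product(enumerate(first), enumerate(second)):
        # Compare only data of the same type (same dictionary key).
        matched_types = [t for t in g1 if t in g2 and g1[t] == g2[t]]
        if matched_types:
            pairs.append((i, j, matched_types))
    return pairs
```

Keeping the list of matched types alongside each pair is a design choice made here so that a later step can score each matched datum separately.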
In one embodiment, the accuracy determining unit 33 includes: a first determining subunit 331, configured to determine the data matching accuracy of each pair of matched data in a pending matched data group pair; and a second determining subunit 332, configured to determine the matching accuracy of the pending matched data group pair according to the data matching accuracies of the pairs of matched data in the pending matched data group pair.
In one embodiment, the first determining subunit 331 is specifically configured to: determine, according to the number of times the matched data in the pending matched data group pair occur in the first data collection and in the second data collection respectively, all possible matching cases of the matched data between the first data collection and the second data collection; and determine, according to the determined matching cases, the data matching accuracy between each pair of matched data in the pending matched data group pair;
and/or, the second determining subunit 332 is specifically configured to compute a weighted sum of the data matching accuracies between the pairs of matched data in the pending matched data group pair, and to take the weighted sum as the matching accuracy of the pending matched data group pair.
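The two subunits can be illustrated with a short sketch. It assumes the per-datum accuracy of claim 7, p = 1 / A(n, m), where A(n, m) is the number of permutations of m items chosen from n; the larger count is used as n here so that the permutation count is never zero, which is an assumption of this sketch. The per-type weights, the dictionary representation of a data group, and the function names are hypothetical.

```python
from math import perm


def data_accuracy(n: int, m: int) -> float:
    """Per-datum accuracy 1 / A(n, m), using the larger count as n (assumption)."""
    hi, lo = max(n, m), min(n, m)
    return 1.0 / perm(hi, lo)


def group_pair_accuracy(g1, g2, first, second, weights):
    """Weighted sum of per-datum accuracies over the data types that match.

    g1, g2  : data groups (dicts keyed by data type) from first / second
    weights : hypothetical weight per data type, e.g. {"ip": 0.5, "os": 0.5}
    """
    total = 0.0
    for dtype, weight in weights.items():
        if dtype in g1 and dtype in g2 and g1[dtype] == g2[dtype]:
            # Occurrence counts of the matched value in each collection.
            n = sum(1 for g in first if g.get(dtype) == g1[dtype])
            m = sum(1 for g in second if g.get(dtype) == g2[dtype])
            total += weight * data_accuracy(n, m)
    return total
```

With equal weights of 0.5, a pair that shares an IP address seen twice in the first collection and once in the second (accuracy 1/2) and an operating system value seen once in each (accuracy 1) scores 0.5 × 0.5 + 0.5 × 1 = 0.75.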
In one embodiment, the first data collection and the second data collection correspond to different terminals, respectively; the data contained in a data group include at least two of the following: hardware information of the terminal; the network address of the terminal; operating system information of the terminal.
With the data matching device provided by the embodiments of the present application, each data group in the first data collection and the second data collection contains at least two data. The data groups contained in the first data collection are matched against the data groups contained in the second data collection to obtain pending matched data group pairs, the matching accuracy of each pending matched data group pair is determined, and at least one matched data group pair is determined from the pending matched data group pairs according to the determined matching accuracy. Compared with the prior-art approach, which decides whether two data groups match only by checking whether a single datum in the two groups is identical, the scheme provided by the present application relies on more, and more reasonable, matching evidence when determining matched data groups, so the accuracy of the matching result is higher.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that includes the element.
Those skilled in the art will understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The foregoing describes only embodiments of the present application and is not intended to limit the present application. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Claims (14)

1. A data matching method, characterized by comprising:
acquiring a first data collection and a second data collection, wherein the first data collection and the second data collection each contain at least one data group, and each data group contains at least two data;
matching the data groups contained in the first data collection against the data groups contained in the second data collection to obtain pending matched data group pairs;
determining the matching accuracy of each pending matched data group pair;
determining at least one matched data group pair from the pending matched data group pairs according to the matching accuracy of each pending matched data group pair.
2. The method of claim 1, characterized in that, after determining at least one matched data group pair from the pending matched data group pairs according to the matching accuracy of each pending matched data group pair, the method further comprises:
removing the data groups contained in the matched data group pair from the first data collection and the second data collection, to obtain an updated first data collection and an updated second data collection;
determining at least one matched data group pair from the updated first data collection and the updated second data collection, until a preset condition is met, the preset condition including: the number of data groups contained in the first data collection or the second data collection is not greater than one.
3. The method of claim 1, characterized in that matching the data groups contained in the first data collection against the data groups contained in the second data collection to obtain pending matched data group pairs specifically comprises:
matching each data group contained in the first data collection against each data group contained in the second data collection, to obtain pending matched data group pairs.
4. The method of claim 1, characterized in that matching the data groups contained in the first data collection against the data groups contained in the second data collection to obtain pending matched data group pairs specifically comprises:
for any data group contained in the first data collection and any data group contained in the second data collection, matching the data of the same type in the two data groups;
when at least one pair of data in the two data groups matches successfully, obtaining a pending matched data group pair composed of the two data groups.
5. The method of claim 1, characterized in that determining the matching accuracy of the pending matched data group pair specifically comprises:
determining the data matching accuracy of each pair of matched data in the pending matched data group pair;
determining the matching accuracy of the pending matched data group pair according to the data matching accuracies of the pairs of matched data in the pending matched data group pair.
6. The method of claim 5, characterized in that determining the data matching accuracy of each pair of matched data in the pending matched data group pair specifically comprises:
determining, according to the number of times the matched data in the pending matched data group pair occur in the first data collection and in the second data collection respectively, all possible matching cases of the matched data between the first data collection and the second data collection;
determining, according to the determined matching cases, the data matching accuracy between each pair of matched data in the pending matched data group pair;
and/or, determining the matching accuracy of the pending matched data group pair according to the data matching accuracies of the pairs of matched data in the pending matched data group pair specifically comprises:
computing a weighted sum of the data matching accuracies between the pairs of matched data in the pending matched data group pair, and taking the weighted sum as the matching accuracy of the pending matched data group pair.
7. The method of claim 6, characterized in that the data matching accuracy p between each pair of matched data in the pending matched data group pair is determined according to the following formula:
$$p = \frac{1}{A_n^m}$$
where n denotes the number of times the matched datum occurs in the first data collection, m denotes the number of times the matched datum occurs in the second data collection, and $A_n^m$ denotes the number of permutations of m items selected from n data.
8. The method of any one of claims 1 to 7, characterized in that the first data collection and the second data collection correspond to different terminals, respectively;
the data contained in the data group include at least two of the following:
hardware information of the terminal;
the network address of the terminal;
operating system information of the terminal.
9. A data matching device, characterized by comprising:
a data collection acquiring unit, configured to acquire a first data collection and a second data collection, wherein the first data collection and the second data collection each contain at least one data group, and each data group contains at least two data;
a data matching unit, configured to match the data groups contained in the first data collection against the data groups contained in the second data collection to obtain pending matched data group pairs;
an accuracy determining unit, configured to determine the matching accuracy of each pending matched data group pair;
a matched data determining unit, configured to determine at least one matched data group pair from the pending matched data group pairs according to the matching accuracy of each pending matched data group pair.
10. The device of claim 9, characterized in that the device further comprises a data collection updating unit, configured to remove the data groups contained in the matched data group pair from the first data collection and the second data collection, to obtain an updated first data collection and an updated second data collection;
the data matching unit is further configured to determine at least one matched data group pair from the updated first data collection and the updated second data collection, until a preset condition is met, the preset condition including: the number of data groups contained in the first data collection or the second data collection is not greater than one.
11. The device of claim 9, characterized in that the data matching unit is specifically configured to match each data group contained in the first data collection against each data group contained in the second data collection, to obtain pending matched data group pairs.
12. The device of claim 9, characterized in that the data matching unit is specifically configured to: for any data group contained in the first data collection and any data group contained in the second data collection, match the data of the same type in the two data groups;
and, when at least one pair of data in the two data groups matches successfully, obtain a pending matched data group pair composed of the two data groups.
13. The device of claim 9, characterized in that the accuracy determining unit comprises:
a first determining subunit, configured to determine the data matching accuracy of each pair of matched data in the pending matched data group pair;
a second determining subunit, configured to determine the matching accuracy of the pending matched data group pair according to the data matching accuracies of the pairs of matched data in the pending matched data group pair.
14. The device of claim 13, characterized in that the first determining subunit is specifically configured to: determine, according to the number of times the matched data in the pending matched data group pair occur in the first data collection and in the second data collection respectively, all possible matching cases of the matched data between the first data collection and the second data collection; and determine, according to the determined matching cases, the data matching accuracy between each pair of matched data in the pending matched data group pair;
and/or, the second determining subunit is specifically configured to compute a weighted sum of the data matching accuracies between the pairs of matched data in the pending matched data group pair, and to take the weighted sum as the matching accuracy of the pending matched data group pair.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610797496.9A CN106372668A (en) 2016-08-31 2016-08-31 Data matching method and device

Publications (1)

Publication Number Publication Date
CN106372668A true CN106372668A (en) 2017-02-01

Family

ID=57898854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610797496.9A Pending CN106372668A (en) 2016-08-31 2016-08-31 Data matching method and device

Country Status (1)

Country Link
CN (1) CN106372668A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107193884A (en) * 2017-04-27 2017-09-22 北京小米移动软件有限公司 A kind of method and apparatus of matched data

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103646110A (en) * 2013-12-26 2014-03-19 中国人民银行征信中心 Natural person basic identity information matching method
CN103678327A (en) * 2012-09-04 2014-03-26 中国移动通信集团四川有限公司 Method and device for information association
CN103810527A (en) * 2008-10-23 2014-05-21 起元技术有限责任公司 Method and system for operating data operations, mesuring data quality and joining data elements
CN104239301A (en) * 2013-06-06 2014-12-24 阿里巴巴集团控股有限公司 Data comparing method and device
CN104298736A (en) * 2014-09-30 2015-01-21 华为软件技术有限公司 Method and device for aggregating and connecting data as well as database system
CN104504021A (en) * 2014-12-11 2015-04-08 北京国双科技有限公司 Data matching method and device
CN105224649A (en) * 2015-09-29 2016-01-06 北京奇艺世纪科技有限公司 A kind of data processing method and device
CN105630867A (en) * 2015-12-01 2016-06-01 广东小天才科技有限公司 Method and device for matching data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
甄灵敏等: "基于属性权重的实体解析技术", 《计算机研究与发展》 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170201
