CN105205107A - Internet of Things data similarity processing method - Google Patents
Internet of Things data similarity processing method
- Publication number
- CN105205107A (application number CN201510535354.0A)
- Authority
- CN
- China
- Prior art keywords
- attribute
- product
- array
- product record
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides an Internet of Things data similarity processing method comprising the following steps: obtaining multiple product records and selecting a first product record and a second product record that share multiple identical attributes; storing the attributes of the first product record in a first array and the attributes of the second product record in a second array; computing, for each attribute of the first and second product records, an attribute similarity value with the corresponding attribute function; computing a weight value for each attribute with a weight function, according to the importance of that attribute; and combining the third array of attribute similarity values with the fourth array of weight values to compute the overall similarity of the first and second product records with an overall similarity function. The application computes the overall similarity of two product records that share the same attributes from their per-attribute similarities and attribute weights; processing is fast and saves considerable time.
Description
Technical field
The invention relates to the field of data processing, and in particular to an Internet of Things data similarity processing method.
Background
Since the Internet emerged, the number of web pages has grown rapidly, and this growth has produced the world's largest information repository. Web information integration technology processes this repository effectively, integrating related information and supplying data for data mining so that it can better serve information services in specialized fields. In today's fast-developing network era, information resources are increasingly abundant; web information integration has become an important part of the information age and is applied in many fields.
In the Internet of Things field, for example, product suppliers can publish product information on multiple web trading platforms, and buyers can obtain that information from the platforms and use it to contact the supplier and make a purchase. This process involves handling a large amount of data. Because each web trading platform presents information differently, integrating the information is difficult. In addition, the same supplier may describe the same product differently on different platforms, so data crawled from these platforms contains many duplicates. Cleaning duplicate records out of product data that comes from different web sources and is expressed in different forms is therefore necessary, and it underpins machine-based duplicate detection.
In cleaning product data, the main task is to remove similar and duplicate records among a product's multiple records, so that a comprehensive, accurate, professional product database that meets data quality requirements can be built. This requires computing the similarity between records. At present, data similarity is computed mainly by comparing records one by one, which is very slow and consumes a great deal of time.
Summary of the invention
In view of the defects and shortcomings of the prior art described above, the technical problem to be solved by the present invention is to provide an Internet of Things data similarity processing method that saves a large amount of time.
To achieve this object, the present invention provides an Internet of Things data similarity processing method comprising the following steps:
S1. Obtain multiple product records from web trading platforms and select two product records that share multiple identical attributes, namely a first product record and a second product record;
S2. Store the attributes of the first product record in a first array and the attributes of the second product record in a second array;
S3. For each attribute of the first and second product records, compute an attribute similarity value with the corresponding attribute function, and store the attribute similarity values in a third array;
S4. Compute a weight value for each attribute with a weight function, according to the importance of that attribute in the first and second product records, and store the weight values in a fourth array;
S5. Combine the third array of attribute similarity values with the fourth array of weight values and compute the overall similarity of the first and second product records with an overall similarity function.
Further, in step S3, the attribute functions include a product alias matching strategy function, a product price conversion matching strategy function, a normalized date matching strategy function, a normalized origin matching strategy function and an edit distance algorithm function.
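The edit distance algorithm function is named here but not described further. As an illustration only, it could be sketched in C++ as a standard Levenshtein distance. The function names `edit_distance` and `edit_similarity`, the use of `std::u32string` (so that one Chinese character counts as one unit) and the normalisation to a value between 0 and 1 are all assumptions of this sketch, not details given in the text.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance between two strings of Unicode code points.
// Uses two rolling rows instead of the full (m+1) x (n+1) table.
std::size_t edit_distance(const std::u32string& s, const std::u32string& t) {
    std::vector<std::size_t> prev(t.size() + 1), curr(t.size() + 1);
    for (std::size_t j = 0; j <= t.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= s.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= t.size(); ++j) {
            std::size_t subst = prev[j - 1] + (s[i - 1] == t[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, subst});
        }
        std::swap(prev, curr);
    }
    return prev[t.size()];
}

// Hypothetical normalisation (an assumption, not from the text):
// similarity = 1 - distance / max(length), so identical strings score 1.
double edit_similarity(const std::u32string& s, const std::u32string& t) {
    std::size_t longest = std::max(s.size(), t.size());
    if (longest == 0) return 1.0;
    return 1.0 - static_cast<double>(edit_distance(s, t)) /
                     static_cast<double>(longest);
}
```

A function of this kind would suit free-text attributes for which no dedicated matching strategy exists.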
Preferably, in step S2, the attributes of the first product record are placed, in the order product name, price, production date and place of origin, into multiple first attribute arrays, which together constitute the first array.
Preferably, in step S2, the attributes of the second product record are placed, in the order product name, price, production date and place of origin, into multiple second attribute arrays, which together constitute the second array.
The Internet of Things data similarity processing method of the present invention has the following beneficial effects:
The application computes the overall similarity of two product records that share the same attributes from their per-attribute similarities and attribute weights. Processing is fast and calculation accuracy is high, which saves a large amount of time.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the specification, the invention is described in detail below with reference to preferred embodiments and the accompanying drawings.
Description of the drawings
Fig. 1 is a flow chart of the method of this application.
Fig. 2 is a flow chart of the product alias matching strategy function in this application.
Fig. 3 is a flow chart of the product price conversion matching strategy function in this application.
Fig. 4 is a flow chart of the normalized date matching strategy function in this application.
Fig. 5 is a flow chart of the normalized origin matching strategy function in this application.
Detailed description
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the present invention provides a data similarity processing method comprising the following steps:
S1. Obtain multiple product records from web trading platforms and select two product records that share multiple identical attributes, namely a first product record A and a second product record B.
S2. Store the attributes of the first product record A in a first array a[] and the attributes of the second product record B in a second array b[].
The first product record A and the second product record B each have n attributes, so the first array a[] consists of n first attribute arrays a[0], a[1], a[2], a[3] and a[4] through a[n-1], and the second array b[] consists of n second attribute arrays b[0], b[1], b[2], b[3] and b[4] through b[n-1]. The product name, price, production date and place of origin of the first product record A are stored, in that order, in the first attribute arrays a[0], a[1], a[2] and a[3], while a[4] through a[n-1] hold the other, secondary attributes of record A. Likewise, the product name, price, production date and place of origin of the second product record B are stored, in that order, in the second attribute arrays b[0], b[1], b[2] and b[3], while b[4] through b[n-1] hold the other, secondary attributes of record B.
S3. For each attribute of the first product record A and the second product record B, compute an attribute similarity value with the corresponding attribute function, and store the attribute similarity values in a third array c[], which is an array of type double.
In step S3, the attribute functions include the product alias matching strategy function Strategy_Name(), the product price conversion matching strategy function Strategy_Price(), the normalized date matching strategy function Strategy_Date(), the normalized origin matching strategy function Strategy_Origin() and the edit distance algorithm function Edit_Distance().
S4. Compute a weight value for each attribute with the weight function Weight(), according to the importance of that attribute in the first product record A and the second product record B, and store the weight values in a fourth array w[], which is an array of type double.
S5. Combine the third array c[] of attribute similarity values with the fourth array w[] of weight values and compute the overall similarity Sim(A, B) of the first product record A and the second product record B with the overall similarity function Sim().
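Step S5 can be illustrated with a short C++ sketch. The text does not give the form of Sim() or of Weight(), so the normalised weighted sum below is an assumption, and the name `overall_similarity` is hypothetical; only the roles of c[] and w[] come from the steps above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Weighted mean of the per-attribute similarities.
// c[i]: similarity of attribute i (third array);
// w[i]: weight of attribute i (fourth array).
// Assumed form of Sim(): sum(w[i] * c[i]) / sum(w[i]).
double overall_similarity(const std::vector<double>& c,
                          const std::vector<double>& w) {
    if (c.size() != w.size()) {
        throw std::invalid_argument("c and w must have the same length");
    }
    double weighted = 0.0, total = 0.0;
    for (std::size_t i = 0; i < c.size(); ++i) {
        weighted += w[i] * c[i];
        total += w[i];
    }
    // With no positive total weight there is nothing to compare: report 0.
    return total > 0.0 ? weighted / total : 0.0;
}
```

Under this assumed form, two records that agree on name and production date (similarity 1) but differ on price (similarity 0), with weights 2, 1 and 1 for name, price and date, would score 3/4 = 0.75.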
The application computes the overall similarity of two product records that share the same attributes from their per-attribute similarities and attribute weights. Processing is fast and calculation accuracy is high, which saves a large amount of time. The invention therefore effectively overcomes the shortcomings of the prior art and has high industrial value.
Further, as shown in Fig. 2, the product alias matching strategy function Strategy_Name() includes the following steps:
N1. Select a group of data from the document and place it in a set S;
N2. Starting from the first element of the set, store every element in a map container from the C++ STL so that it maps to the first element;
N3. For the agricultural product name attribute of records A and B, look up the corresponding mapped value in the map container and replace each name with it;
N4. Compare the replaced agricultural product names: if they are exactly equal, the similarity is Sim(Ak, Bk) = 1; otherwise Sim(Ak, Bk) = 0.
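Steps N1 to N4 can be sketched in C++ roughly as follows. The names `build_alias_map` and `strategy_name` are hypothetical, as is the decision to leave a name unchanged when it has no entry in the map; the text does not say what happens in that case.

```cpp
#include <map>
#include <string>
#include <vector>

// N1, N2: every name in a group maps to the first name of that group.
std::map<std::string, std::string> build_alias_map(
        const std::vector<std::vector<std::string>>& groups) {
    std::map<std::string, std::string> canonical;
    for (const auto& group : groups) {
        if (group.empty()) continue;
        for (const auto& name : group) canonical[name] = group.front();
    }
    return canonical;
}

// N3, N4: replace each name by its mapped value, then compare exactly.
// A name with no entry in the map is compared as is (sketch assumption).
double strategy_name(const std::map<std::string, std::string>& canonical,
                     const std::string& a, const std::string& b) {
    auto resolve = [&](const std::string& name) -> const std::string& {
        auto it = canonical.find(name);
        return it == canonical.end() ? name : it->second;
    };
    return resolve(a) == resolve(b) ? 1.0 : 0.0;
}
```

For instance, with an alias group such as {"番茄", "西红柿"}, the two names resolve to the same value and the function returns 1.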
Preferably, as shown in Fig. 3, the product price conversion matching strategy function Strategy_Price() includes the following steps:
P1. First define a map object: map<string,double> price;
P2. Execute the following statements to associate each unit with its conversion value:
price["元/公斤"]=1;    // yuan per kilogram (公斤)
price["元/斤"]=2;      // yuan per jin (500 g)
price["元/千克"]=1;    // yuan per kilogram (千克)
price["元/1000克"]=1;  // yuan per 1000 g
price["元/500克"]=2;   // yuan per 500 g
price["元/100克"]=10;  // yuan per 100 g
price["元/克"]=1000;   // yuan per gram
price["元/吨"]=0.001;  // yuan per tonne
That is, to convert "x 元/公斤" to the unit 元/千克 (yuan per kilogram), multiply x by 1; to convert "x 元/斤" to 元/千克, multiply x by 2; and so on for the other units;
P3. For the price attribute value Ak of record A, first separate the number from the unit. Scan the string (denoted p) from its first character until the first character p[i] that is neither between '0' and '9' nor '.'; the part from p[0] to p[i-1] is the numeric value of the price and is saved in string a, and the remainder is the unit and is saved in string b;
P4. Use the atof() function to convert string a to a double value and store it in the double variable c1;
P5. Execute c1*=price[b] so that c1 is multiplied by the conversion value of unit b and the result is stored back in c1; c1 is now the converted value of the input price;
P6. Apply the same procedure to record B to obtain the converted value c2 of the price attribute value Bk;
P7. Test whether fabs(c1-c2)<=0.000001 is true to decide whether the two input prices are the same; the absolute value makes the test independent of which price is larger. If it is true, Sim(Ak,Bk)=1; otherwise Sim(Ak,Bk)=0.
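Steps P1 to P7 can be put together in a short C++ sketch. The function names `unit_factors`, `to_yuan_per_kg` and `strategy_price` are hypothetical. Returning a similarity of 0 when a price has no number or an unknown unit is an assumption of this sketch, as is comparing the absolute difference of the two converted values; the source file is assumed to be UTF-8 encoded so that the unit keys match the input text byte for byte.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>

// P1, P2: conversion factors to yuan per kilogram, keyed by the unit text.
const std::map<std::string, double>& unit_factors() {
    static const std::map<std::string, double> price = {
        {"元/公斤", 1},  {"元/斤", 2},     {"元/千克", 1},  {"元/1000克", 1},
        {"元/500克", 2}, {"元/100克", 10}, {"元/克", 1000}, {"元/吨", 0.001},
    };
    return price;
}

// P3 to P5: split "3.5元/斤" into number and unit, convert to yuan/kg.
// Returns false if there is no number or the unit is not in the table.
bool to_yuan_per_kg(const std::string& p, double& out) {
    std::size_t i = 0;
    while (i < p.size() && ((p[i] >= '0' && p[i] <= '9') || p[i] == '.')) ++i;
    const std::string a = p.substr(0, i);  // numeric part, p[0] to p[i-1]
    const std::string b = p.substr(i);     // unit part
    auto it = unit_factors().find(b);
    if (a.empty() || it == unit_factors().end()) return false;
    out = std::atof(a.c_str()) * it->second;
    return true;
}

// P6, P7: prices match when the converted values differ by at most 1e-6.
double strategy_price(const std::string& pa, const std::string& pb) {
    double c1 = 0.0, c2 = 0.0;
    if (!to_yuan_per_kg(pa, c1) || !to_yuan_per_kg(pb, c2)) return 0.0;
    return std::fabs(c1 - c2) <= 0.000001 ? 1.0 : 0.0;
}
```

With this table, "2元/斤" and "4元/公斤" both convert to 4 yuan per kilogram and are treated as the same price.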
Further, as shown in Fig. 4, the normalized date matching strategy function Strategy_Date() includes the following steps:
D1. Starting from the first character of the first input date string r1, search forward; the first character r1[i] that is not between '0' and '9' is the first separator, and it is converted to '/', i.e. r1[i]='/'. The characters from the first character to the (i-1)-th character are then the year;
D2. If r1[i+1] is not the character '0', go directly to step D3; if r1[i+1] is the character '0', move every character from position i+2 to the end of the string forward by one position, i.e. r1[i+1, i+2, ...]=r1[i+2, i+3, ...];
D3. Save the value i+1 in j and, starting from the (i+1)-th character, search forward until a character r1[i] that is not between '0' and '9' is found; it is the second separator, and it is converted to '/', i.e. r1[i]='/'. The characters from the j-th character to the (i-1)-th character are then the month, with its leading 0 already removed;
D4. Repeat step D2 to remove the leading 0 from the day. At this point the first input date string r1 has had its separators converted and its leading zeros removed;
D5. Process the second input date string r2 in the same way; once both strings have been processed, apply the formula [formula missing in source]
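The normalisation in steps D1 to D4 can be sketched in C++ as follows. The function name `normalize_date` is hypothetical. The sketch assumes single-byte separators such as '-', '.' or '/' (a multi-byte separator such as 年 would need different handling), and it only drops a leading '0' when another digit follows, which is slightly stricter than step D2. The comparison formula of step D5 is not available, so the sketch stops at normalisation.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Normalise "2015-08-07" or "2015.8.7" to "2015/8/7": each of the first
// two non-digit separators becomes '/', and a single leading '0' is
// dropped from the month and from the day.
std::string normalize_date(std::string r) {
    std::size_t i = 0;
    for (int separators = 0; separators < 2; ++separators) {
        // D1 / D3: the next non-digit character is the separator.
        while (i < r.size() && std::isdigit(static_cast<unsigned char>(r[i]))) {
            ++i;
        }
        if (i >= r.size()) return r;  // fewer than two separators found
        r[i] = '/';
        // D2 / D4: drop a leading zero of the field after the separator.
        if (i + 2 < r.size() && r[i + 1] == '0' &&
            std::isdigit(static_cast<unsigned char>(r[i + 2]))) {
            r.erase(i + 1, 1);
        }
        ++i;  // continue searching after the separator
    }
    return r;
}
```

After both r1 and r2 have been normalised this way, dates written as "2015-08-07" and "2015.8.7" have the same textual form.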
Further, as shown in Fig. 5, the normalized origin matching strategy function Strategy_Origin() includes the following steps:
O1. Create the sets Sprov, Scity and Scoun, which hold all province-level, city-level and county-level administrative divisions respectively;
O2. Perform Chinese word segmentation on the origin attribute value of record A, and look up each resulting word in the sets Sprov, Scity and Scoun to determine which level of administrative division it belongs to, thereby distinguishing province, city and county. Store the province, city and county of record A in Aprov, Acity and Acoun respectively, and assign the null value NULL to any level that is missing. Process the origin attribute value of record B in the same way, so that the province, city and county of B are stored in Bprov, Bcity and Bcoun respectively;
O3. Fill in the missing administrative division levels. Using the hierarchy of administrative divisions, complete the missing levels from the bottom up; any level that cannot be completed is left unprocessed.
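Steps O2 and O3 can be illustrated with the C++ sketch below. It assumes the origin text has already been segmented into words, and it adds two lookup tables (county to city, city to province) that the text does not mention but that bottom-up completion would need in some form. The `Origin` and `Gazetteer` types and the `classify` and `complete` functions are hypothetical names, and an empty string stands in for NULL. County names that occur under more than one city are not handled.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

struct Origin {
    std::string prov, city, coun;  // empty string plays the role of NULL
};

// Hypothetical gazetteer: the three sets of step O1 plus parent links.
struct Gazetteer {
    std::set<std::string> Sprov, Scity, Scoun;
    std::map<std::string, std::string> city_of_county;    // county -> city
    std::map<std::string, std::string> province_of_city;  // city -> province
};

// O2: classify already-segmented words by administrative level.
Origin classify(const Gazetteer& g, const std::vector<std::string>& words) {
    Origin o;
    for (const auto& w : words) {
        if (g.Sprov.count(w)) o.prov = w;
        else if (g.Scity.count(w)) o.city = w;
        else if (g.Scoun.count(w)) o.coun = w;
    }
    return o;
}

// O3: fill missing levels bottom-up; levels that cannot be inferred
// are left empty.
void complete(const Gazetteer& g, Origin& o) {
    if (o.city.empty() && !o.coun.empty()) {
        auto it = g.city_of_county.find(o.coun);
        if (it != g.city_of_county.end()) o.city = it->second;
    }
    if (o.prov.empty() && !o.city.empty()) {
        auto it = g.province_of_city.find(o.city);
        if (it != g.province_of_city.end()) o.prov = it->second;
    }
}
```

Once both records have been classified and completed, their province, city and county fields can be compared level by level.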
The data similarity processing method provided by the embodiments of the present invention has been described in detail above. A person of ordinary skill in the art may, following the idea of the embodiments, vary the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the invention, and any change made according to the design concept of the invention falls within its scope of protection.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510535354.0A CN105205107A (en) | 2015-08-27 | 2015-08-27 | Internet of Things data similarity processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510535354.0A CN105205107A (en) | 2015-08-27 | 2015-08-27 | Internet of Things data similarity processing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105205107A (en) | 2015-12-30 |
Family
ID=54952791
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510535354.0A Pending CN105205107A (en) | 2015-08-27 | 2015-08-27 | Internet of Things data similarity processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105205107A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080027932A1 (en) * | 2006-07-27 | 2008-01-31 | International Business Machines Corporation | Apparatus of generating browsing paths for data and method for browsing data |
CN101286156A (en) * | 2007-05-29 | 2008-10-15 | 北大方正集团有限公司 | A Method for Deduplicating Objects Based on Metadata |
CN101814082A (en) * | 2010-01-20 | 2010-08-25 | 中国人民解放军总参谋部第六十三研究所 | Method for automatic feature weighting and selection in detection of similar and duplicate record based on ant colony optimization |
CN102456203A (en) * | 2010-10-22 | 2012-05-16 | 阿里巴巴集团控股有限公司 | Method for determining candidate product linked list and related device |
CN103455555A (en) * | 2013-08-06 | 2013-12-18 | 北京大学深圳研究生院 | Recommendation method and device based on mobile terminal similarity |
CN104035983A (en) * | 2014-05-29 | 2014-09-10 | 西安理工大学 | Classified variable clustering method based on attribute weight similarity |
CN104615600A (en) * | 2013-11-04 | 2015-05-13 | 深圳中兴力维技术有限公司 | Similar case comparison implementation method and device thereof |
- 2015-08-27: CN application CN201510535354.0A (publication CN105205107A), status: Pending
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107193860A (en) * | 2017-03-31 | 2017-09-22 | 苏州艾隆信息技术有限公司 | Medicine information multidimensional identification method and system |
CN107193860B (en) * | 2017-03-31 | 2021-03-02 | 苏州艾隆信息技术有限公司 | Medicine information multidimensional identification method and system |
CN109614614A (en) * | 2018-12-03 | 2019-04-12 | 焦点科技股份有限公司 | A Self-Attention-Based BILSTM-CRF Product Name Recognition Method |
CN109614614B (en) * | 2018-12-03 | 2021-04-02 | 焦点科技股份有限公司 | BILSTM-CRF product name identification method based on self-attention |
CN111898035A (en) * | 2020-06-19 | 2020-11-06 | 深圳奇迹智慧网络有限公司 | Data processing strategy configuration method and device based on Internet of things and computer equipment |
CN111898035B (en) * | 2020-06-19 | 2023-10-31 | 深圳奇迹智慧网络有限公司 | Data processing strategy configuration method and device based on Internet of things and computer equipment |
CN113946722A (en) * | 2021-10-22 | 2022-01-18 | 北京钢研新材科技有限公司 | Intelligent welding material matching method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103761080B (en) | Structured query language (SQL) based MapReduce operation generating method and system | |
CN103646032B (en) | A kind of based on body with the data base query method of limited natural language processing | |
CN102456050B (en) | Method and device for extracting data from webpage | |
CN110472068A (en) | Big data processing method, equipment and medium based on heterogeneous distributed knowledge mapping | |
CN106570148A (en) | Convolutional neutral network-based attribute extraction method | |
CN103440287B (en) | A kind of Web question and answer searching system based on product information structure | |
CN102426582B (en) | Data manipulation management devices and data manipulation management method | |
Morozov et al. | Distributed contour trees | |
CN104866593A (en) | Database searching method based on knowledge graph | |
CN101699444B (en) | Ontology Construction Method of Remote Sensing Information Processing Service Classification Based on Formal Concept Analysis | |
Liang et al. | Express supervision system based on NodeJS and MongoDB | |
CN103793422A (en) | Methods for generating cube metadata and query statements on basis of enhanced star schema | |
CN104462227A (en) | Automatic construction method of graphic knowledge genealogy | |
CN105630881A (en) | Data storage method and query method for RDF (Resource Description Framework) | |
CN105138526A (en) | Method and system used for automatically generating semantic mapping for relational databases | |
CN101706813B (en) | Map symbol library management system and method based on self-adaptation mechanism | |
CN106021523B (en) | Data warehouse storage and query method based on JSON | |
CN105205107A (en) | Internet of Things data similarity processing method | |
CN107491476B (en) | Data model conversion and query analysis method suitable for various big data management systems | |
CN102799627B (en) | Data association method based on first-order logic and nerve network | |
CN105824855A (en) | Method and device for screening and classifying data objects and electronic equipment | |
CN109635119B (en) | Industrial big data integration system based on ontology fusion | |
CN110389953B (en) | Data storage method, storage medium, storage device and server based on compressed graph | |
CN102508971A (en) | Method for establishing product function model in concept design stage | |
CN102129457A (en) | Method for inquiring large-scale semantic data paths |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20151230 |