CN115860769A - Hazardous waste tracing method based on matching degree and cross entropy - Google Patents


Publication number
CN115860769A
Authority
CN
China
Prior art keywords
waste
similarity
indicators
index
unknown
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310139981.7A
Other languages
Chinese (zh)
Other versions
CN115860769B (en)
Inventor
杨玉飞
杨金忠
李雪冰
迭庆杞
黄启飞
王菲
于天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese Research Academy of Environmental Sciences
Original Assignee
Chinese Research Academy of Environmental Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese Research Academy of Environmental Sciences
Priority to CN202310139981.7A
Publication of CN115860769A
Application granted
Publication of CN115860769B
Legal status: Active


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02W: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO WASTEWATER TREATMENT OR WASTE MANAGEMENT
    • Y02W90/00: Enabling technologies or technologies with a potential or indirect contribution to greenhouse gas [GHG] emissions mitigation

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of data information processing, and in particular relates to a hazardous waste tracing method based on matching degree and cross entropy, comprising the following steps: S1, constructing a hazardous waste fingerprint feature database containing feature information of hazardous wastes; S2, a user inputs search indicators, and the database matches the corresponding feature information according to the input search indicators; S3, after matching, similarity is calculated according to the number N of input search indicators: when N is 1, a single-indicator calculation model is used; when N ≥ 2, a multi-indicator calculation model is used; and S4, the tracing result is displayed according to the calculated similarity. The method gives quantitative matching results while reducing the amount of calculation, improving matching efficiency and increasing the accuracy of database matching; it supports rapid identification of wastes and rapid tracing of hazardous wastes, and thereby assists subsequent decision-making.

Description

A hazardous waste tracing method based on matching degree and cross entropy

Technical Field

The present invention belongs to the technical field of data information processing, and in particular relates to a hazardous waste tracing method based on matching degree and cross entropy.

Background Art

With the rapid development of society, the volume of waste keeps growing, especially hazardous waste generated in industrial production; more than 300 million tons of hazardous waste are generated worldwide each year. Hazardous waste usually has one or more hazardous characteristics such as corrosiveness, toxicity, flammability, reactivity or infectivity. Examples include waste organic solvents and distillation waste liquids from the chemical industry, mother liquor and waste salt from pesticide production, scum and oily sludge from the petroleum industry, and collected dust and smelting slag from non-ferrous metal smelting. Such waste not only endangers human health but also causes long-term environmental damage, so its proper disposal and scientific management are extremely important.

In the solid waste dumping incidents that have occurred frequently in recent years, especially those involving hazardous waste, rapid and accurate source tracing and characterization of the waste is a major difficulty in assigning liability and ensuring safe disposal. However, the lack of technology for rapid qualitative and precise identification that matches hazardous waste to its source and generation characteristics has greatly hindered the timely development of comprehensive remediation plans for these high-profile incidents. At the same time, when hazardous waste utilization and disposal units receive waste, their limited ability to identify the waste category accurately can easily lead to production safety accidents and to the risk of handling waste beyond their capacity. Establishing a database and developing tracing technologies and methods for unidentified solid waste is therefore of great significance for combating solid waste dumping, protecting the ecological environment, and controlling risks in solid waste utilization and disposal.

In addition, fine-grained management technology for hazardous waste is relatively weak. There is therefore an urgent need for a hazardous waste tracing method that uses the available information about an unidentified waste, together with a hazardous waste tracing system, to obtain the estimated type and estimated characteristics of the waste.

Summary of the Invention

In order to solve the technical problem that hazardous waste is difficult to trace in the prior art, the present invention provides a hazardous waste tracing method based on matching degree and cross entropy.

To achieve the above object, the technical solution of the present invention is as follows:

A hazardous waste tracing method based on matching degree and cross entropy, comprising:

S1. Constructing a hazardous waste fingerprint feature database containing feature information of hazardous wastes;

S2. A user inputs search indicators, and the database matches the corresponding feature information according to the input search indicators;

S3. After matching, calculating similarity according to the number N of input search indicators: when N is 1, a single-indicator calculation model is used; when N ≥ 2, a multi-indicator calculation model is used;

S4. Displaying the tracing result according to the calculated similarity.

Furthermore, the database comprises a number of rows and a number of columns, where each row represents one hazardous waste and each column represents one item of feature information.

Furthermore, the feature information includes industry classification, waste category, physical form, shape, magnetism, odor, color, apparent morphology, material composition, characteristic indicators, and numerical indicators.

Furthermore, the single-indicator calculation model is:

[Figure SMS_1: formula image not reproduced]

where [Figure SMS_2] denotes the similarity between the unknown waste and a known waste in the database, and t denotes the matching degree between the input search indicator and the matched feature information.

Furthermore, the matching degree t is calculated as:

[Figure SMS_3: formula image not reproduced]

where k is calculated as:

[Figure SMS_4: formula image not reproduced]

With the search indicator entered by the user denoted [Figure SMS_5] and the matched indicator denoted [Figure SMS_6], then [Figure SMS_7], [Figure SMS_8].

Furthermore, calculating the similarity with the multi-indicator calculation model comprises:

S301. Determining the type of each search indicator input by the user and classifying the indicators as text-type or numerical;

S302. Letting the number of text-type indicators be N1 and the number of numerical indicators be N2: when N1 is 1, the single-indicator calculation model is applied to the text-type indicator, and when N1 ≥ 2, the cross-entropy calculation model is applied to the text-type indicators, giving the similarity [Figure SMS_9] for the text-type indicators; when N2 is 1, the single-indicator calculation model is applied to the numerical indicator, and when N2 ≥ 2, the cross-entropy calculation model is applied to the numerical indicators, giving the similarity [Figure SMS_10] for the numerical indicators;

S303. Taking the larger of the similarities [Figure SMS_11] and [Figure SMS_12] as the similarity between the unknown waste and the matched known waste.

Furthermore, when the search indicators are multiple numerical indicators, the numerical indicators of the input unknown waste form a set Y = (y1, y2, y3, …, yn), the indicators of the matched known waste form a set X = (x1, x2, x3, …, xn), and the probability distributions of the two data sets are calculated as q(y) = (q1, q2, q3, …, qn) and p(x) = (p1, p2, p3, …, pn), respectively.

The probability calculated from the cross entropy of the unknown waste and the known waste is [Figure SMS_13]:

[Figure SMS_14: formula image not reproduced]

[Figure SMS_15: formula image not reproduced]

where i = 1, 2, …, n.

The probability calculated from the distribution entropy of the known waste indicators is [Figure SMS_16]:

[Figure SMS_17: formula image not reproduced]

[Figure SMS_18: formula image not reproduced]

The similarity [Figure SMS_19] between the unknown waste and the known waste is then:

[Figure SMS_20: formula image not reproduced]

Furthermore, when the search indicators are multiple text-type indicators, numeric values are assigned to the text-type indicators.

The assigned values of the text-type indicators of the unknown waste form a set B = (b1, b2, b3, …, bn), the indicators of the matched known waste are converted by the same assignment into a set A = (a1, a2, a3, …, an), and the probability distributions of the two data sets are calculated as r(b) = (r1, r2, r3, …, rn) and s(a) = (s1, s2, s3, …, sn), respectively.

The probability calculated from the cross entropy of the unknown waste and the known waste is [Figure SMS_21]:

[Figure SMS_22: formula image not reproduced]

[Figure SMS_23: formula image not reproduced]

where i = 1, 2, …, n.

The probability calculated from the distribution entropy of the known waste indicators is [Figure SMS_24]:

[Figure SMS_25: formula image not reproduced]

[Figure SMS_26: formula image not reproduced]

The similarity [Figure SMS_27] between the unknown waste and the known waste is then:

[Figure SMS_28: formula image not reproduced]

Compared with the prior art, the present invention has the following beneficial effects:

The present invention constructs a well-structured hazardous waste database containing the various feature information of hazardous wastes. The user inputs search conditions, and a calculation model is selected according to the search indicators to compute the similarity. This gives quantitative matching results while reducing the amount of calculation, improving matching efficiency and increasing the accuracy of database matching; it supports rapid identification of wastes and rapid tracing of hazardous wastes, and thereby assists subsequent decision-making.

The present invention expresses the result of matching an unknown waste against the waste database as a similarity, which is intuitive and effective.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the tracing process of the present invention.

Detailed Description

The technical solution of the present invention is described clearly below with reference to the accompanying drawing. The described embodiments are not all of the embodiments of the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

The present invention provides a hazardous waste tracing method based on matching degree and cross entropy, comprising:

S1. Constructing a hazardous waste fingerprint feature database containing feature information of hazardous wastes. The database comprises 1554 rows and 193 columns, where each row represents one hazardous waste, so the database contains 1554 hazardous wastes, and each column represents one item of feature information. The feature information includes industry classification, waste category, physical form, shape, magnetism, odor, color, apparent morphology, material composition, characteristic indicators, and numerical indicators. The numerical indicators include heavy metal content and the like. The feature information further includes data source, waste type, solid waste name, generation stage, waste description, and the like.
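The row-and-column layout described in S1 can be pictured with a minimal sketch. The class name `WasteRecord`, the function `match_rows`, and the feature keys are hypothetical and chosen only for illustration; the single record uses the aluminum ash values quoted in the example later in this description, and the real database schema is not specified at this level of detail.

```python
from dataclasses import dataclass, field


@dataclass
class WasteRecord:
    """One row of the fingerprint database: a single known hazardous waste."""
    name: str
    text_features: dict = field(default_factory=dict)     # e.g. odor, color
    numeric_features: dict = field(default_factory=dict)  # e.g. content fractions


# Illustrative one-row database (hypothetical layout).
database = [
    WasteRecord(
        name="aluminum ash",
        text_features={"odor": "pungent"},
        numeric_features={
            "aluminum oxide": 0.56,
            "aluminum": 0.38,
            "aluminum nitride": 0.10,
            "fluoride": 0.03,
        },
    ),
]


def match_rows(db, indicator_names):
    """Return the rows that carry a value for every requested indicator."""
    wanted = set(indicator_names)
    return [
        row for row in db
        if wanted <= (row.text_features.keys() | row.numeric_features.keys())
    ]
```

Splitting text-type and numerical features into two mappings is a design choice of this sketch; it mirrors the classification the method later performs in step S301.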

S2. A user inputs search indicators, and the database matches the corresponding feature information according to the input search indicators.

The user can enter a hazardous waste name directly. The name string is matched against the hazardous waste names in the database, and the entry with the greatest string overlap is returned as the search result and displayed on the front-end page for the user's reference.
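The name lookup can be sketched as follows. The description does not define how "string overlap" is measured, so this sketch assumes it means the longest contiguous substring shared by the two names; the function names `overlap_length` and `best_name_match` are hypothetical.

```python
from difflib import SequenceMatcher


def overlap_length(query, name):
    """Length of the longest contiguous substring shared by the two strings."""
    matcher = SequenceMatcher(None, query, name, autojunk=False)
    return matcher.find_longest_match(0, len(query), 0, len(name)).size


def best_name_match(query, known_names):
    """Return the known name with the largest overlap, or None if nothing overlaps."""
    best = max(known_names, key=lambda n: overlap_length(query, n), default=None)
    if best is None or overlap_length(query, best) == 0:
        return None
    return best
```

Other overlap measures, such as counting shared characters, would fit the wording equally well; the choice here is only one plausible reading.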

When the name and type of a hazardous waste are uncertain, readily obtainable feature information of the waste can be acquired with existing techniques and used as search indicators, in order to estimate the most likely name and type of the waste.

S3. After matching, calculating similarity according to the number N of input search indicators: when N is 1, a single-indicator calculation model is used; when N ≥ 2, a multi-indicator calculation model is used. FIG. 1 shows the similarity calculation flow.

The single-indicator calculation model is:

[Figure SMS_29: formula image not reproduced]

where [Figure SMS_30] denotes the similarity between the unknown waste and a known waste in the database, and t denotes the matching degree between the input search indicator and the matched feature information.

The matching degree t is calculated as:

[Figure SMS_31: formula image not reproduced]

where k is calculated as:

[Figure SMS_32: formula image not reproduced]

With the search indicator entered by the user denoted [Figure SMS_33] and the matched indicator denoted [Figure SMS_34], then [Figure SMS_35], [Figure SMS_36].

If the single input indicator is a text-type indicator, it cannot be used directly in the similarity calculation, so it must first be converted to a numeric value. When the text-type indicator is physical form, magnetism, odor or color, the conversion follows Table 1, and the similarity is calculated after the conversion. Assigned values for further colors can be added to the color indicator in Table 1 as needed, which is not described in detail here.

Table 1 Text-type indicator assignment table

[Figure SMS_37: table image not reproduced]
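The conversion of text-type indicators can be sketched as a lookup table. Table 1 is available only as an image, so every code in `TEXT_CODES` below is a placeholder invented for illustration, not the patent's actual assignment; the function name `encode_text_indicator` is likewise hypothetical.

```python
# Hypothetical assignment table; the real codes are those defined in Table 1.
TEXT_CODES = {
    "odor": {"none": 0, "pungent": 1},
    "magnetism": {"non-magnetic": 0, "magnetic": 1},
    "physical form": {"solid": 1, "semi-solid": 2, "liquid": 3},
}


def encode_text_indicator(indicator, value, codes=TEXT_CODES):
    """Map a text-type indicator value to its assigned numeric code."""
    try:
        return codes[indicator][value]
    except KeyError:
        raise ValueError(f"no code assigned for {indicator!r}={value!r}") from None
```

Raising an error for an unassigned value, rather than silently defaulting, keeps an unknown color or odor from being scored as if it were a known one.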

Calculating the similarity with the multi-indicator calculation model comprises:

S301. Determining the type of each search indicator input by the user and classifying the indicators as text-type or numerical;

S302. Letting the number of text-type indicators be N1 and the number of numerical indicators be N2: when N1 is 1, the single-indicator calculation model is applied to the text-type indicator, and when N1 ≥ 2, the cross-entropy calculation model is applied to the text-type indicators, giving the similarity [Figure SMS_38] for the text-type indicators; when N2 is 1, the single-indicator calculation model is applied to the numerical indicator, and when N2 ≥ 2, the cross-entropy calculation model is applied to the numerical indicators, giving the similarity [Figure SMS_39] for the numerical indicators;

S303. Taking the larger of the similarities [Figure SMS_40] and [Figure SMS_41] as the similarity between the unknown waste and the matched known waste.
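Steps S301 to S303 amount to a small dispatch routine, sketched below. Because the two calculation models are given only as formula images, they are passed in as callables (`single_model`, `cross_entropy_model`) rather than implemented; all function names are hypothetical, and skipping an empty group (N1 = 0 or N2 = 0) is an assumption of this sketch that the text does not address.

```python
def group_similarity(pairs, single_model, cross_entropy_model):
    """Similarity for one indicator group (text-type or numerical).

    pairs: list of (unknown_value, known_value) tuples, already numeric.
    Returns None when the group is empty.
    """
    if not pairs:
        return None
    if len(pairs) == 1:
        # Exactly one indicator in the group: single-indicator model.
        return single_model(*pairs[0])
    # Two or more indicators: cross-entropy model on the two value sets.
    unknown, known = zip(*pairs)
    return cross_entropy_model(unknown, known)


def multi_indicator_similarity(text_pairs, numeric_pairs,
                               single_model, cross_entropy_model):
    """Score each group separately (S302) and keep the larger value (S303)."""
    scores = [
        group_similarity(group, single_model, cross_entropy_model)
        for group in (text_pairs, numeric_pairs)
    ]
    scores = [s for s in scores if s is not None]
    if not scores:
        raise ValueError("at least one search indicator is required")
    return max(scores)
```

Passing the models as parameters keeps the dispatch logic testable on its own and independent of how the similarity formulas are eventually implemented.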

Furthermore, when the search indicators are multiple numerical indicators, the numerical indicators of the input unknown waste form a set Y = (y1, y2, y3, …, yn), the indicators of the matched known waste form a set X = (x1, x2, x3, …, xn), and the probability distributions of the two data sets are calculated as q(y) = (q1, q2, q3, …, qn) and p(x) = (p1, p2, p3, …, pn), respectively.

The probability calculated from the cross entropy of the unknown waste and the known waste is [Figure SMS_42]:

[Figure SMS_43: formula image not reproduced]

[Figure SMS_44: formula image not reproduced]

where i = 1, 2, …, n.

The probability calculated from the distribution entropy of the known waste indicators is [Figure SMS_45]:

[Figure SMS_46: formula image not reproduced]

[Figure SMS_47: formula image not reproduced]

The similarity [Figure SMS_48] between the unknown waste and the known waste is then:

[Figure SMS_49: formula image not reproduced]

It should be noted that matching on numerical indicators may return several different known wastes, whereas a known waste indicator data set always refers to the indicators of a single known waste. A separate similarity is therefore calculated for each matched known waste; the maximum is selected as the displayed similarity result, and the similarity of every matched known waste can also be retrieved.
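Scoring several matched wastes and keeping the maximum can be sketched as below. The `similarity` argument stands for whichever calculation model applies and is supplied by the caller; the function names `score_candidates` and `best_candidate` are hypothetical.

```python
def score_candidates(unknown, candidates, similarity):
    """Compute one similarity per matched known waste.

    unknown: indicator values of the unknown waste.
    candidates: mapping of known-waste name -> its values for the same indicators.
    similarity: any callable (unknown_values, known_values) -> float.
    """
    return {name: similarity(unknown, values) for name, values in candidates.items()}


def best_candidate(scores):
    """Return (name, similarity) for the highest-scoring known waste."""
    name = max(scores, key=scores.get)
    return name, scores[name]
```

Keeping the full `scores` mapping alongside the maximum corresponds to the statement that the similarity of every matched known waste remains retrievable.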

Furthermore, when the search indicators are multiple text-type indicators, numeric values are assigned to the text-type indicators according to Table 1. The assigned values of the text-type indicators of the unknown waste form a set B = (b1, b2, b3, …, bn), the indicators of the matched known waste are converted by the same assignment into a set A = (a1, a2, a3, …, an), and the probability distributions of the two data sets are calculated as r(b) = (r1, r2, r3, …, rn) and s(a) = (s1, s2, s3, …, sn), respectively.

The probability calculated from the cross entropy of the unknown waste and the known waste is [Figure SMS_50]:

[Figure SMS_51: formula image not reproduced]

[Figure SMS_52: formula image not reproduced]

where i = 1, 2, …, n.

The probability calculated from the distribution entropy of the known waste indicators is [Figure SMS_53]:

[Figure SMS_54: formula image not reproduced]

[Figure SMS_55: formula image not reproduced]

The similarity [Figure SMS_56] between the unknown waste and the known waste is then:

[Figure SMS_57: formula image not reproduced]

Likewise, matching on text-type indicators may return several different known wastes, whereas a known waste indicator data set always refers to the indicators of a single known waste. A separate similarity is therefore calculated for each matched known waste; the maximum is selected as the displayed similarity result, and the similarity of every matched known waste can also be retrieved.

The final similarity is obtained from the feature information and similarity calculations described above. The remaining feature information does not take part in the similarity calculation and is used only as descriptive data.

S4. Displaying the tracing result according to the calculated similarity. Either the single most likely waste, the one with the highest similarity, can be displayed, or the tracing results can be listed in descending order of similarity. The hazardous waste with the highest similarity is the type to which the unknown waste most probably belongs, and the likelihood decreases as the similarity decreases.
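The two display options in S4 can be sketched with a single ranking helper. The function name `rank_results` and its `top_k` parameter are hypothetical.

```python
def rank_results(scores, top_k=None):
    """Sort (name, similarity) pairs from most to least similar.

    scores: mapping of known-waste name -> similarity.
    top_k: if given, keep only the first top_k entries (top_k=1 shows
           just the single most likely waste).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]
```

Calling `rank_results(scores, top_k=1)` corresponds to showing only the most likely waste, while omitting `top_k` gives the full descending list.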

Example

Taking aluminum ash as an example, the following shows how the hazardous waste tracing method of the present invention is applied to the tracing or similarity analysis of an unknown waste.

Basic characteristics of aluminum ash: aluminum ash has a pungent odor (ammonia) and contains certain amounts of aluminum oxide, aluminum, aluminum nitride and fluoride. Assume that these indicators characterize aluminum ash, that is, the pungent odor together with aluminum oxide, aluminum, aluminum nitride and fluoride and their content distribution are the fingerprint features of aluminum ash and make up the aluminum ash fingerprint feature database.

When only one indicator of the unknown waste is known, for example that it has a pungent odor, "pungent odor" is a fingerprint feature of aluminum ash in the aluminum ash database, so [Figure SMS_58] = 1; the unknown waste likewise has a pungent odor, i.e. [Figure SMS_59] = 1. Therefore [Figure SMS_60], [Figure SMS_61], and hence f(t) = 1. That is, when matching on the "pungent odor" indicator alone, the similarity between the unknown waste and the known waste is 100%.

When more than one indicator of the unknown waste is obtained, for example the numerical indicators aluminum oxide content, aluminum content, aluminum nitride content and fluoride content, which are 50%, 40%, 9% and 2% respectively as shown in Table 2, the unknown waste indicator set is Y = (0.5, 0.4, 0.09, 0.02). Normalizing the contents of these main substances gives the unknown waste indicator probability distribution q = (0.495, 0.396, 0.089, 0.020).
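The normalization step can be checked with a minimal sketch, assuming it means dividing each content by the sum of all contents; the function name `to_distribution` is hypothetical.

```python
def to_distribution(values):
    """Scale non-negative indicator values so that they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("indicator values must have a positive sum")
    return [v / total for v in values]


Y = [0.5, 0.4, 0.09, 0.02]   # unknown waste contents from the example
q = to_distribution(Y)
print([round(v, 3) for v in q])  # [0.495, 0.396, 0.089, 0.02]
```

The contents sum to 1.01, so each value is divided by 1.01, which reproduces the distribution q quoted in the text to three decimal places.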

Based on the known aluminum ash fingerprint feature database, the known indicator set X = (0.56, 0.38, 0.10, 0.03) is constructed from the mean contents of the main substances in aluminum ash, as shown in the table below, and is converted into the probability distribution p = (0.526, 0.356, 0.091, 0.027).

The cross entropy between the probability distribution of the unknown waste indicators and that of the known indicators is calculated as H(p, q) = 1.04, and the entropy of the probability distribution of the known aluminum ash indicators is calculated as H(X) = 1.01. From the relationship between information content and probability, [Figure SMS_62] and [Figure SMS_63] are calculated, and the similarity [Figure SMS_64] of the two wastes is 97.20%. This indicates that the similarity between the unknown waste and the known waste (aluminum ash) reaches 97.20%. Because the similarity is high, it also indicates that the unknown waste may originate from the known waste (aluminum ash); given a sufficiently populated fingerprint database, the unknown waste can thus be traced to its source.
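The two quantities used in this example can be sketched with the textbook definitions of entropy and cross entropy. The patent's own formulas, including the logarithm base and the final mapping from the two entropies to a similarity, are given only as images, so this sketch assumes natural logarithms and stops at the entropies; its results are not claimed to reproduce the rounded figures quoted above. The function names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy H(p) = -sum(p_i * ln p_i); zero-probability terms add 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross entropy H(p, q) = -sum(p_i * ln q_i) of q relative to p."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue            # term contributes nothing
        if qi <= 0:
            return math.inf     # q assigns no mass where p has some
        total -= pi * math.log(qi)
    return total


p = [0.526, 0.356, 0.091, 0.027]  # known waste (aluminum ash), as stated in the text
q = [0.495, 0.396, 0.089, 0.020]  # unknown waste, as stated in the text
gap = cross_entropy(p, q) - entropy(p)  # Kullback-Leibler divergence, never negative
```

For any two distributions the cross entropy is at least the entropy of p, with equality only when q equals p, so a small `gap` indicates that the two indicator distributions are close; this is the property on which a cross-entropy-based similarity relies.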

Table 2 Example of similarity calculation for numerical indicators of two wastes based on cross entropy

[Figure SMS_65: table image not reproduced]

When more than one indicator of the unknown waste is obtained and the indicators are text-type, for example color, odor and physical form, the text indicators of the unknown waste are assigned values in the same way as the text indicators in the database, and the similarity is calculated with the cross-entropy-based multi-indicator tracing method.

For example, the unknown waste is a yellow solid waste with a pungent odor, and the matched known waste is a gray solid waste with a pungent odor. The results calculated with the method provided by the present invention are shown in Table 3.

Table 3 Example of similarity calculation for text-type indicators of two wastes based on cross entropy

[Figure SMS_66: table image not reproduced]

The similarity of the two wastes is calculated by cross entropy from the data distributions formed after the text values are assigned. In the above example, the similarity between A and B is 86.54%. This shows that, for a known waste whose color, odor and physical form are gray, pungent and solid respectively, the probability that the unknown waste is similar to the known waste is 86.54%.

The above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to examples, a person of ordinary skill in the art should understand that the technical solution may be modified or replaced with equivalents without departing from its scope, and all such modifications and replacements fall within the scope of the claims of the present invention.

Claims (8)

1. A hazardous waste tracing method based on matching degree and cross entropy, characterized by comprising:
S1. constructing a hazardous waste fingerprint feature database, the database containing the feature information of hazardous wastes;
S2. a user inputs retrieval indicators, and the database matches the corresponding feature information according to the input retrieval indicators;
S3. after matching, calculating similarity according to the number N of input retrieval indicators: when N is 1, the similarity is calculated with a single-indicator calculation model; when N ≥ 2, the similarity is calculated with a multi-indicator calculation model;
S4. displaying the tracing results according to the calculated similarity.
2. The hazardous waste tracing method according to claim 1, characterized in that the database comprises a plurality of rows and a plurality of columns, wherein each row represents one hazardous waste and each column represents one item of feature information.
3. The hazardous waste tracing method according to claim 2, characterized in that the feature information comprises industry classification, waste category, physical form, shape, magnetism, odor, color, apparent morphology, material composition, characteristic indicators, and numerical indicators.
4. The hazardous waste tracing method according to claim 1, characterized in that the single-indicator calculation model is:
Figure QLYQS_1
where Figure QLYQS_2 denotes the similarity between the unknown waste and a known waste in the database, and t denotes the matching degree between the input retrieval indicator and the matched feature information.
5. The hazardous waste tracing method according to claim 4, characterized in that the matching degree t is calculated as:
Figure QLYQS_3
where k is calculated as:
Figure QLYQS_4
With the retrieval indicator input by the user denoted Figure QLYQS_5 and the matched indicator denoted Figure QLYQS_6, then Figure QLYQS_7 and Figure QLYQS_8.
6. The hazardous waste tracing method according to claim 1, characterized in that calculating the similarity with the multi-indicator calculation model comprises:
S301. determining the type of each retrieval indicator input by the user, and classifying the indicators into text-type indicators and numerical-type indicators;
S302. denoting the number of text-type indicators as N1 and the number of numerical-type indicators as N2; when N1 is 1, calculating the similarity of the text-type indicators with the single-indicator calculation model, and when N1 ≥ 2, calculating it with a cross-entropy calculation model, finally obtaining the similarity Figure QLYQS_9 corresponding to the text-type indicators; when N2 is 1, calculating the similarity of the numerical-type indicators with the single-indicator calculation model, and when N2 ≥ 2, calculating it with the cross-entropy calculation model, finally obtaining the similarity Figure QLYQS_10 corresponding to the numerical-type indicators;
S303. selecting the larger of the similarities Figure QLYQS_11 and Figure QLYQS_12 as the similarity between the unknown waste and the matched known waste.
7. The hazardous waste tracing method according to claim 6, characterized in that, when the retrieval indicators are a plurality of numerical-type indicators, the numerical-type indicators of the input unknown waste form a set Y = (y1, y2, y3, …, yn), the indicators of the matched known waste form a set X = (x1, x2, x3, …, xn), and the probability distributions of the two data sets are calculated as q(y) = (q1, q2, q3, …, qn) and p(x) = (p1, p2, p3, …, pn), respectively;
the cross-entropy calculated probability of the unknown waste and the known waste, Figure QLYQS_13, is computed as:
Figure QLYQS_14
Figure QLYQS_15
where i = 1, 2, …, n;
the calculated probability of the distribution entropy of the known-waste indicators, Figure QLYQS_16, is computed as:
Figure QLYQS_17
Figure QLYQS_18
then the similarity Figure QLYQS_19 between the unknown waste and the known waste is:
Figure QLYQS_20
8. The hazardous waste tracing method according to claim 6, characterized in that, when the retrieval indicators are a plurality of text-type indicators, values are assigned to the text-type indicators;
the text-type indicators of the unknown waste, after value assignment, form a set B = (b1, b2, b3, …, bn), the indicators of the matched known waste are converted by value assignment to form a set A = (a1, a2, a3, …, an), and the probability distributions of the two data sets are calculated as r(b) = (r1, r2, r3, …, rn) and s(a) = (s1, s2, s3, …, sn), respectively;
the cross-entropy calculated probability of the unknown waste and the known waste, Figure QLYQS_21, is computed as:
Figure QLYQS_22
Figure QLYQS_23
where i = 1, 2, …, n;
the calculated probability of the distribution entropy of the known-waste indicators, Figure QLYQS_24, is computed as:
Figure QLYQS_25
Figure QLYQS_26
then the similarity Figure QLYQS_27 between the unknown waste and the known waste is:
Figure QLYQS_28
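The model selection recited in step S3 of claim 1 and steps S301 to S303 of claim 6 can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the formulas of the single-indicator and cross-entropy models appear here only as figure references, so both models are passed in as caller-supplied functions. The names `Pair`, `Model`, `group_similarity` and `trace_similarity` are hypothetical, text-type indicators are assumed to have already been assigned numeric values as in claim 8, and skipping an indicator type that has no indicators is an assumption the claims do not address.

```python
from typing import Callable, List, Optional, Tuple

# (value for the unknown waste, value for the matched known waste); hypothetical type
Pair = Tuple[float, float]
# A similarity model maps matched indicator pairs to a similarity score; hypothetical type
Model = Callable[[List[Pair]], float]


def group_similarity(pairs: List[Pair],
                     single_model: Model,
                     entropy_model: Model) -> Optional[float]:
    """Similarity for one indicator type (text-type or numerical-type)."""
    if not pairs:
        # Assumption: a type with no indicators contributes no score.
        return None
    if len(pairs) == 1:
        # Exactly one indicator of this type: single-indicator model.
        return single_model(pairs)
    # Two or more indicators of this type: cross-entropy model.
    return entropy_model(pairs)


def trace_similarity(text_pairs: List[Pair],
                     numeric_pairs: List[Pair],
                     single_model: Model,
                     entropy_model: Model) -> float:
    """Take the larger of the text-type and numerical-type similarities (S303)."""
    scores = [
        s for s in (
            group_similarity(text_pairs, single_model, entropy_model),
            group_similarity(numeric_pairs, single_model, entropy_model),
        )
        if s is not None
    ]
    if not scores:
        raise ValueError("at least one retrieval indicator is required")
    return max(scores)
```

With a single retrieval indicator in total, only one type is non-empty and the result reduces to the single-indicator model, which matches the N = 1 branch of step S3. Passing the models as functions keeps the sketch independent of the formulas that are not reproduced in this text.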
CN202310139981.7A 2023-02-21 2023-02-21 A Hazardous Waste Traceability Method Based on Matching Degree and Cross Entropy Active CN115860769B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310139981.7A CN115860769B (en) 2023-02-21 2023-02-21 A Hazardous Waste Traceability Method Based on Matching Degree and Cross Entropy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310139981.7A CN115860769B (en) 2023-02-21 2023-02-21 A Hazardous Waste Traceability Method Based on Matching Degree and Cross Entropy

Publications (2)

Publication Number Publication Date
CN115860769A true CN115860769A (en) 2023-03-28
CN115860769B CN115860769B (en) 2023-05-05

Family

ID=85658525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310139981.7A Active CN115860769B (en) 2023-02-21 2023-02-21 A Hazardous Waste Traceability Method Based on Matching Degree and Cross Entropy

Country Status (1)

Country Link
CN (1) CN115860769B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000155681A (en) * 1998-11-24 2000-06-06 Fujitsu Ltd Prediction device and method for performing prediction based on similar cases
CN111291069A (en) * 2018-12-07 2020-06-16 北京搜狗科技发展有限公司 Data processing method and device and electronic equipment
CN113361263A (en) * 2021-06-04 2021-09-07 中国人民解放军战略支援部队信息工程大学 Character entity attribute alignment method and system based on attribute value distribution
CN115776401A (en) * 2022-11-23 2023-03-10 中国人民解放军国防科技大学 Method and device for tracing the source of network attack events based on few-sample learning

Also Published As

Publication number Publication date
CN115860769B (en) 2023-05-05

Similar Documents

Publication Publication Date Title
Zhang et al. How construction and demolition waste management has addressed sustainable development goals: exploring academic and industrial trends
Warr Representation of chemical structures
Stefanović et al. A comparison of the analytic hierarchy process and the analysis and synthesis of parameters under information deficiency method for assessing the sustainability of waste management scenarios
Wernick et al. Industrial ecology: Some directions for research
CN111914141B (en) Public opinion knowledge base construction method and public opinion knowledge base
Shanta et al. Municipal solid waste management: Identification and analysis of technology selection criteria using Fuzzy Delphi and Fuzzy DEMATEL technique
CN109300042A (en) A kind of air control system based on big data
CN117216105A (en) High-precision scientific and technological achievement conversion method and system based on standardized evaluation
CN114117065A (en) Knowledge graph construction method and system based on electricity production statistics business
CN115757810A (en) A method for constructing knowledge graph standard ontology
CN115392939B (en) A Hazardous Waste Traceability Method Based on Retrieval Comparison and Matching Degree Calculation
Moradi et al. Sustainability indicators in building construction projects through the lens of project delivery elements
Han et al. Exploring the greenhouse gas emissions inventory and driving mechanisms of municipal solid waste in China
CN115860769B (en) A Hazardous Waste Traceability Method Based on Matching Degree and Cross Entropy
Sun et al. Big data revealed relationship between air pollution and manufacturing industry in China
Bencekri et al. A systematic review of the 15-minute city framework: implications for environmental heritage preservation in the Anthropocene
Miotto et al. Supporting the Curation of Biological Databases Reusable Text Mining
CN110750622A (en) Big data-based financial event discovery method
CN119025623A (en) A method for tracking governance structure changes based on policy citation correlation
Wang et al. Analysis of hazardous waste management elements in oil and gas enterprises based on the life-cycle management concept
CN116049368B (en) Content grabbing system based on legal text vector analysis
Liu et al. Optimisation of the Circular Economy Based on the Resource Circulation Equation
Lancho-Barrantes et al. The iceberg hypothesis revisited
Jiang et al. Carbon Emission Assessment During the Recycling Phase of Building Meltable Materials from Construction and Demolition Waste: A Case Study in China
CN115953041A (en) Construction scheme and system of operator policy system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant