
CN115860769A - Hazardous waste tracing method based on matching degree and cross entropy

Info

Publication number: CN115860769A
Application number: CN202310139981.7A
Authority: CN
Inventors: 杨玉飞; 杨金忠; 李雪冰; 迭庆杞; 黄启飞; 王菲; 于天
Original assignee: Chinese Research Academy of Environmental Sciences
Current assignee: Chinese Research Academy of Environmental Sciences
Priority date: 2023-02-21
Filing date: 2023-02-21
Publication date: 2023-03-28
Anticipated expiration: 2043-02-21
Also published as: CN115860769B

Abstract

The invention belongs to the technical field of data information processing and relates in particular to a hazardous waste tracing method based on matching degree and cross entropy, comprising the following steps: S1, constructing a hazardous waste fingerprint feature database containing feature information on hazardous wastes; S2, a user inputs retrieval indicators, and the database matches the corresponding feature information according to the input retrieval indicators; S3, after matching, calculating the similarity according to the number N of input retrieval indicators, using a single-indicator calculation model when N = 1 and a multi-indicator calculation model when N ≥ 2; and S4, displaying the tracing result according to the calculated similarity. The method not only gives quantitative matching results but also reduces the amount of computation, improves matching efficiency, and increases the accuracy of database matching; it supports rapid identification of wastes and rapid tracing of hazardous waste, and thereby assists subsequent decision-making.

Description

A hazardous waste tracing method based on matching degree and cross entropy

Technical Field

The present invention belongs to the technical field of data information processing, and in particular relates to a hazardous waste tracing method based on matching degree and cross entropy.

Background Art

With the rapid development of society, the amount of waste and garbage is increasing day by day, especially hazardous waste generated in industrial production: more than 300 million tons of hazardous waste are generated worldwide every year. Hazardous wastes usually have one or more hazardous characteristics such as corrosivity, toxicity, flammability, reactivity, or infectivity. Examples include waste organic solvents and distillation residues from the chemical industry, mother liquor and waste salt from pesticide production, scum and oily sludge from the petroleum industry, and dust-collection ash and smelting slag from non-ferrous metal smelting. These hazardous wastes not only endanger human health but also cause long-term damage to the environment, so their proper disposal and scientific management are extremely important.

In the solid waste dumping incidents, especially hazardous waste dumping incidents, that have occurred frequently in recent years, rapid and accurate tracing and characterization of the dumped waste is a major difficulty both in assigning liability and in arranging safe disposal. However, the lack of techniques for rapidly and precisely matching a hazardous waste to its source and characteristics has greatly hindered the timely formulation of comprehensive remediation plans for such high-profile dumping incidents. At the same time, when hazardous waste utilization and disposal facilities receive hazardous waste, their limited ability to identify waste categories accurately can lead to production safety accidents and to the risk of handling waste beyond their permitted capacity. Therefore, establishing a database and developing tracing techniques and methods for unknown solid waste is of great significance for combating illegal dumping, safeguarding ecological and environmental security, and preventing and controlling risks in the utilization and disposal of solid waste.

In addition, techniques for the fine-grained management of hazardous waste remain weak. There is therefore an urgent need for a hazardous waste tracing method that, by combining the available information on an unknown waste with a hazardous waste tracing system, yields the estimated type and estimated characteristics of that waste.

Summary of the Invention

To solve the technical problem in the prior art that hazardous waste is difficult to trace, the present invention provides a hazardous waste tracing method based on matching degree and cross entropy.

To achieve the above object, the technical solution of the present invention is as follows:

A hazardous waste tracing method based on matching degree and cross entropy, comprising:

S1. Construct a hazardous waste fingerprint feature database containing feature information on hazardous wastes;

S2. The user inputs retrieval indicators, and the database matches the corresponding feature information according to the input retrieval indicators;

S3. After matching, calculate the similarity according to the number N of input retrieval indicators: when N = 1, a single-indicator calculation model is used; when N ≥ 2, a multi-indicator calculation model is used;

S4. Display the tracing result according to the calculated similarity.

Further, the database includes a plurality of rows and a plurality of columns, wherein each row represents one type of hazardous waste and each column represents one item of feature information.

Further, the feature information includes industry classification, waste category, physical form, shape, magnetism, odor, color, appearance, material composition, characteristic indicators, and numerical indicators.
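To make the row/column layout concrete, the following is a minimal Python sketch of how such a table might be held in memory and how step S2 could select candidate rows. The column names, the second row, and the rule that a row "matches" when it records a value for every requested indicator are assumptions for illustration only; the aluminum ash odor and contents are taken from the Example later in this document.

```python
from typing import Dict, List, Optional, Sequence, Union

# A cell holds text (e.g. odor), a number (e.g. a content fraction), or None if not recorded.
Cell = Optional[Union[str, float]]
Row = Dict[str, Cell]

# Hypothetical stand-in for the fingerprint feature database:
# one dict per row (one hazardous waste), one key per feature column.
DATABASE: List[Row] = [
    {"name": "aluminum ash", "odor": "pungent", "physical_form": "solid",
     "Al2O3": 0.56, "Al": 0.38, "AlN": 0.10, "fluoride": 0.03},
    {"name": "hypothetical waste B", "odor": None, "physical_form": "solid",
     "Al2O3": None, "Al": None, "AlN": None, "fluoride": None},
]


def match_rows(db: Sequence[Row], indicators: Sequence[str]) -> List[Row]:
    """Return rows that record a value for every requested indicator column (assumed S2 rule)."""
    return [row for row in db if all(row.get(col) is not None for col in indicators)]


if __name__ == "__main__":
    hits = match_rows(DATABASE, ["Al2O3", "Al", "AlN", "fluoride"])
    print([row["name"] for row in hits])  # ['aluminum ash']
```

A real implementation would more likely sit on a relational table with 193 columns; the dict-per-row form is used here only because it keeps the sketch self-contained.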

Further, the single-indicator calculation model is:

[formula not reproduced]

where f(t) denotes the similarity between the unknown waste and a known waste in the database, and t denotes the matching degree between the input retrieval indicator and the matched feature information.

Further, the matching degree t is calculated as:

[formula not reproduced]

where k is calculated as:

[formula not reproduced]

The value of the retrieval indicator entered by the user and the value of the matched indicator are substituted into these expressions to obtain k and then t (symbols and expressions not reproduced).

Further, calculating the similarity with the multi-indicator calculation model comprises:

S301. Determine the type of each retrieval indicator input by the user, and classify the indicators into text-type indicators and numerical indicators;

S302. Let the number of text-type indicators be N1 and the number of numerical indicators be N2. When N1 = 1, the similarity for the text-type indicator is calculated with the single-indicator calculation model; when N1 ≥ 2, it is calculated with the cross-entropy calculation model; this yields the text-type similarity. When N2 = 1, the similarity for the numerical indicator is calculated with the single-indicator calculation model; when N2 ≥ 2, it is calculated with the cross-entropy calculation model; this yields the numerical similarity;

S303. Take the larger of the text-type similarity and the numerical similarity as the similarity between the unknown waste and the matched known waste.
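Steps S301 to S303 amount to a small dispatch on indicator type and count. The Python sketch below shows that control flow only. Because the single-indicator formula and the cross-entropy model are not reproduced in this text, both are passed in as callables; the function name and the parameters `single_model` and `cross_entropy_model` are hypothetical, and each callable is assumed to already know which matched known waste it compares against.

```python
from typing import Callable, Dict, Optional, Union

Indicator = Union[str, float]


def multi_indicator_similarity(
    indicators: Dict[str, Indicator],
    single_model: Callable[[str, Indicator], float],
    cross_entropy_model: Callable[[Dict[str, Indicator]], float],
) -> float:
    """Sketch of S301-S303 for one matched known waste (N >= 2 indicators)."""
    # S301: split the user's indicators into text-type and numerical groups.
    text = {k: v for k, v in indicators.items() if isinstance(v, str)}
    numeric = {k: v for k, v in indicators.items() if isinstance(v, (int, float))}

    def group_similarity(group: Dict[str, Indicator]) -> Optional[float]:
        # S302: one indicator -> single-indicator model; two or more -> cross-entropy model.
        if not group:
            return None
        if len(group) == 1:
            (name, value), = group.items()
            return single_model(name, value)
        return cross_entropy_model(group)

    scores = [s for s in (group_similarity(text), group_similarity(numeric)) if s is not None]
    if not scores:
        raise ValueError("no text-type or numerical indicators supplied")
    # S303: keep the larger of the text-type and numerical similarities.
    return max(scores)
```

For example, with one text indicator (odor) and two numerical contents, the odor goes to `single_model`, the two contents go to `cross_entropy_model`, and the larger of the two scores is returned.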

Further, when the retrieval indicators are a plurality of numerical indicators, the numerical indicators of the input unknown waste form a set Y = (y₁, y₂, y₃, …, yₙ), and the corresponding indicators of the matched known waste form a set X = (x₁, x₂, x₃, …, xₙ). The probability distributions of the two data sets are calculated as q(y) = (q₁, q₂, q₃, …, qₙ) and p(x) = (p₁, p₂, p₃, …, pₙ), respectively.

The cross-entropy probability of the unknown waste and the known waste is calculated as:

[formula not reproduced]

where i = 1, 2, …, n.

The probability corresponding to the distribution entropy of the known waste indicators is calculated as:

[formula not reproduced]

The similarity between the unknown waste and the known waste is then:

[formula not reproduced]
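The expressions above are not reproduced in this text. One reading consistent with the wording ("cross-entropy probability", "probability of the distribution entropy", and the Example's reference to the relationship between information content and probability) converts each quantity H into a probability through the self-information relation P = e^(−H) and takes their ratio. The Python sketch below implements that reading under stated assumptions: natural logarithms, H(p, q) = −Σ pᵢ ln qᵢ with p taken from the known waste, and strictly positive indicator values. It illustrates cross-entropy scoring and is not a reproduction of the patented formulas; it is not claimed to reproduce the figures quoted in the Example.

```python
import math
from typing import List, Sequence


def to_distribution(values: Sequence[float]) -> List[float]:
    """Normalize non-negative indicator values so they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("indicator values must have a positive sum")
    return [v / total for v in values]


def cross_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """H(p, q) = -sum_i p_i ln q_i (natural log); requires q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def entropy(p: Sequence[float]) -> float:
    """H(p) = H(p, p)."""
    return cross_entropy(p, p)


def numeric_similarity(known: Sequence[float], unknown: Sequence[float]) -> float:
    """Assumed similarity: e^(-H(p, q)) / e^(-H(p)), p from X (known), q from Y (unknown)."""
    if len(known) != len(unknown):
        raise ValueError("X and Y must list the same indicators in the same order")
    p = to_distribution(known)
    q = to_distribution(unknown)
    p_cross = math.exp(-cross_entropy(p, q))  # probability from the cross entropy
    p_self = math.exp(-entropy(p))            # probability from the known-waste entropy
    return p_cross / p_self
```

Because H(p, q) − H(p) is the Kullback-Leibler divergence D(p‖q) ≥ 0, this ratio equals e^(−D(p‖q)) under the stated assumption: it is 1 for identical distributions and falls toward 0 as they diverge.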

Further, when the retrieval indicators are a plurality of text-type indicators, the text-type indicators are first assigned numerical values. The assigned text-type indicators of the unknown waste form a set B = (b₁, b₂, b₃, …, bₙ), and the indicators of the matched known waste, converted in the same way, form a set A = (a₁, a₂, a₃, …, aₙ). The probability distributions of the two data sets are calculated as r(b) = (r₁, r₂, r₃, …, rₙ) and s(a) = (s₁, s₂, s₃, …, sₙ), respectively.

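The cross-entropy probability of the unknown waste and the known waste is calculated as:

[formula not reproduced]

where i = 1, 2, …, n.

The probability corresponding to the distribution entropy of the known waste indicators is calculated as:

[formula not reproduced]

The similarity between the unknown waste and the known waste is then:

[formula not reproduced]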
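The text-type case feeds the same kind of cross-entropy scoring after the value-assignment step. Table 1 is not reproduced in this text, so the codes in `HYPOTHETICAL_CODES` below are invented placeholders, and the score assumes the self-information reading P = e^(−H), with similarity taken as the ratio of the cross-entropy probability to the entropy probability. The sketch shows the data flow (assign values, normalize, compare distributions), not the patent's actual values.

```python
import math
from typing import Dict, List, Sequence

# Invented placeholder codes standing in for Table 1; all codes are positive so ln() is defined.
HYPOTHETICAL_CODES: Dict[str, Dict[str, float]] = {
    "color": {"gray": 1.0, "yellow": 2.0, "white": 3.0},
    "odor": {"none": 1.0, "pungent": 2.0},
    "physical_form": {"solid": 1.0, "liquid": 2.0},
}


def encode(record: Dict[str, str], attributes: Sequence[str]) -> List[float]:
    """Assign a numerical value to each text-type indicator (the value-assignment step)."""
    return [HYPOTHETICAL_CODES[a][record[a]] for a in attributes]


def normalize(values: Sequence[float]) -> List[float]:
    total = sum(values)
    return [v / total for v in values]


def text_similarity(unknown: Dict[str, str], known: Dict[str, str]) -> float:
    """Assumed score: e^(-H(s, r)) / e^(-H(s)), s from set A (known), r from set B (unknown)."""
    attributes = [a for a in unknown if a in known and a in HYPOTHETICAL_CODES]
    if len(attributes) < 2:
        raise ValueError("the cross-entropy model needs at least two shared text indicators")
    r = normalize(encode(unknown, attributes))  # distribution of B (unknown waste)
    s = normalize(encode(known, attributes))    # distribution of A (known waste)
    h_cross = -sum(si * math.log(ri) for si, ri in zip(s, r))
    h_known = -sum(si * math.log(si) for si in s)
    return math.exp(-h_cross) / math.exp(-h_known)
```

Any numeric coding of categories imposes an arbitrary ordering and scale, so the score depends on the assignment table; these placeholder codes will not reproduce the 86.54% figure given in the Example.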

Compared with the prior art, the present invention has the following beneficial effects:

The present invention constructs a well-structured hazardous waste database containing various items of feature information on hazardous wastes. The user inputs retrieval conditions, and a similarity model is selected according to the retrieval indicators. This not only yields quantitative matching results but also reduces the amount of computation, improves matching efficiency, and increases the accuracy of database matching. It supports rapid identification of wastes and rapid tracing of hazardous waste, and thereby assists subsequent decision-making.

The present invention expresses the result of matching an unknown waste against the waste database as a similarity, which is intuitive and effective.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the tracing process of the present invention.

Detailed Description

The technical solution of the present invention is described clearly below with reference to the accompanying drawing. The described embodiments are not all of the embodiments of the present invention; all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

The present invention provides a hazardous waste tracing method based on matching degree and cross entropy, comprising:

S1. Construct a hazardous waste fingerprint feature database containing feature information on hazardous wastes. The database includes 1554 rows and 193 columns: each row represents one type of hazardous waste, so the database covers 1554 types of hazardous waste, and each column represents one item of feature information. The feature information includes industry classification, waste category, physical form, shape, magnetism, odor, color, appearance, material composition, characteristic indicators, and numerical indicators; the numerical indicators include heavy metal content, among others. The feature information further includes the data source, waste type, solid waste name, generating process step, waste description, and so on.

The user can directly enter the name of a hazardous waste; the entered name string is matched against the hazardous waste names in the database, and the name with the greatest string overlap is returned as the search result and displayed on the front-end page for the user's reference.
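The description does not define "string overlap" precisely. A minimal Python sketch, assuming overlap means the number of characters two names share (counted with multiplicity), is shown below; a longest-common-substring measure would be an equally plausible reading. The function names and the candidate names in the usage note are hypothetical.

```python
from collections import Counter
from typing import Sequence


def overlap(query: str, name: str) -> int:
    """Count characters shared by two strings, respecting multiplicity."""
    return sum((Counter(query) & Counter(name)).values())


def best_name_match(query: str, names: Sequence[str]) -> str:
    """Return the database name with the largest overlap; ties keep the earliest name."""
    if not names:
        raise ValueError("no names to match against")
    return max(names, key=lambda name: overlap(query, name))
```

For instance, `best_name_match("aluminium ash", ["aluminum ash", "waste salt", "oily sludge"])` returns `"aluminum ash"` (12 shared characters versus 5 and 5). Character-level counting applies equally to Chinese names, where each character is one element; note that raw counts tend to favor longer names.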

When the name and type of a hazardous waste are uncertain, readily obtainable feature information on the waste can be acquired with existing techniques and used as retrieval indicators, in order to estimate the most likely name and type of the waste.

S3. After matching, calculate the similarity according to the number N of input retrieval indicators: when N = 1, the single-indicator calculation model is used; when N ≥ 2, the multi-indicator calculation model is used. FIG. 1 is a schematic flowchart of the similarity calculation.

The single-indicator calculation model is:

[formula not reproduced]

where the matching degree t is calculated as:

[formula not reproduced]

and k is calculated as:

[formula not reproduced]

The value of the retrieval indicator entered by the user and the value of the matched indicator are substituted into these expressions to obtain k and then t (symbols and expressions not reproduced).

If the single input indicator is a text-type indicator, it cannot be used directly in the similarity calculation, so it must first be converted into a numerical value. When the text-type indicator is physical form, magnetism, odor, or color, the conversion follows Table 1, and the similarity is calculated after conversion. Values for additional colors can be added to the color entries of Table 1 as needed; this is not detailed here.

Table 1 Value assignment for text-type indicators
[table not reproduced]

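Calculating the similarity with the multi-indicator calculation model comprises steps S301 to S303 as set out in the Summary above; in step S303, the larger of the text-type similarity and the numerical similarity is taken as the similarity between the unknown waste and the matched known waste.

When the retrieval indicators are a plurality of numerical indicators, with the sets Y and X and the distributions q(y) and p(x) defined as in the Summary, the cross-entropy probability of the unknown waste and the known waste is calculated as:

[formula not reproduced]

where i = 1, 2, …, n.

The probability corresponding to the distribution entropy of the known waste indicators is calculated as:

[formula not reproduced]

The similarity between the unknown waste and the known waste is then:

[formula not reproduced]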

It should be noted that several different known wastes may be matched during numerical indicator matching, whereas the known waste indicator data set always refers to the corresponding indicators of a single known waste. A separate similarity is therefore calculated for each matched known waste; the maximum is displayed as the similarity result, and the similarity of each individual matched known waste can also be retrieved.

Further, when the retrieval indicators are a plurality of text-type indicators, the text-type indicators are assigned values according to Table 1. The assigned text-type indicators of the unknown waste form a set B = (b₁, b₂, b₃, …, bₙ), and the indicators of the matched known waste, converted in the same way, form a set A = (a₁, a₂, a₃, …, aₙ). The probability distributions of the two data sets are calculated as r(b) = (r₁, r₂, r₃, …, rₙ) and s(a) = (s₁, s₂, s₃, …, sₙ), respectively.

The cross-entropy probability of the unknown waste and the known waste is calculated as:

[formula not reproduced]

where i = 1, 2, …, n.

The probability corresponding to the distribution entropy of the known waste indicators is calculated as:

[formula not reproduced]

The similarity between the unknown waste and the known waste is then:

[formula not reproduced]

Similarly, several different known wastes may be matched during text-type indicator matching, and the known waste indicator data set again refers to the corresponding indicators of a single known waste. A separate similarity is calculated for each matched known waste; the maximum is displayed as the similarity result, and the similarity of each matched known waste can also be retrieved.

The final similarity is obtained from the above feature information and similarity calculations. The remaining feature information does not take part in the similarity calculation and is used only as descriptive data.

S4. Display the tracing result according to the calculated similarity. Either only the most likely waste, the one with the highest similarity, is displayed, or the tracing results are listed in descending order of similarity. The hazardous waste with the highest similarity is the type to which the unknown waste most likely belongs, and the likelihood decreases as the similarity decreases.
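Scoring every matched known waste, keeping each score retrievable, and listing the results in descending order for S4 can be sketched in Python as follows. The `similarity` parameter is a placeholder for whichever model applies, and `toy_similarity` is a hypothetical stand-in used only so the sketch runs; it is not the measure described in this document.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Similarity = Callable[[Sequence[float], Sequence[float]], float]


def rank_candidates(
    unknown: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    similarity: Similarity,
) -> List[Tuple[str, float]]:
    """Score every matched known waste and return (name, similarity) pairs, highest first."""
    scores = {name: similarity(unknown, values) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def toy_similarity(u: Sequence[float], k: Sequence[float]) -> float:
    """Hypothetical placeholder measure: 1 / (1 + L1 distance)."""
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(u, k)))


if __name__ == "__main__":
    ranking = rank_candidates([0.5, 0.5], {"A": [0.9, 0.1], "B": [0.5, 0.5]}, toy_similarity)
    best_name, best_score = ranking[0]  # most likely waste type; the full ranking keeps all scores
    print(best_name, best_score)
```

Returning the whole sorted list, rather than only the maximum, covers both display options in S4: show `ranking[0]` alone, or show the list in order.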

Example

Taking aluminum ash as an example, the application of the hazardous waste tracing method of the present invention to tracing or similarity analysis of an unknown waste is described below.

Basic characteristics of aluminum ash: aluminum ash has a pungent odor (ammonia) and contains a certain amount of aluminum oxide, aluminum, aluminum nitride, and fluoride. It is assumed that these indicators characterize aluminum ash, i.e., the pungent odor and the contents and content distribution of aluminum oxide, aluminum, aluminum nitride, and fluoride are the fingerprint features of aluminum ash, and together they form the aluminum ash fingerprint feature database.

When only one indicator of the unknown waste is known, for example a pungent odor: in the aluminum ash database, "pungent odor" is a fingerprint feature of aluminum ash, so the matched indicator takes the value 1; the unknown waste also has a pungent odor, so the input indicator also takes the value 1. Substituting these values into the expressions for k and t (not reproduced) gives f(t) = 1. That is, when matching on the "pungent odor" indicator alone, the similarity between the unknown waste and the known waste is 100%.

When more than one indicator of the unknown waste is obtained, for example the numerical contents of aluminum oxide, aluminum, aluminum nitride, and fluoride, which are 50%, 40%, 9%, and 2% respectively as shown in Table 2, the unknown waste indicator set is Y = (0.5, 0.4, 0.09, 0.02). Normalizing these main-component contents gives the probability distribution of the unknown waste indicators, q = (0.495, 0.396, 0.089, 0.020).
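The normalization in this step divides each content by their total (0.5 + 0.4 + 0.09 + 0.02 = 1.01). The short Python check below reproduces the stated q from Y; the variable names are illustrative only.

```python
# Unknown-waste contents from the Example: aluminum oxide, aluminum, aluminum nitride, fluoride.
Y = [0.5, 0.4, 0.09, 0.02]

total = sum(Y)                       # 1.01, up to floating-point rounding
q = [y / total for y in Y]           # normalized probability distribution of Y
q_rounded = [round(v, 3) for v in q]

print(q_rounded)  # [0.495, 0.396, 0.089, 0.02]
```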

Based on the known aluminum ash fingerprint feature database, the known indicator set X = (0.56, 0.38, 0.10, 0.03) is constructed from the mean contents of the main components of aluminum ash, as shown in Table 2, and is converted into the probability distribution p = (0.526, 0.356, 0.091, 0.027).

The cross entropy between the probability distribution of the unknown waste indicators and that of the known indicators is H(p, q) = 1.04, and the entropy of the known aluminum ash indicator distribution is H(X) = 1.01. Using the relationship between information content and probability, the corresponding cross-entropy probability and entropy probability are calculated (symbols not reproduced), and from them the similarity of the two wastes is 97.20%. This indicates that the similarity between the unknown waste and the known waste (aluminum ash) reaches 97.20%. Because the similarity is high, the unknown waste may originate from the known waste (aluminum ash); given a sufficiently comprehensive fingerprint database, unknown wastes can thus be traced to their sources.

Table 2 Example of cross-entropy-based similarity calculation for numerical indicators of two wastes
[table not reproduced]

When more than one indicator of the unknown waste is obtained and the indicators are text-type, such as color, odor, and physical form, the text indicators of the unknown waste are assigned values according to the assignments used for text indicators in the database, and the similarity is calculated with the multi-indicator, cross-entropy-based tracing method.

For example, suppose the unknown waste is a yellow solid waste with a pungent odor and the matched known waste is a gray solid waste with a pungent odor. The results obtained with the calculation method of the present invention are shown in Table 3.

Table 3 Example of cross-entropy-based similarity calculation for text-type indicators of two wastes
[table not reproduced]

The data distributions formed after value assignment are used to calculate the similarity of the two wastes by cross entropy; in the above example, the similarity between A and B is 86.54%. This means that, with color, odor, and physical form as the retrieval indicators (the known waste being gray, with a pungent odor, and solid), the unknown waste and the known waste are 86.54% similar.

The above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to examples, those of ordinary skill in the art should understand that the technical solution of the present invention can be modified or equivalently replaced without departing from its scope, and all such modifications and replacements shall fall within the scope of the claims of the present invention.

Claims

1. A method for tracing the source of hazardous waste based on matching degree and cross entropy, characterized by comprising:

S1. Construct a hazardous waste fingerprint feature database, which contains the feature information of hazardous waste;

S2. The user inputs the corresponding search index, and the database matches the corresponding feature information according to the input search index;

S3. After matching, similarity calculation is performed according to the number of search indicators input; when the number of search indicators N is 1, a single indicator calculation model is used to calculate similarity; when N ≥ 2, a multi-indicator calculation model is used to calculate similarity;

S4. Display the traceability result according to the calculated similarity.

2. The hazardous waste tracing method according to claim 1, characterized in that the database includes a plurality of rows and a plurality of columns, wherein each row represents one type of hazardous waste and each column represents one item of feature information.

3. The hazardous waste tracing method according to claim 2, characterized in that the feature information includes industry classification, waste category, physical form, shape, magnetism, odor, color, appearance, material composition, characteristic indicators, and numerical indicators.

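4. The hazardous waste tracing method according to claim 1, characterized in that the single-indicator calculation model is:

[formula not reproduced]

where f(t) denotes the similarity between the unknown waste and a known waste in the database, and t denotes the matching degree between the input retrieval indicator and the matched feature information.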

5. The hazardous waste tracing method according to claim 4, characterized in that the matching degree t is calculated as:

[formula not reproduced]

where k is calculated as:

[formula not reproduced]

The value of the retrieval indicator entered by the user and the value of the matched indicator are substituted into these expressions to obtain k and then t (symbols and expressions not reproduced).

6. The hazardous waste tracing method according to claim 1, characterized in that calculating the similarity with the multi-indicator calculation model comprises:

S301, determining the type of search index input by the user, and classifying the search index into text index and numerical index;

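S302, setting the number of text-type indicators to N1 and the number of numerical indicators to N2; when N1 = 1, calculating the similarity for the text-type indicator with the single-indicator calculation model, and when N1 ≥ 2, calculating it with the cross-entropy calculation model, to obtain the text-type similarity; when N2 = 1, calculating the similarity for the numerical indicator with the single-indicator calculation model, and when N2 ≥ 2, calculating it with the cross-entropy calculation model, to obtain the numerical similarity;

S303, selecting the larger of the text-type similarity and the numerical similarity as the similarity between the unknown waste and the matched known waste.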

7. The hazardous waste tracing method according to claim 6, characterized in that, when the retrieval indicators are a plurality of numerical indicators, the numerical indicators of the input unknown waste form a set Y = (y₁, y₂, y₃, …, yₙ), the indicators of the matched known waste form a set X = (x₁, x₂, x₃, …, xₙ), and the probability distributions of the two data sets are calculated as q(y) = (q₁, q₂, q₃, …, qₙ) and p(x) = (p₁, p₂, p₃, …, pₙ), respectively;

the cross-entropy probability of the unknown waste and the known waste is calculated as:

[formula not reproduced]

where i = 1, 2, …, n;

the probability corresponding to the distribution entropy of the known waste indicators is calculated as:

[formula not reproduced]

and the similarity between the unknown waste and the known waste is then:

[formula not reproduced]

8. The hazardous waste tracing method according to claim 6, characterized in that, when the retrieval indicators are a plurality of text-type indicators, the text-type indicators are assigned values; the assigned text-type indicators of the unknown waste form a set B = (b₁, b₂, b₃, …, bₙ), the indicators of the matched known waste, converted in the same way, form a set A = (a₁, a₂, a₃, …, aₙ), and the probability distributions of the two data sets are calculated as r(b) = (r₁, r₂, r₃, …, rₙ) and s(a) = (s₁, s₂, s₃, …, sₙ), respectively;

the cross-entropy probability of the unknown waste and the known waste is calculated as:

[formula not reproduced]

where i = 1, 2, …, n;

the probability corresponding to the distribution entropy of the known waste indicators is calculated as:

[formula not reproduced]

and the similarity between the unknown waste and the known waste is then:

[formula not reproduced]