CN116192705A

CN116192705A - Object-oriented protocol consistency problem positioning test method and system

Info

Publication number: CN116192705A
Application number: CN202211170332.5A
Authority: CN
Inventors: 巫钟兴; 刘宣; 杜艺娜; 赵兵; 林繁涛; 陈昊; 郑安刚; 唐悦; 朱子旭; 刘兴奇; 张宇鹏; 窦健; 郄爽; 尚怀赢; 韩月
Original assignee: State Grid Corp of China SGCC; China Electric Power Research Institute Co Ltd CEPRI
Current assignee: State Grid Corp of China SGCC; China Electric Power Research Institute Co Ltd CEPRI
Priority date: 2022-09-22
Filing date: 2022-09-22
Publication date: 2023-05-30

Abstract

The invention discloses a method and system for locating and testing object-oriented protocol consistency problems. The method includes: collecting interactive communication messages generated by a power consumption information collection system as original sample data to form an original sample library; reading the original sample data from the original sample library and performing data standardization on it to form standard data; forming, from a large volume of historical standard data records, a standard sample library whose training samples are standard data containing all feature factors of a message, and training a model on the training samples with a machine learning algorithm to determine a prediction model; and inputting the samples to be detected into the prediction model for anomaly detection and outputting the detection results.

Description

Object-oriented protocol consistency problem location testing method and system

Technical field

The present invention relates to the technical field of data mining, and more specifically to a method and system for locating and testing object-oriented protocol consistency problems.

Background

With the release and wide adoption of DL/T 698.45, "Object-Oriented Power Consumption Information Data Exchange Protocol" (hereinafter the "Protocol"), its high development flexibility and ease of extension have greatly reduced the development and maintenance costs of power consumption information collection systems and equipment. The drawbacks of this flexibility are equally evident. With the Protocol in wide use, many manufacturers interpret the descriptions of the attributes or methods of certain objects in their own way. As a result, the communication messages used by devices running in the same collection system are inconsistent, devices from different manufacturers are incompatible, the communication efficiency of the collection system falls, and manufacturers' development costs rise. A protocol consistency evaluation rule that reflects the mainstream interpretation is therefore urgently needed.

The current method for testing the protocol consistency of power consumption information collection equipment specifies concrete test content and strictly compares the data type and value of the returned data to judge whether the device conforms to the protocol consistency specification. This form of testing is severely limited. First, its test cases are fixed, hard to extend, and easy for equipment manufacturers to pass through special-case optimization. Second, its coverage is insufficient: the specified test content cannot include the full range of scenarios encountered in actual use and falls short of the requirements of massive data. Third, the detection rules do not necessarily match the mainstream interpretation; they are bounded by the business understanding of the developers of the testing software, which can make the rules inaccurate.

Summary of the invention

The present invention provides a method and system to solve the technical problem that the current approach to protocol consistency testing of power consumption information collection equipment, which specifies concrete test content and strictly compares the data type and value of the returned data to judge whether a device conforms to the protocol consistency specification, is severely limited.

According to a first aspect of the present invention, a method for locating and testing object-oriented protocol consistency problems is provided, including:

collecting interactive communication messages generated by the power consumption information collection system as original sample data to form an original sample library;

reading original sample data from the original sample library and performing data standardization on the original sample data to form standard data;

forming, from a large volume of historical standard data records, a standard sample library whose training samples are standard data containing all feature factors of a message, and training a model on the training samples with a machine learning algorithm to determine a prediction model;

inputting the samples to be detected into the prediction model for anomaly detection, and outputting the detection results.

Optionally, performing data standardization on the original sample data to form standard data includes:

performing data standardization on the original sample data, and dividing the frame structure of the message, according to a predetermined rule, into parameter units of an attribute or method of an object;

generating the feature factor of each parameter unit according to the data type of that parameter unit, and serializing all feature factors to form standard data.

Optionally, dividing the frame structure of the message, according to a predetermined rule, into parameter units of an attribute or method of an object includes:

dividing the message, according to the predetermined rule, into a start character, length field, control field, address field, frame header check, link user data, frame check, and end character, and determining the parameter units of an attribute or method of an object.

Optionally, generating the feature factor of each parameter unit according to the data type of that parameter unit, and serializing all feature factors to form standard data, includes:

judging whether the parameter unit conforms to a predetermined specification; if it does, generating the feature factor of the parameter unit; if it does not, judging the message to be abnormal sample data and discarding it;

after the feature factors have been extracted from the original data, forming a standard data stream in a decision-tree-like manner, following the order of the message's parsed structure.

Optionally, inputting the samples to be detected into the prediction model for anomaly detection and outputting the detection results includes:

inputting the samples to be detected into the prediction model for anomaly detection, and determining the probability distribution of each sample over the categories;

when no category has a probability greater than the specified threshold, judging that the sample data to be detected is abnormal and does not conform to the protocol consistency rules;

when some category has a probability greater than the specified threshold, judging that the sample data to be detected conforms to a sub-category of the protocol consistency rules.

According to another aspect of the present invention, a system for locating and testing object-oriented protocol consistency problems is also provided, including:

an original sample library forming module, configured to collect interactive communication messages generated by the power consumption information collection system as original sample data to form an original sample library;

a standard data forming module, configured to read original sample data from the original sample library and perform data standardization on the original sample data to form standard data;

a prediction model determining module, configured to form, from a large volume of historical standard data records, a standard sample library whose training samples are standard data containing all feature factors of a message, and to train a model on the training samples with a machine learning algorithm to determine a prediction model;

a detection result output module, configured to input the samples to be detected into the prediction model for anomaly detection and output the detection results.

Optionally, the standard data forming module includes:

a parameter unit segmentation submodule, configured to perform data standardization on the original sample data and divide the frame structure of the message, according to a predetermined rule, into parameter units of an attribute or method of an object;

a standard data forming submodule, configured to generate the feature factor of each parameter unit according to the data type of that parameter unit, and to serialize all feature factors to form standard data.

Optionally, the parameter unit segmentation submodule includes:

a parameter unit determining unit, configured to divide the message, according to the predetermined rule, into a start character, length field, control field, address field, frame header check, link user data, frame check, and end character, and to determine the parameter units of an attribute or method of an object.

Optionally, the standard data forming submodule includes:

a parameter unit judging unit, configured to judge whether the parameter unit conforms to a predetermined specification; if it does, to generate the feature factor of the parameter unit; if it does not, to judge the message to be abnormal sample data and discard it;

a standard data stream forming unit, configured to form, after the feature factors have been extracted from the original data, a standard data stream in a decision-tree-like manner, following the order of the message's parsed structure.

Optionally, the detection result output module includes:

a probability distribution determining submodule, configured to input the samples to be detected into the prediction model for anomaly detection and determine the probability distribution of each sample over the categories;

a non-conformance judging submodule, configured to judge, when no category has a probability greater than the specified threshold, that the sample data to be detected is abnormal and does not conform to the protocol consistency rules;

a conformance judging submodule, configured to judge, when some category has a probability greater than the specified threshold, that the sample data to be detected conforms to a sub-category of the protocol consistency rules.

Thus, base-period learning on the existing sample database yields an optimized prediction model through training. The model can make predictions for any sample whose structure is similar to those in the sample library and judge whether it conforms to the rules. Extending it only requires adding samples under the corresponding category and retraining, without changing the software itself, so the approach is highly extensible. The data it tests comes from the messages exchanged during the daily operation of the equipment, which cover the equipment's working conditions across a wide range of scenarios, improve test coverage, and meet the massive message data requirements of daily use. The message sample library comes from real interactions in the collection system and spans many terminal manufacturers, which ensures that the samples are rich and random. The analysis and prediction model trained on this library is therefore sufficiently typical to represent the mainstream interpretation shared by many manufacturers.

Description of the drawings

Exemplary embodiments of the present invention can be understood more fully by referring to the following drawings:

FIG. 1 is a schematic flowchart of the method for locating and testing object-oriented protocol consistency problems described in this embodiment;

FIG. 2 is a schematic diagram of the data standardization model described in this embodiment;

FIG. 3 is a schematic diagram of the data serialization described in this embodiment;

FIG. 4 is a schematic diagram of the system for locating and testing object-oriented protocol consistency problems described in this embodiment.

Detailed description of the embodiments

Exemplary embodiments of the present invention are now described with reference to the drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described here, which are provided so that the disclosure is thorough and complete and fully conveys the scope of the invention to those skilled in the art. The terms used in the exemplary embodiments shown in the drawings do not limit the present invention. In the drawings, the same units or elements are given the same reference numerals.

Unless otherwise specified, the terms used here (including scientific and technical terms) have the meanings commonly understood by those skilled in the art. In addition, terms defined in commonly used dictionaries should be understood to have meanings consistent with the context of their field, and should not be understood in an idealized or overly formal sense.

According to a first aspect of the present invention, a method 100 for locating and testing object-oriented protocol consistency problems is provided. As shown in FIG. 1, the method 100 includes:

S101: collecting interactive communication messages generated by the power consumption information collection system as original sample data to form an original sample library;

S102: reading original sample data from the original sample library and performing data standardization on the original sample data to form standard data;

S103: forming, from a large volume of historical standard data records, a standard sample library whose training samples are standard data containing all feature factors of a message, and training a model on the training samples with a machine learning algorithm to determine a prediction model;

S104: inputting the samples to be detected into the prediction model for anomaly detection, and outputting the detection results.

Specifically, the method includes:

S1: Data preparation. The interactive communication messages generated each day by the power consumption information collection system are collected to form a message material library.

S2: Data standardization. The original message material library is analyzed and processed. The frame structure of each message is divided, according to the rules of the Protocol, into parameter units of an attribute or method of an object. The feature factor of each parameter unit is then generated from the data type of that unit, which avoids losing features because of differences in the concrete data content. Finally, all feature factors are serialized to form standard data.

S3: Training the analysis model. From a large volume of historical data records, a training sample library is formed whose samples are standard data containing all feature factors of a message, and a machine learning algorithm is used to train the model.

S4: Abnormal data detection. A message that needs to be tested goes through the same data standardization as in step S2, which yields a test sample library whose samples are standard data. After the test samples are input into the anomaly detection analysis model, the model gives the detection conclusion and the corresponding probability distribution.

Further, the frame structure parsing model used in step S2 is shown in Table 1.

Table 1

According to the Protocol, a frame can be divided into a start character, length field, control field, address field, frame header check, link user data, frame check, and end character. The link user data can be further divided into different application service types as specified by the Protocol.
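The field split described above can be sketched as follows. This is a minimal illustration, not the patent's parser: the start and end byte values, the two-byte little-endian length, the address-length encoding, and the two-byte check fields are all assumptions made for the example, and the names `Frame` and `split_frame` are hypothetical. The checks are extracted but not verified.

```python
from dataclasses import dataclass

START_BYTE = 0x68  # assumed start character
END_BYTE = 0x16    # assumed end character


@dataclass
class Frame:
    length: int          # value of the length field
    control: int         # control field
    address: bytes       # address field (server address plus client address)
    header_check: bytes  # frame header check
    user_data: bytes     # link user data
    frame_check: bytes   # frame check


def split_frame(raw: bytes) -> Frame:
    """Split one raw message into the fields named in the text.

    All field widths are illustrative assumptions; checksums are not verified.
    """
    if len(raw) < 12 or raw[0] != START_BYTE or raw[-1] != END_BYTE:
        raise ValueError("not a well-formed frame")
    length = int.from_bytes(raw[1:3], "little")
    control = raw[3]
    # Assumption: the low 4 bits of the first address byte hold
    # (server address length - 1), and one client-address byte follows.
    sa_len = (raw[4] & 0x0F) + 1
    addr_end = 4 + 1 + sa_len + 1
    if addr_end + 2 > len(raw) - 3:
        raise ValueError("address field overruns frame")
    return Frame(
        length=length,
        control=control,
        address=raw[4:addr_end],
        header_check=raw[addr_end:addr_end + 2],
        user_data=raw[addr_end + 2:-3],
        frame_check=raw[-3:-1],
    )
```

The link user data returned here is the part that would then be split further into parameter units by application service type.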

Further, the training model in step S3 is the random forest algorithm from ensemble learning, a supervised learning algorithm based on if-then-else rules. It is highly interpretable and matches human intuition.

The random forest is built on the decision tree algorithm, which solves classification problems. A decision tree uses a tree structure and reasons layer by layer to reach the final classification. It consists of the following elements:

Root node: contains the full set of samples

Internal node: corresponds to a test on a feature attribute

Leaf node: represents the result of the decision

During prediction, each internal node of the tree tests the value of one attribute, and the result of the test decides which branch to follow, until a leaf node is reached and the classification result is obtained. A random forest is made up of many decision trees that are independent of one another. Its final prediction is the mode of the predictions of the individual trees, that is, the class that receives the most votes.
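The voting rule described above can be sketched in a few lines. The three toy trees, their if-then-else tests, and the class labels are hypothetical; they only stand in for trained decision trees.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence

# A "tree" here is any function that maps a sample to a class label.
Tree = Callable[[Sequence[int]], Hashable]


def forest_predict(trees: List[Tree], sample: Sequence[int]) -> Hashable:
    """Return the mode (most frequent label) of the individual tree predictions."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical trees, each a single if-then-else test on one attribute.
def tree_a(sample: Sequence[int]) -> str:
    return "read-response" if sample[0] == 1 else "other"


def tree_b(sample: Sequence[int]) -> str:
    return "read-response" if sample[1] == 2 else "other"


def tree_c(sample: Sequence[int]) -> str:
    return "other"


label = forest_predict([tree_a, tree_b, tree_c], [1, 2, 3])
```

Here two of the three trees vote for the same label, so that label is the forest's prediction.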

Specific implementation:

The present invention is described in further detail below with reference to the drawings and a specific embodiment. Embodiment:

S1: As shown in FIG. 2, the software reads original sample data DA1 from the original sample database. After passing through the data standardization module, DA1 is parsed according to the frame structure into several parameter units PA1…PAn. Parameter units are tagged following the abstract syntax of ASN.1; see GB/T 16262.1-2006 for details. Following the ASN.1 abstract syntax rules, the feature factor Tm of the current parameter unit PAm is extracted. Tm is the specific data type of the current parameter unit; see the data type definitions in the Protocol for details.
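A minimal sketch of the feature factor extraction follows, assuming each parsed parameter unit is available as a (data type name, value) pair. That representation, the function name, and the sample type names are illustrative assumptions, not taken from the patent.

```python
from typing import Any, List, Tuple

# Hypothetical representation of a parsed parameter unit PAm:
# (data type name, decoded value).
ParameterUnit = Tuple[str, Any]


def extract_feature_factors(units: List[ParameterUnit]) -> List[str]:
    """Return the feature factor Tm (the data type) of every parameter unit."""
    return [type_name for type_name, _value in units]


# Two messages with the same structure but different payload values.
message_a = [("long-unsigned", 220), ("octet-string", b"\x01\x02"), ("enum", 1)]
message_b = [("long-unsigned", 231), ("octet-string", b"\xff\x00"), ("enum", 0)]

factors_a = extract_feature_factors(message_a)
factors_b = extract_feature_factors(message_b)
```

Because only the data type is kept, the two messages yield identical feature factors even though their values differ, which is the point of extracting the type instead of the content.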

S2: Once the feature factors have been extracted from the original data, the data is serialized. Serialization follows the decision tree definition underlying the random forest algorithm: as shown in FIG. 3, the standard data stream is formed in a decision-tree-like manner, following the order of the message's parsed structure, as shown in Table 2:

Table 2

Parameter unit 1 | Parameter unit 2 | Parameter unit … | … | Parameter unit N
Feature factor 1 | Feature factor 2 | Feature factor … | … | Feature factor N

If any parameter unit is found not to comply with the Protocol, the message is judged to be abnormal sample data and discarded.
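One way to turn the feature factors into a row of the form shown in Table 2 is sketched below. The integer codes, the fixed row width, the padding value, and the rule of returning `None` for a message to be discarded are all assumptions made for illustration; the patent does not specify an encoding.

```python
from typing import Dict, List, Optional

# Hypothetical integer codes for data types; a real table would come from the Protocol.
TYPE_CODES: Dict[str, int] = {"long-unsigned": 1, "octet-string": 2, "enum": 3}
PADDING = 0  # fills positions beyond the last parameter unit


def serialize(feature_factors: List[str], width: int) -> Optional[List[int]]:
    """Encode feature factors in parse order as a fixed-width row.

    Returns None when the message should be discarded as abnormal sample data.
    """
    if len(feature_factors) > width:
        return None
    row: List[int] = []
    for factor in feature_factors:
        code = TYPE_CODES.get(factor)
        if code is None:  # unknown type: the unit does not match the assumed specification
            return None
        row.append(code)
    return row + [PADDING] * (width - len(row))


row = serialize(["long-unsigned", "enum"], width=4)
discarded = serialize(["not-a-type"], width=4)
```

A fixed width is used here only because most tabular learners expect rows of equal length; other encodings would work equally well.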

S3: The model can be trained in Python with the random forest classifier sklearn.ensemble.RandomForestClassifier, whose predict_proba interface is used in S4. Its main performance parameters are:

max_features: the number of features considered when searching for the best split in a decision tree.

n_estimators: the number of decision trees in the random forest.

min_samples_leaf: the minimum number of samples required at a leaf node.

Calling fit on the classifier with the training samples returns the trained prediction model.
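The training step might look like the sketch below. It uses `RandomForestClassifier`, since `predict_proba` is a classifier method in scikit-learn. The synthetic feature matrix, the labels, and the parameter values are placeholders chosen for illustration, not values from the patent.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder training data: 200 serialized messages with 8 feature factors each.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 5, size=(200, 8))
y_train = X_train[:, 0] % 3  # hypothetical category labels 0, 1, 2

model = RandomForestClassifier(
    n_estimators=50,       # number of decision trees in the forest
    max_features="sqrt",   # features considered at each split
    min_samples_leaf=1,    # minimum samples required at a leaf node
    random_state=0,        # fixed seed so the sketch is repeatable
)
model.fit(X_train, y_train)  # fit returns the trained estimator itself

# One row of class probabilities per input sample.
proba = model.predict_proba(X_train[:1])
```

In practice the rows of `X_train` would be the serialized standard data and `y_train` the category of each message.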

S4: The prediction model is used to detect anomalies in the sample data under test. The model exposes the interface function predict_proba(testdata), which returns, for each sample in testdata, a vector of the probabilities that the sample belongs to each category. When no category's probability exceeds the specified threshold, the test sample data is judged abnormal and non-conformant with the protocol consistency rules. When some category's probability exceeds the specified threshold, the test sample data can be judged to conform to a sub-category of the protocol consistency rules.
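The threshold rule can be sketched as a small function over one probability vector. The function name and the threshold value of 0.8 used in the example are assumptions; the patent only refers to a specified threshold.

```python
from typing import Optional, Sequence


def classify_or_flag(probabilities: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the category whose probability exceeds the threshold.

    Returns None when no category exceeds it, meaning the sample is judged abnormal.
    """
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return best if probabilities[best] > threshold else None


conforming = classify_or_flag([0.10, 0.85, 0.05], threshold=0.8)
abnormal = classify_or_flag([0.40, 0.35, 0.25], threshold=0.8)
```

Checking only the largest probability is sufficient, because if the maximum does not exceed the threshold no other category can.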

According to another aspect of the present invention, a system 400 for locating and testing object-oriented protocol consistency problems is also provided. As shown in FIG. 4, the system 400 includes:

an original sample library forming module 410, configured to collect interactive communication messages generated by the power consumption information collection system as original sample data to form an original sample library;

a standard data forming module 420, configured to read original sample data from the original sample library and perform data standardization on the original sample data to form standard data;

a prediction model determining module 430, configured to form, from a large volume of historical standard data records, a standard sample library whose training samples are standard data containing all feature factors of a message, and to train a model on the training samples with a machine learning algorithm to determine a prediction model;

a detection result output module 440, configured to input the samples to be detected into the prediction model for anomaly detection and output the detection results.

The system 400 of this embodiment of the present invention corresponds to the method 100 of another embodiment of the present invention, and is not described again here.

Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code. The solutions in the embodiments of the present application can be implemented in various computer languages, for example the object-oriented programming language Java and the interpreted scripting language JavaScript.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to its embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present application.

Obviously, those skilled in the art can make various changes and variations to the present application without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is intended to include them as well.

Claims

1. The object-oriented protocol consistency problem positioning test method is characterized by comprising the following steps:

collecting an interactive communication message generated by an electricity consumption information acquisition system as original sample data to form an original sample library;

reading original sample data from the original sample library, and performing data standardization processing on the original sample data to form standard data;

forming, based on a large volume of historical standard data records, a standard sample library whose training samples are standard data containing all feature factors of a message, carrying out model training on the training samples according to a machine learning algorithm, and determining a prediction model;

and inputting the sample to be detected into the prediction model for abnormality detection, and outputting a detection result.

2. The method of claim 1, wherein performing data standardization processing on the original sample data to form standard data comprises:

carrying out data standardization processing on the original sample data, and dividing the frame structure of the message, according to a preset rule, into parameter units of an attribute or a method of a certain object;

and generating feature factors of each parameter unit according to the data type of the parameter unit, and serializing all the feature factors to form standard data.

3. The method according to claim 2, wherein dividing the frame structure of the message, according to a preset rule, into parameter units of an attribute or a method of a certain object comprises:

dividing the message, according to the preset rule, into a start symbol, a length field, a control field, an address field, a frame header check, link user data, a frame check and an end symbol, and determining the parameter units of an attribute or a method of an object.
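The division recited in claim 3 can be illustrated with a minimal Python sketch. The field widths used here (0x68 and 0x16 delimiters, a two-byte little-endian length, a one-byte control field, an address field whose first byte encodes the server address length, and two-byte checks) are assumptions made for illustration and should be verified against DL/T 698.45. The names `Frame` and `split_frame` are hypothetical, and the two check values are extracted but not verified.

```python
from dataclasses import dataclass

START, END = 0x68, 0x16  # assumed start and end symbols


@dataclass
class Frame:
    length: int          # value of the length field
    control: int         # control field
    address: bytes       # address field (flag byte, server address, client address)
    header_check: bytes  # frame header check
    user_data: bytes     # link user data
    frame_check: bytes   # frame check


def split_frame(raw: bytes) -> Frame:
    """Split one message into its fields (assumed layout, checks not verified)."""
    if len(raw) < 12 or raw[0] != START or raw[-1] != END:
        raise ValueError("missing start or end symbol")
    # Assumed: the low 14 bits count every byte except the start and end symbols.
    length = int.from_bytes(raw[1:3], "little") & 0x3FFF
    if length != len(raw) - 2:
        raise ValueError("length field does not match frame size")
    # Assumed: the low 4 bits of the first address byte hold (server address
    # length - 1), followed by the server address and a one-byte client address.
    addr_end = 4 + 1 + (raw[4] & 0x0F) + 1 + 1
    if addr_end + 2 > len(raw) - 3:
        raise ValueError("address field overruns the frame")
    return Frame(
        length=length,
        control=raw[3],
        address=raw[4:addr_end],
        header_check=raw[addr_end:addr_end + 2],
        user_data=raw[addr_end + 2:-3],
        frame_check=raw[-3:-1],
    )


# Made-up example frame; the check bytes are placeholders, not real checksums.
frame = split_frame(bytes.fromhex("68 0E 00 43 01 11 22 10 AA BB 05 01 00 CC DD 16"))
```

In this sketch the link user data (`frame.user_data`) is the part that would be divided further into parameter units of an object's attribute or method.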

4. The method of claim 2, wherein generating the feature factors of the parameter unit according to the data type of the parameter unit, and serializing all the feature factors to form the standard data comprises:

judging whether the parameter unit conforms to a preset specification; if the parameter unit conforms to the preset specification, generating a feature factor of the parameter unit; and if the parameter unit does not conform to the preset specification, judging that the message is abnormal sample data and discarding the message;

and after the feature factors are extracted from the original data, forming a standard data stream in the form of a decision tree according to the parsing order of the message structure.
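The check-then-serialize flow of claim 4 can be sketched as follows. This is only an illustration under stated assumptions: the claims do not define the preset specification or the content of a feature factor, so the `SPEC` table, the `(data type, length)` factor, and the function names are hypothetical. The decision tree form is not detailed in the claims either, so the sketch simply keeps the factors in parsing order.

```python
from typing import List, Optional, Tuple

# Hypothetical preset specification: data type name -> required byte length
# (None means the length is variable).
SPEC = {"unsigned": 1, "long-unsigned": 2, "octet-string": None}

Unit = Tuple[str, bytes]    # (data type, raw value) of one parameter unit
Factor = Tuple[str, int]    # hypothetical feature factor: (data type, length)


def feature_factor(unit: Unit) -> Optional[Factor]:
    """Return a feature factor, or None if the unit violates the specification."""
    dtype, value = unit
    if dtype not in SPEC:
        return None
    expected = SPEC[dtype]
    if expected is not None and len(value) != expected:
        return None
    return (dtype, len(value))


def standardize(units: List[Unit]) -> Optional[List[Factor]]:
    """Serialize factors in parsing order; None means the message is discarded."""
    factors = []
    for unit in units:
        factor = feature_factor(unit)
        if factor is None:
            return None  # abnormal sample data: discard the whole message
        factors.append(factor)
    return factors


good = standardize([("unsigned", b"\x01"), ("octet-string", b"\x01\x02\x03")])
bad = standardize([("long-unsigned", b"\x01")])  # wrong length, so discarded
```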

5. The method according to claim 1, wherein inputting the sample to be detected into the prediction model for abnormality detection and outputting the detection result comprises:

inputting the sample to be detected into the prediction model for abnormality detection, and determining the probability that the sample to be detected belongs to each category;

when the probability of every category is not greater than a specified threshold, judging that the sample data to be detected is abnormal and does not conform to the protocol consistency rule;

if the probability of a certain category is greater than the specified threshold, judging that the sample data to be detected conforms to one sub-category of the protocol consistency rule.
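The threshold decision of claim 5 reduces to comparing the largest class probability with the threshold. The sketch below assumes the prediction model has already produced a mapping from category name to probability; the function name `judge`, the category names and the threshold value are hypothetical.

```python
from typing import Mapping, Optional


def judge(probabilities: Mapping[str, float], threshold: float) -> Optional[str]:
    """Return the matching sub-category, or None if the sample is abnormal."""
    best = max(probabilities, key=probabilities.get)
    # Conforming only when some category is strictly greater than the threshold.
    if probabilities[best] > threshold:
        return best
    return None  # no category exceeds the threshold: abnormal sample


conforming = judge({"rule_a": 0.7, "rule_b": 0.3}, threshold=0.6)
abnormal = judge({"rule_a": 0.5, "rule_b": 0.5}, threshold=0.6)
```

A probability exactly equal to the threshold counts as "not greater than" it, so such a sample is treated as abnormal.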

6. An object-oriented protocol consistency problem location test system, comprising:

the original sample library forming module is used for collecting the interactive communication message generated by the electricity consumption information acquisition system as original sample data to form an original sample library;

the standard data forming module is used for reading original sample data from the original sample library, and carrying out data standardization processing on the original sample data to form standard data;

the prediction model determining module is used for forming, based on a large volume of historical standard data records, a standard sample library whose training samples are standard data containing all feature factors of the message, performing model training on the training samples according to a machine learning algorithm, and determining a prediction model;

and the output detection result module is used for inputting the sample to be detected into the prediction model for abnormality detection and outputting a detection result.

7. The system of claim 6, wherein the standard data forming module comprises:

the parameter unit dividing sub-module is used for carrying out data standardization processing on the original sample data and dividing the frame structure of the message, according to a preset rule, into parameter units of an attribute or a method of a certain object;

and the standard data forming sub-module is used for generating feature factors of each parameter unit according to the data type of the parameter unit, and serializing all the feature factors to form standard data.

8. The system of claim 7, wherein the parameter unit dividing sub-module comprises:

the parameter determining unit is used for dividing the message, according to a preset rule, into a start symbol, a length field, a control field, an address field, a frame header check, link user data, a frame check and an end symbol, and determining the parameter units of an attribute or a method of an object.

9. The system of claim 7, wherein the standard data forming sub-module comprises:

the parameter unit judging unit is used for judging whether the parameter unit conforms to a preset specification, generating a feature factor of the parameter unit if the parameter unit conforms to the preset specification, and judging that the message is abnormal sample data and discarding the message if the parameter unit does not conform to the preset specification;

and the standard data stream forming unit is used for forming a standard data stream in the form of a decision tree according to the parsing order of the message structure after the feature factors are extracted from the original data.

10. The system of claim 6, wherein the output detection result module comprises:

the probability distribution determining sub-module is used for inputting the sample to be detected into the prediction model for abnormality detection and determining the probability that the sample to be detected belongs to each category;

the non-conformity judging sub-module is used for judging that the sample data to be detected is abnormal and does not conform to the protocol consistency rule when the probability of every category is not greater than a specified threshold;

and the conformity judging sub-module is used for judging that the sample data to be detected conforms to one sub-category of the protocol consistency rule if the probability of a certain category is greater than the specified threshold.