CN110555051B - Product test abnormal behavior detection system based on behavior sequence analysis - Google Patents

Product test abnormal behavior detection system based on behavior sequence analysis

Info

Publication number
CN110555051B
Authority
CN
China
Prior art keywords
data
sequence
module
similarity
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810456933.XA
Other languages
Chinese (zh)
Other versions
CN110555051A (en)
Inventor
张贝格
姜丽红
蔡鸿明
叶聪聪
于晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201810456933.XA
Publication of CN110555051A
Application granted
Publication of CN110555051B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A system for detecting abnormal behavior in product testing based on behavior sequence analysis, comprising a data preprocessing module, a sequence model construction module, a storage module and a prediction module, wherein: the data preprocessing module acquires and analyzes quality inspection data record files, generates structured data, and outputs the structured data to both the sequence model construction module and the storage module; the sequence model construction module calculates the sequence similarity of each group of data, clusters according to the sequence similarity, and outputs the cluster center representing the conventional behavior cluster, as the conventional behavior model, to the storage module and the prediction module; and the prediction module calculates the offset between any batch of data and the conventional behavior model and detects abnormal behavior by comparing the offset. By analyzing differences in the similarity of data sequences, the invention establishes a model of conventional data recording behavior and detects anomalies in the data recording process, thereby evaluating the reliability of product quality inspection data.

Description

Product test abnormal behavior detection system based on behavior sequence analysis
Technical Field
The invention relates to a technology in the field of information processing, in particular to a product test abnormal behavior detection system based on behavior sequence analysis.
Background
In the manufacturing industry, if an inspector does not actually test every product but instead forges data from a few real test results using some strategy, so that the forged data also fall within a reasonable error range, the false data are difficult to detect and the quality inspection results become unreliable. Existing abnormal behavior detection methods fall into two kinds: when a large amount of labeled data is available, the features of abnormal behaviors are learned and used to detect whether known abnormal behaviors exist in new data; when the abnormal pattern cannot be identified and characterized by features, a conventional behavior model is established and abnormal behaviors are found by detecting deviations from it. Different testers may adopt different strategies when forging data, and it is difficult to build a model for each abnormal behavior when labeled data are scarce.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a product test abnormal behavior detection system based on behavior sequence analysis, which establishes a model of conventional data recording behavior by analyzing differences in the similarity of data sequences and detects anomalies in the data recording process, thereby evaluating the reliability of product quality inspection data.
The invention is realized by the following technical scheme:
The invention comprises a data preprocessing module, a sequence model construction module, a storage module and a prediction module, wherein: the data preprocessing module acquires and analyzes quality inspection data record files, generates structured data, and outputs the structured data to both the sequence model construction module and the storage module; the sequence model construction module calculates the sequence similarity of each group of data, clusters according to the sequence similarity, and outputs the cluster center representing the conventional behavior cluster, as the conventional behavior model, to the storage module and the prediction module; and the prediction module calculates the offset between any batch of data and the conventional behavior model and detects abnormal behavior by comparing the offset.
Technical effects
Compared with the prior art, the invention evaluates data reliability by modeling data recording behavior from quality test data, using the sequence information of the data and a behavior model analysis method. Sequence similarity is calculated for each sub-data segment to obtain the highest similar-subsequence score within that segment; for a given data sequence, the differences among the highest sequence similarity scores of its segments can reflect changes of strategy during the generation of the sequence; the internal sequence similarity differences of the data sequences are clustered, and the largest cluster represents conventional behavior; the reliability of a batch of data can then be predicted from its deviation from the conventional behavior model.
Drawings
FIG. 1 is a schematic diagram of the structure of the present invention;
FIG. 2 is a block diagram of an embodiment of the present invention.
Detailed Description
As shown in fig. 2, the present embodiment includes: the system comprises a data preprocessing module, a sequence model construction module, a storage module and a prediction module.
The data preprocessing module acquires quality inspection data record files, each of which contains the product inspection results of one batch. It parses each data file, extracts the data it contains, and converts the sequence of data records into an ordered list, specifically as follows:
Each element in the list represents the test result of one product in the batch. Data cleaning then removes rows with missing values or obvious outliers, and the list is evenly divided, according to a set number of segments, into non-overlapping sub-data segments. The structured data are stored in the storage module and simultaneously sent to the sequence model construction module for further processing.
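The cleaning and segmentation step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `clean_and_segment`, the use of fixed `lower`/`upper` bounds to define an "obvious outlier", and the choice to discard trailing records that do not fill a whole segment are all assumptions made for the example.

```python
import math


def clean_and_segment(records, num_segments, lower=None, upper=None):
    """Drop missing or out-of-range values, then split the list into
    num_segments equal-length, non-overlapping segments.

    The bounds-based outlier rule is an assumption; the source only says
    that rows with missing or obviously abnormal values are removed.
    """
    cleaned = []
    for value in records:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue  # missing value
        if lower is not None and value < lower:
            continue  # below the assumed plausible range
        if upper is not None and value > upper:
            continue  # above the assumed plausible range
        cleaned.append(value)

    seg_len = len(cleaned) // num_segments
    if seg_len == 0:
        raise ValueError("not enough records for the requested number of segments")
    # Assumption: records left over after the last full segment are discarded.
    return [cleaned[i * seg_len:(i + 1) * seg_len] for i in range(num_segments)]


batch = [10.1, 10.3, None, 9.9, 10.0, 250.0, 10.2, 10.1, 9.8]
segments = clean_and_segment(batch, num_segments=3, lower=0.0, upper=100.0)
print(segments)
```

Equal segment lengths matter here because the description requires the n sequences in D to have equal length.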
The sequence model construction module receives the structured segment data D, where D denotes the data list obtained after processing a data file and comprises n equal-length sequences $d_1, d_2, \ldots, d_n$. The module divides each segmented data sequence $d_i$ into subsequences, which yields four groups of two subsequences each. For each group, the two subsequences are locally aligned using a dynamic programming algorithm to calculate a sequence similarity score matrix, and the maximum value in that matrix is taken as the maximum similar-subsequence score $s_i$ of the segment, i.e., its highest internal sequence similarity.
For the division, a data segment pasted after modification may be either adjacent to the original data segment or separated from it by an interval, so two cases are considered. In the first, the sequence is split directly at the middle into two subsequences a and b. In the second, with L the length of the sequence to be divided and gap_length a chosen interval value, the blocks [0, gap_length), [2 × gap_length, 3 × gap_length), … are spliced into one subsequence $a_g$, and the remaining blocks are spliced into the other subsequence $b_g$.
Considering that the pasted data segment may also be in reverse order relative to the original data segment, one subsequence of each of the two divisions is additionally reversed, giving two further pairs: the reversed form of a (Figure BDA0001659908810000021) with b, and the reversed form of $a_g$ (Figure BDA0001659908810000022) with $b_g$.
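The two divisions and their reversed variants can be sketched as below, producing the four subsequence pairs per segment. The function names are hypothetical, and reversing the first subsequence of each pair (rather than the second) is an assumption based on where the figure symbols appear in the text.

```python
def split_middle(seq):
    """Case 1: split the sequence at the middle into a and b."""
    mid = len(seq) // 2
    return seq[:mid], seq[mid:]


def split_interleaved(seq, gap_length):
    """Case 2: blocks [0, g), [2g, 3g), ... go to a_g; the rest go to b_g."""
    a_g, b_g = [], []
    for start in range(0, len(seq), gap_length):
        block = seq[start:start + gap_length]
        if (start // gap_length) % 2 == 0:
            a_g.extend(block)
        else:
            b_g.extend(block)
    return a_g, b_g


def four_pairs(seq, gap_length):
    """Return the four (subsequence, subsequence) pairs for one segment."""
    a, b = split_middle(seq)
    a_g, b_g = split_interleaved(seq, gap_length)
    # Assumption: the first subsequence of each division is the one reversed.
    return [(a, b), (a[::-1], b), (a_g, b_g), (a_g[::-1], b_g)]


pairs = four_pairs(list(range(8)), gap_length=2)
for left, right in pairs:
    print(left, right)
```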
The dynamic programming algorithm calculates the sequence similarity score matrix as follows:
Figure BDA0001659908810000023
Wherein: $A$ is the sequence similarity score matrix; $a$ and $b$ are the two subsequences to be compared; $s(a_i, b_j)$ is the similarity between the $i$-th element of sequence $a$ and the $j$-th element of sequence $b$; $A_{ij}$ is the highest subsequence similarity score when the two sequences are aligned from front to back up to elements $a_i$ and $b_j$; $c$ is the gap penalty; and $n$ and $m$ are the lengths of sequences $a$ and $b$, respectively. In the calculation, the matrix is first initialized with $A_{i,0} = A_{0,j} = 0$ ($0 \le i \le n$, $0 \le j \le m$). Each subsequent $A_{i,j}$ is then calculated from the element scores already computed.
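The recurrence itself survives in this text only as an image reference. As a reading aid, the definitions above match the textbook local-alignment (Smith-Waterman) recurrence with a linear gap penalty. The form below is that standard recurrence, written under the assumption that $c \ge 0$ is subtracted for each gap; it is not confirmed by the source.

```latex
A_{i,j} = \max
\begin{cases}
0 \\
A_{i-1,j-1} + s(a_i, b_j) \\
A_{i-1,j} - c \\
A_{i,j-1} - c
\end{cases}
\qquad 1 \le i \le n,\ 1 \le j \le m
```

Under this reading, the segment score is the largest entry of the matrix, $\max_{i,j} A_{i,j}$, which agrees with the statement that the maximum value in the similarity matrix is taken.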
By introducing a gap penalty mechanism, this algorithm can also find sequences with high similarity in addition to matching identical sequences.
The gap penalty mechanism works as follows: for two sequence segments, if one matches the other only after an interval is skipped or several elements are repeated, that interval is penalized. In the algorithm, each interval of length 1 incurs a penalty of c.
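A minimal sketch of local alignment scoring with a gap penalty is given below. It follows the standard Smith-Waterman scheme, which is an assumption about the exact recurrence. The element similarity s is not defined in the text, so the `match`, `mismatch` and `tol` parameters used here are purely illustrative.

```python
def local_alignment_score(a, b, gap_penalty=1.0, match=2.0, mismatch=-1.0, tol=0.0):
    """Return the maximum entry of the local-alignment score matrix for a and b.

    Assumed element similarity: `match` if the two values differ by at most
    `tol`, otherwise `mismatch`.
    """
    n, m = len(a), len(b)
    # A[i][0] = A[0][j] = 0, as in the initialization described above.
    A = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if abs(a[i - 1] - b[j - 1]) <= tol else mismatch
            A[i][j] = max(
                0.0,                           # start a new local alignment
                A[i - 1][j - 1] + s,           # align a[i-1] with b[j-1]
                A[i - 1][j] - gap_penalty,     # skip an element of a
                A[i][j - 1] - gap_penalty,     # skip an element of b
            )
            best = max(best, A[i][j])
    return best


# Identical run of three values: three matches, no gap.
print(local_alignment_score([1, 2, 3, 4], [9, 2, 3, 4]))
# One inserted value in b: five matches minus one gap.
print(local_alignment_score([1, 2, 3, 4, 5], [1, 2, 9, 3, 4, 5]))
```

The second call shows why the gap penalty helps: a copied run with one inserted value still scores higher than any of its gap-free fragments.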
For clustering, each data list D obtained from a data file is handled in four groups, one for each of the four subsequence pairs. For each group, all scores $s_i$ are taken and their coefficient of variation and relative range are calculated, namely: coefficient of variation CV = σ/μ and relative range RR = (max-min)/μ, where σ, μ, max and min are the standard deviation, mean, maximum and minimum of the $s_i$. The four (CV, RR) pairs represent the differences in internal sequence similarity of each batch of data D. The (CV, RR) values from a large number of different batches of test data are clustered; the largest cluster represents conventional behavior, and its cluster center is selected as the conventional behavior model and stored in the storage module.
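The two statistics and the selection of the largest cluster can be sketched as follows. The source does not name a clustering algorithm, so the sketch takes cluster labels as given from any method. The use of the population standard deviation and of the arithmetic mean as cluster center are assumptions, and the function names are hypothetical.

```python
import statistics
from collections import Counter


def cv_and_rr(scores):
    """Coefficient of variation (sigma/mu) and relative range ((max-min)/mu)."""
    mu = statistics.fmean(scores)
    if mu == 0:
        raise ValueError("mean score is zero; CV and RR are undefined")
    sigma = statistics.pstdev(scores)  # assumption: population standard deviation
    return sigma / mu, (max(scores) - min(scores)) / mu


def largest_cluster_center(points, labels):
    """Mean (CV, RR) of the most populous cluster, given labels from any clusterer."""
    biggest, _ = Counter(labels).most_common(1)[0]
    members = [p for p, label in zip(points, labels) if label == biggest]
    return (
        statistics.fmean(p[0] for p in members),
        statistics.fmean(p[1] for p in members),
    )


cv, rr = cv_and_rr([2.0, 4.0, 6.0])
print(cv, rr)

points = [(0.1, 0.2), (0.3, 0.4), (5.0, 6.0)]
center = largest_cluster_center(points, labels=[0, 0, 1])
print(center)
```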
The prediction module receives the four (CV, RR) value pairs, calculated by the sequence model construction module, that represent the internal sequence similarity differences of any batch of data. It retrieves the conventional behavior model from the storage module and calculates the distance between each pair and the model. The tanh function maps each distance into the range 0 to 1, giving the offset of the batch data from the conventional behavior model, expressed as a percentage. The maximum of the four offsets of a batch is compared with a set threshold: if it exceeds the threshold, the quality data of that batch are judged to have been falsified during recording; otherwise, no non-compliant behavior is judged to have occurred.
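The prediction step can be sketched as below. The text says only "distance", so Euclidean distance is an assumption here, and the function names and the threshold value in the example are hypothetical.

```python
import math


def offset(pair, center):
    """Map the distance between a (CV, RR) pair and the model into [0, 1) with tanh."""
    return math.tanh(math.dist(pair, center))  # assumption: Euclidean distance


def is_suspicious(pairs, center, threshold):
    """Flag a batch when the largest of its offsets exceeds the threshold."""
    offsets = [offset(p, center) for p in pairs]
    return max(offsets) > threshold, offsets


model = (0.2, 0.3)
batch_pairs = [(0.2, 0.3), (0.25, 0.3), (0.2, 0.35), (3.2, 4.3)]
flagged, offsets = is_suspicious(batch_pairs, model, threshold=0.5)
print(flagged, [round(o, 4) for o in offsets])
```

Because distances are non-negative, tanh yields values from 0 up to (but not reaching) 1, which is what allows the offset to be read as a percentage.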
The storage module comprises a database and a file system. The database stores the processed structured data, so that the model can be recalculated if parameters are adjusted. The file system stores the model files obtained during modeling, which the prediction module retrieves in order to judge the category of a given batch of data and analyze its reliability.
The system detects abnormal behavior as follows: the data preprocessing module reads a product quality data file and parses it to obtain the raw test data of a batch of products, then transmits the structured data obtained through data cleaning and segmentation to the sequence model construction module and also stores a copy in the storage module; the sequence model construction module calculates the highest internal sequence similarity of each data segment, represents the differences in sequence similarity among segments by the coefficient of variation and the relative range, finds through clustering the largest cluster, which represents conventional behavior, takes its cluster center as the conventional behavior model, and outputs the model to the storage module and the prediction module; and the prediction module obtains the reliability of the batch of product data from the deviation between the conventional behavior model and the internal sequence similarity differences of the data to be detected.
Table 1 compares the technical indicators and effects of the invention with those of similar products at home and abroad.
TABLE 1 comparison of inventive effects
Figure BDA0001659908810000041
Compared with the prior art, the invention does not need to collect other auxiliary information while the product quality data are being collected; it analyzes the obtained product quality data directly. In the analysis, the data characteristics produced by a behavior sequence are inferred from the possible irregular behaviors of the data recorder, and a conventional behavior model is established from unlabeled data for comparison with the behavior of new data sequences, so that data falsification is found and data reliability is analyzed. The method depends neither on expert opinion nor on additional information, so it addresses the problem of verifying product quality data and offers an approach to the reliability analysis of other types of data.
Those skilled in the art may partially modify the foregoing embodiments in various ways without departing from the principles and spirit of the invention. The scope of the invention is defined by the claims and not by the foregoing embodiments, and all such implementations are within the scope of the invention.

Claims (3)

1. A system for detecting abnormal behavior in product testing based on behavior sequence analysis, comprising a data preprocessing module, a sequence model construction module, a storage module and a prediction module, wherein: the data preprocessing module acquires and analyzes quality inspection data record files, generates structured data, and outputs the structured data to both the sequence model construction module and the storage module; the sequence model construction module calculates the sequence similarity of each group of data, clusters according to the sequence similarity, and outputs the cluster center representing the conventional behavior cluster, as the conventional behavior model, to the storage module and the prediction module; and the prediction module calculates the offset between any batch of data and the conventional behavior model and detects abnormal behavior by comparing the offset;
the analysis refers to: converting the sequence of data records into an ordered list, each element of which represents the test result of one product in the batch; removing, through data cleaning, rows of the list with missing values or obvious outliers; and evenly dividing the list into a plurality of non-overlapping sub-data segments, which constitute the structured data;
the sequence similarity of each group of data refers to: each segmented data sequence $d_i$ of the data list D obtained after processing the data file is divided into subsequences; the two subsequences obtained by the division are locally aligned using a dynamic programming algorithm; and, after the sequence similarity score matrix is calculated, the maximum value in the matrix is taken as the maximum similar-subsequence score $s_i$ of the segment, i.e., its highest sequence similarity;
the sequence similarity scoring matrix is as follows:
Figure FDA0004134510570000011
wherein: $A$ is the sequence similarity score matrix; $a$ and $b$ are the two subsequences to be compared; $s(a_i, b_j)$ is the similarity between the $i$-th element of sequence $a$ and the $j$-th element of sequence $b$; $A_{ij}$ is the highest subsequence similarity score when the two sequences are aligned from front to back up to elements $a_i$ and $b_j$; and $c$ is the gap penalty;
the clustering refers to: handling the data list D obtained after processing the data file in four groups; for each group, taking all scores $s_i$ and calculating their coefficient of variation and relative range, namely: coefficient of variation CV = σ/μ and relative range RR = (max-min)/μ; using the four (CV, RR) pairs to represent the differences in internal sequence similarity of each batch of data D; clustering the (CV, RR) values of the four groups based on a large number of different batches of test data, wherein the largest cluster represents conventional behavior; and selecting the cluster center of that cluster as the conventional behavior model and storing it in the storage module;
the dividing comprises the following steps:
(1) dividing the sequence directly at the middle into two subsequences a and b;
(2) when the length of the sequence to be divided is L, taking an interval value gap_length and splicing [0, gap_length), [2 × gap_length, 3 × gap_length), … into one subsequence $a_g$, the remaining parts being spliced into another subsequence $b_g$; for each of the two divisions, one subsequence is additionally reversed, giving the reversed form of a (Figure FDA0004134510570000021) with b, and the reversed form of $a_g$ (Figure FDA0004134510570000022) with $b_g$;
the gap penalty is: for two sequence segments, if one matches the other only after an interval is skipped or several elements are repeated, a penalty is applied to that interval;
the comparing of the offset is as follows: the prediction module receives the four (CV, RR) value pairs, calculated by the sequence model construction module, that represent the internal sequence similarity differences of any batch of data, retrieves the conventional behavior model from the storage module, calculates the distance between each pair and the model, maps the distance into the range 0 to 1 using a tanh function to obtain the offset of the batch of data from the conventional behavior model, expressed as a percentage, and selects the maximum of the four offsets of the batch for comparison with a set threshold.
2. The system of claim 1, wherein the storage module has a database for storing the processed structured data.
3. An abnormal behavior detection method based on the system of claim 1 or 2, characterized in that: the data preprocessing module reads a product quality data file and analyzes it to obtain the raw test data of a batch of products, and the structured data obtained through data cleaning and segmentation are transmitted to the sequence model construction module and stored in the storage module; the sequence model construction module calculates the highest internal sequence similarity of each data segment, represents the differences in sequence similarity among segments by the coefficient of variation and the relative range, finds through clustering the largest cluster, which represents conventional behavior, takes its cluster center as the conventional behavior model, and outputs the model to the storage module and the prediction module; and the prediction module obtains the reliability of the batch of product data from the deviation between the conventional behavior model and the internal sequence similarity differences of the data to be detected.
CN201810456933.XA 2018-05-14 2018-05-14 Product test abnormal behavior detection system based on behavior sequence analysis Active CN110555051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810456933.XA CN110555051B (en) 2018-05-14 2018-05-14 Product test abnormal behavior detection system based on behavior sequence analysis

Publications (2)

Publication Number Publication Date
CN110555051A CN110555051A (en) 2019-12-10
CN110555051B CN110555051B (en) 2023-04-28

Family

ID=68733648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810456933.XA Active CN110555051B (en) 2018-05-14 2018-05-14 Product test abnormal behavior detection system based on behavior sequence analysis

Country Status (1)

Country Link
CN (1) CN110555051B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101561878A (en) * 2009-05-31 2009-10-21 河海大学 Unsupervised anomaly detection method and system based on improved CURE clustering algorithm
JP5342708B1 (en) * 2013-06-19 2013-11-13 株式会社日立パワーソリューションズ Anomaly detection method and apparatus

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
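An unsupervised anomaly detection method based on clustering; 杨斌 et al.; 《计算机工程与应用》; 2008-01-01 (No. 01); full text *
Evaluation of the quality stability of blow-molding polyethylene products; 王立东 et al.; 《石化技术》; 2018-04-28; full text *
Detection of abnormal agricultural product price data based on K-Means clustering; 韩琳 et al.; 《计算机系统应用》; 2017-03-15 (No. 03); full text *
Assignment similarity detection system based on sequence matching; 王晓英 et al.; 《计算机工程》; 2012-12-20 (No. 24); Section 3 of the main text *
Flywheel anomaly detection based on data correlation analysis; 龚学兵 et al.; 《航空学报》; 2014-07-01 (No. 03); full text *
A survey of research on clustering-based intrusion detection; 肖敏 et al.; 《计算机应用》; 2008-06-15; full text *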

Also Published As

Publication number Publication date
CN110555051A (en) 2019-12-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant