WO2021103589A1 - Signaling analysis method and related apparatus - Google Patents

Signaling analysis method and related apparatus

Info

Publication number
WO2021103589A1
Authority
WO
WIPO (PCT)
Prior art keywords
signaling
sequence
feature
abnormal
training
Application number
PCT/CN2020/102680
Other languages
English (en)
French (fr)
Inventor
秦臻
饶思维
叶强
吕佳
田光见
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2021103589A1 (zh)
Priority to US17/752,848 (US20220286263A1)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00 Arrangements affording multiple use of the transmission path
    • H04L5/0091 Signaling for the administration of the divided path
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00 Arrangements affording multiple use of the transmission path
    • H04L5/003 Arrangements for allocating sub-channels of the transmission path
    • H04L5/0053 Allocation of signaling, i.e. of overhead other than pilot signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L5/00 Arrangements affording multiple use of the transmission path
    • H04L5/003 Arrangements for allocating sub-channels of the transmission path
    • H04L5/0058 Allocation criteria
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/08 Protocols for interworking; Protocol conversion
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/04 Arrangements for maintaining operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/34 Signalling channels for network management communication
    • H04L41/344 Out-of-band transfers

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a signaling analysis method and related devices.
  • Signaling is a control command transmitted between devices. It not only describes the operation of the device itself, but also states the connection requirements for related devices.
  • The command must be encapsulated according to a specific protocol and sent to the other party so that the other party can recognize and process it.
  • The command carries information such as the calling party, the called party, and the audio format, so that the other party can fulfill the command; the called party must respond to the command to indicate how it was completed, so that both communicating parties know each other's operations.
  • Many faults encountered in operation and maintenance require analysis of signaling data to identify anomalies and delimit the fault.
  • In practice, signaling data analysis has three characteristics. First, the scale is large: the signaling that engineers must analyze at one time often reaches thousands or even tens of thousands of messages. Second, the content is complex: signaling formats differ between protocols, and each piece of signaling contains, in addition to the message type, at least dozens of information elements with different service meanings. Finally, the different business logic behind different protocols makes signaling analysis more complicated and often requires rich business knowledge and experience.
  • The embodiments of this application disclose a signaling analysis method and related apparatus that can cover anomalies caused by information element errors and can be reused across different protocols.
  • An embodiment of the present application provides a signaling analysis method. The method includes: acquiring a signaling flow under test, where the signaling flow under test includes N pieces of signaling and N is an integer greater than 1;
  • performing first feature construction on the message type and information elements included in each of the N pieces of signaling to obtain a first feature sequence;
  • the first feature sequence includes N first feature vectors, and the N first feature vectors correspond one-to-one to the N pieces of signaling;
  • inputting the first feature sequence into a first signaling anomaly detection model for anomaly detection processing and outputting a first anomaly detection result, where the first anomaly detection result indicates whether the signaling flow under test is normal or abnormal.
  • In the signaling flow under test, every piece of signaling corresponds to the same protocol and the same interface.
  • The interface corresponding to a piece of signaling is the interface that generated it.
  • An interface is the boundary between two systems in a communication network. It is defined by a specific protocol or specification to ensure compatibility of formats, functions, signals, and interconnections at the boundary.
  • The protocol used by the signaling in the signaling flow under test may be the Session Initiation Protocol (SIP), the S1 Application Protocol (S1AP), or another signaling protocol; this application does not limit it.
  • The signaling analysis method provided by the embodiments of this application applies to different protocols, that is, it can be reused across protocols. Because each first feature vector in the first feature sequence is obtained by feature construction on the message type and information elements of its corresponding signaling, inputting the first feature sequence into the first signaling anomaly detection model makes it possible to analyze whether the message types and the information elements are abnormal. The method can therefore cover anomalies caused by information element errors and can be reused across different protocols.
  • The method further includes: performing second feature construction on the message type included in each of the N pieces of signaling to obtain a second feature sequence, where the second feature sequence includes N second feature vectors that correspond one-to-one to the N pieces of signaling; and inputting the second feature sequence into a second signaling anomaly detection model for anomaly detection processing to obtain a second anomaly detection result, which indicates whether the signaling flow under test is normal or abnormal.
  • Performing the first feature construction on the message type and information elements included in each of the N pieces of signaling to obtain the first feature sequence then includes: when the second anomaly detection result indicates that the signaling flow under test is normal, performing the first feature construction on each of the N pieces of signaling to obtain the first feature sequence.
  • Inputting the second feature sequence into the second signaling anomaly detection model can be understood as coarse-grained anomaly detection based on message types; inputting the first feature sequence into the first signaling anomaly detection model can be understood as fine-grained anomaly detection based on message types and information elements. In other words, the first round of anomaly detection (using the second signaling anomaly detection model) takes less processing time than the second round (using the first signaling anomaly detection model), but its detection accuracy is lower.
  • The second signaling anomaly detection model can be used for anomaly detection first; only when it does not detect an anomaly in the signaling flow under test is the first signaling anomaly detection model then used.
  • When the second signaling anomaly detection model does detect an anomaly, the first signaling anomaly detection model is no longer used for anomaly detection.
  • Running the second signaling anomaly detection model first, and using the first signaling anomaly detection model only when no anomaly has been found, balances detection efficiency and accuracy.
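  • The two-stage flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Signaling` tuple layout, the function name `cascade_detect`, and the two toy detectors are all assumptions standing in for the trained second (coarse) and first (fine) signaling anomaly detection models.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical representation: (message type, information elements).
Signaling = Tuple[str, Tuple[str, ...]]


def cascade_detect(
    flow: Sequence[Signaling],
    coarse_is_abnormal: Callable[[Sequence[str]], bool],
    fine_is_abnormal: Callable[[Sequence[Signaling]], bool],
) -> str:
    """Run the fast message-type check first, then the slower full check."""
    message_types = [message_type for message_type, _ in flow]
    if coarse_is_abnormal(message_types):
        # Anomaly already found: the fine-grained model is not run.
        return "abnormal"
    # Only flows that pass the coarse check reach the fine-grained model.
    return "abnormal" if fine_is_abnormal(flow) else "normal"


# Toy stand-ins for the two trained models (assumptions for illustration only).
KNOWN_TYPES = {"INVITE", "200 OK", "ACK"}
coarse = lambda types: any(t not in KNOWN_TYPES for t in types)
fine = lambda flow: any("cause=failure" in ies for _, ies in flow)

flow_ok = [("INVITE", ()), ("200 OK", ("cause=success",)), ("ACK", ())]
flow_bad_ie = [("INVITE", ()), ("200 OK", ("cause=failure",)), ("ACK", ())]
```

  • In this sketch `flow_ok` passes both stages, while `flow_bad_ie` passes the coarse stage and is caught only by the fine stage, which is the case the information-element features are meant to cover.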
  • Performing the first feature construction on the message type and information elements included in each of the N pieces of signaling to obtain the first feature sequence includes: performing the first feature construction on the N pieces of signaling one by one, in the order of their timestamps, to obtain the first feature sequence.
  • Performing the first feature construction on the N pieces of signaling in timestamp order to obtain the first feature sequence includes: performing feature construction on the combination of a first message type and a target information element as a whole to obtain a first vector. The first message type is the message type included in first signaling, the target information element is an information element included in the first signaling, the first signaling is any one of the N pieces of signaling, and the first vector is included in the first feature sequence.
  • The signaling analysis device may treat the combination of message type and information element included in each piece of signaling as a word and perform feature construction on it to obtain the first feature vector corresponding to that signaling. That is, the device can regard each combination of message type and information element as a word and construct features for the word using natural language processing (NLP) techniques.
  • Feature construction methods for words include, but are not limited to, one-hot encoding and bag-of-words (BoW) encoding.
  • In this implementation, treating the combination of the message type and the information element included in the signaling as a word for feature construction makes it possible to convert any signaling into a feature vector, which suits signaling under different protocols.
  • The target information element includes an information element indicating the reason for sending the first signaling.
  • The target information element includes only information elements indicating the reason for sending the first signaling.
  • An information element that indicates the reason for sending a piece of signaling can be called a cause-type information element.
  • For example, the cause information element in the GTPv2-C protocol indicates the reason for the success or failure of a service,
  • and the eMM-cause information element in the Diameter protocol indicates the status of the EMM service.
  • Diameter is a new-generation AAA (authentication, authorization, and accounting) protocol developed by the Internet Engineering Task Force (IETF). In this implementation, treating the combination of the message type and the target information element included in the first signaling as a word for feature construction can improve the accuracy of anomaly detection.
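  • As a rough illustration of treating each combination of message type and cause-type information element as one word and encoding it one-hot, consider the sketch below. The helper names, the extra slot for unseen words, and the sample message and cause strings are assumptions chosen for the example, not values taken from the application.

```python
from typing import Dict, List, Sequence, Tuple

Word = Tuple[str, str]  # (message type, cause-type information element)


def build_vocabulary(flows: Sequence[Sequence[Word]]) -> Dict[Word, int]:
    """Assign an index to every distinct word seen in the given flows."""
    vocab: Dict[Word, int] = {}
    for flow in flows:
        for word in flow:
            vocab.setdefault(word, len(vocab))
    return vocab


def one_hot_sequence(flow: Sequence[Word], vocab: Dict[Word, int]) -> List[List[int]]:
    """Encode a timestamp-ordered flow as one one-hot vector per signaling."""
    dim = len(vocab) + 1  # the last slot is reserved for unseen words
    sequence = []
    for word in flow:
        vector = [0] * dim
        vector[vocab.get(word, dim - 1)] = 1
        sequence.append(vector)
    return sequence


normal_flow = [
    ("Create Session Request", "-"),
    ("Create Session Response", "Request accepted"),
]
vocab = build_vocabulary([normal_flow])
features = one_hot_sequence(
    [("Create Session Request", "-"), ("Create Session Response", "Request rejected")],
    vocab,
)
```

  • Because the word is the whole combination, a familiar message type carrying an unfamiliar cause value maps to a different vector than the same message type with a normal cause, which is what lets a sequence model notice information-element anomalies.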
  • Performing the first feature construction on the N pieces of signaling in timestamp order to obtain the first feature sequence includes:
  • performing feature construction on M information elements in second signaling, treated as a text of one or more words in a natural language processing (NLP) algorithm, to obtain a second vector;
  • the second signaling is any one of the N pieces of signaling, the second vector is included in the first feature sequence, and M is an integer greater than 1.
  • The M information elements are all or some of the information elements in the message content of the second signaling.
  • In this implementation, the M information elements in the second signaling are treated as a text of one or more words to construct the second vector, so that the first feature sequence containing the second vector can be used for anomaly detection.
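  • A bag-of-words encoding of the M information elements of one piece of signaling might look like the following sketch. The vocabulary contents and the choice to ignore out-of-vocabulary elements are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence


def bag_of_words(elements: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Count how often each known information element occurs in one signaling."""
    vector = [0] * len(vocab)
    for element, count in Counter(elements).items():
        if element in vocab:  # elements outside the vocabulary are ignored
            vector[vocab[element]] = count
    return vector


# Hypothetical vocabulary of information element names.
ie_vocab = {"IMSI": 0, "Cause": 1, "Bearer Context": 2}
second_vector = bag_of_words(
    ["IMSI", "Bearer Context", "Bearer Context", "RAT Type"], ie_vocab
)
```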
  • Performing the second feature construction on the message type included in each of the N pieces of signaling to obtain the second feature sequence includes: performing the second feature construction on the N pieces of signaling one by one, in the order of their timestamps, to obtain the second feature sequence.
  • Performing the second feature construction on the N pieces of signaling in timestamp order to obtain the second feature sequence includes:
  • performing feature construction on a second message type as a word in a natural language processing (NLP) algorithm to obtain a third vector. The second message type is the message type included in third signaling, the third signaling is any one of the N pieces of signaling, and the third vector is included in the second feature sequence.
  • In this implementation, any signaling can be converted into a feature vector, so that the second feature sequence containing the third vector can be used for anomaly detection.
  • A message type anomaly is an anomaly in the message type included in a piece of signaling.
  • The F-th first feature vector in the first feature sequence corresponds to the F-th piece of signaling among the N pieces of signaling. Inputting the first feature sequence into the first signaling anomaly detection model for anomaly detection processing and outputting the first anomaly detection result includes: in the F-th round of anomaly detection processing, inputting a third feature sequence into the first signaling anomaly detection model to obtain a first set, where the first set includes at least one combination of a message type and an information element, and the feature vectors in the third feature sequence are, in order, the (F-K)-th to the (F-1)-th first feature vectors in the first feature sequence; F is an integer greater than 1, and K is an integer greater than 1 and less than F; and, when the combination of the message type and information element of the F-th piece of signaling is not included in the first set, outputting the first anomaly detection result, which indicates that the signaling flow under test is abnormal.
  • The combinations in the first set are the predicted combinations of message type and information element for the F-th piece of signaling.
  • The signaling analysis device may use a sliding window with window length w and step 1 to check the signaling one by one, where w is an integer greater than 1.
  • The signaling analysis device can pad w placeholders before the first feature vector of the first feature sequence so that the first feature vector in the sequence can itself be predicted.
  • When performing anomaly detection, the signaling analysis device inputs the (t-w)-th to the (t-1)-th first feature vectors in the first feature sequence into the first signaling anomaly detection model and obtains at least one combination of a possible message type and possible information elements of the t-th signaling (corresponding to the first set). If the combination of message type and information element of the t-th signaling in the signaling flow under test is not among the predicted combinations, the signaling flow under test is determined to be abnormal.
  • That is, the signaling analysis device can use the w pieces of signaling before the t-th piece of signaling (corresponding to the (t-w)-th to the (t-1)-th first feature vectors in the first feature sequence) to detect whether the t-th signaling is abnormal.
  • The first signaling anomaly detection model may be an N-gram model, a neural network language model (NNLM), or the like.
  • The first signaling anomaly detection model may be obtained by unsupervised learning on normal signaling flows.
  • A normal signaling flow is a signaling flow in which no anomaly occurs.
  • In this implementation, the signaling analysis device can detect abnormal signaling accurately and quickly.
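  • The sliding-window check with w placeholders can be sketched as below. The lookup table `allowed_next` is a hand-written stand-in for a trained N-gram or NNLM model, and the placeholder token, function name, and sample SIP-style words are assumptions for illustration only.

```python
from typing import Dict, Hashable, Optional, Sequence, Set, Tuple

PAD = "<pad>"  # hypothetical placeholder token


def first_abnormal_index(
    flow: Sequence[Hashable],
    allowed_next: Dict[Tuple, Set],
    w: int,
) -> Optional[int]:
    """Return the 0-based index of the first unexpected item, or None if normal."""
    padded = [PAD] * w + list(flow)  # w placeholders in front of the flow
    for t in range(len(flow)):
        context = tuple(padded[t:t + w])  # the w items preceding position t
        predicted = allowed_next.get(context, set())
        if flow[t] not in predicted:
            return t  # the observed word is not among the predicted ones
    return None


# Toy model with window length w = 2, written by hand for the example.
toy_model = {
    (PAD, PAD): {"INVITE"},
    (PAD, "INVITE"): {"100 Trying", "200 OK"},
    ("INVITE", "200 OK"): {"ACK"},
}
normal_result = first_abnormal_index(["INVITE", "200 OK", "ACK"], toy_model, w=2)
abnormal_result = first_abnormal_index(["INVITE", "200 OK", "BYE"], toy_model, w=2)
```

  • Each item of `flow` stands for one word, that is, one combination of message type and information element; the same loop applies unchanged when the words are message types only.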
  • The method further includes: when the combination of the message type and information element of the F-th piece of signaling is included in the first set and F is less than N, performing the (F+1)-th round of anomaly detection processing; and when the combination of the message type and information element of the F-th piece of signaling is included in the first set and F is equal to N, outputting the first anomaly detection result, which indicates that the signaling flow under test is normal.
  • In this implementation, the anomaly detection result can be output promptly and the anomaly detection process stopped.
  • The F-th second feature vector in the second feature sequence corresponds to the F-th piece of signaling among the N pieces of signaling. Inputting the second feature sequence into the second signaling anomaly detection model for anomaly detection processing to obtain the second anomaly detection result includes: in the F-th round of anomaly detection processing, inputting a fourth feature sequence into the second signaling anomaly detection model to obtain a second set, where the second set includes at least one message type, and the feature vectors in the fourth feature sequence are, in order, the (F-K)-th to the (F-1)-th second feature vectors in the second feature sequence; F is an integer greater than 1, and K is an integer greater than 1 and less than F; and, when the message type of the F-th piece of signaling is not included in the second set, obtaining the second anomaly detection result, which indicates that the signaling flow under test is abnormal.
  • In this implementation, the signaling analysis device checks one piece of signaling per round, so abnormal signaling is detected quickly and no signaling is missed.
  • The method further includes: when the message type of the F-th piece of signaling is included in the second set and F is less than N, performing the (F+1)-th round of anomaly detection processing; and when the message type of the F-th piece of signaling is included in the second set and F is equal to N, obtaining the second anomaly detection result, which indicates that the signaling flow under test is normal.
  • In this implementation, the anomaly detection result can be output promptly and the anomaly detection process stopped.
  • The method further includes: when the anomaly detection result indicates that the signaling flow under test is abnormal, determining the location at which the anomaly occurs in the signaling flow under test.
  • In this implementation, the location of the anomaly in the signaling flow under test can be further determined.
  • The H-th first feature vector in the first feature sequence corresponds to the H-th piece of signaling among the N pieces of signaling. Determining the location at which the anomaly occurs in the signaling flow under test includes: in the H-th round of signaling anomaly localization, inputting a fifth feature sequence into the first signaling anomaly detection model to obtain a third set, where the third set includes at least one combination of a message type and an information element, and the feature vectors in the fifth feature sequence are, in order, the (H-L)-th to the (H-1)-th first feature vectors in the first feature sequence; H is an integer greater than 1, and L is an integer greater than 1 and less than H; and, when the combination of the message type and information element of the H-th piece of signaling is not included in the third set, determining that the H-th piece of signaling is abnormal.
  • In this implementation, the abnormal signaling can be accurately identified within the signaling flow under test.
  • The H-th first feature vector in the first feature sequence corresponds to the H-th piece of signaling among the N pieces of signaling. Determining the location at which the anomaly occurs in the signaling flow under test includes: obtaining an abnormal probability sequence corresponding to the N pieces of signaling, where the G-th probability in the abnormal probability sequence represents the probability that the first (D+G) pieces of signaling among the N pieces of signaling include abnormal signaling, and G and D are both integers greater than 0; and determining, according to the abnormal probability sequence, the location at which the anomaly occurs in the signaling flow under test.
  • The abnormal probability sequence reflects how the degree of abnormality of the signaling flow under test changes. The interval containing the abnormal signaling can therefore be located by analyzing that change.
  • In this implementation, the location at which the anomaly occurs in the signaling flow under test can be accurately determined from the abnormal probability sequence.
  • Obtaining the abnormal probability sequence corresponding to the N pieces of signaling includes: inputting a sixth feature sequence into the first signaling anomaly detection model for anomaly detection processing to obtain the G-th probability in the abnormal probability sequence, where the feature vectors in the sixth feature sequence are, in order, the first (D+G) first feature vectors in the first feature sequence.
  • In this implementation, the abnormal probability sequence corresponding to the N pieces of signaling can be obtained quickly and accurately.
  • Determining, according to the abnormal probability sequence, the location at which the anomaly occurs in the signaling flow under test includes: when the difference between the G-th probability and the (G-1)-th probability in the abnormal probability sequence is greater than a probability threshold, determining that the signaling in a first signaling interval is abnormal, where the first signaling interval includes the (G+D)-th piece of signaling among the N pieces of signaling
  • and G is an integer greater than 1; when each probability in the abnormal probability sequence is not less than the previous probability, determining that an anomaly occurs in the signaling in a second signaling interval, where the second signaling interval includes the (P+D)-th to the N-th pieces of signaling among the N pieces of signaling, the difference between the P-th probability and the (P+1)-th probability in the abnormal probability sequence is not less than the difference between any two adjacent probabilities in the sequence, and P is an integer greater than 0; when the probabilities in the abnormal probability sequence increase from a first value to a second value and, before remaining at the second value, decrease from a third value to the first value, determining that the signaling in a third signaling interval is abnormal, where the first value is less than a first threshold, the second value and the third value are both greater than a second threshold, the first threshold is less than the second threshold, and the third signaling interval includes the (Q+D)-th to the N-th pieces of signaling among the N pieces of signaling; the Q-th probability in the abnormal probability sequence is the probability of
  • In this implementation, the location at which the anomaly occurs in the signaling flow under test can be accurately determined.
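  • The first two localization rules (a jump between adjacent probabilities, and a non-decreasing sequence whose largest step marks the start of the interval) can be sketched as below. The index arithmetic follows one reading of the partly garbled source text, so the function names, the 1-based position convention, and the sample values are assumptions; the third rule is omitted because its description is incomplete.

```python
from typing import List, Optional, Sequence, Tuple


def locate_by_jump(probs: Sequence[float], d: int, threshold: float) -> List[int]:
    """Return 1-based positions G+D where probs[G] - probs[G-1] exceeds the threshold."""
    hits = []
    for g in range(2, len(probs) + 1):  # G is 1-based and greater than 1
        if probs[g - 1] - probs[g - 2] > threshold:
            hits.append(g + d)
    return hits


def locate_monotonic(probs: Sequence[float], d: int, n: int) -> Optional[Tuple[int, int]]:
    """If probs never decreases, return (P+D, N), where P starts the largest step."""
    if len(probs) < 2 or any(b < a for a, b in zip(probs, probs[1:])):
        return None  # rule does not apply
    steps = [b - a for a, b in zip(probs, probs[1:])]
    p = steps.index(max(steps)) + 1  # 1-based P
    return (p + d, n)


# Hypothetical sequence for N = 6 pieces of signaling and D = 2:
# the G-th entry is the probability that the first D+G pieces contain an anomaly.
anomaly_probs = [0.05, 0.06, 0.90, 0.92]
jump_positions = locate_by_jump(anomaly_probs, d=2, threshold=0.5)
monotonic_interval = locate_monotonic(anomaly_probs, d=2, n=6)
```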
  • Before acquiring the signaling flow under test, the method further includes: collecting signaling data, where the signaling data includes the N pieces of signaling; parsing the signaling data to obtain the interface, timestamp, protocol, and flow identifier corresponding to each piece of signaling; grouping the signaling in the signaling data that shares the same interface, protocol, and flow identifier into the same group, to obtain at least one group of signaling; and sorting the signaling in a target group by the timestamps it carries to obtain the signaling flow under test, where the target group
  • is any one of the at least one group of signaling.
  • In this implementation, the signaling belonging to the same signaling flow can be selected accurately and quickly, yielding the signaling flow under test.
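  • The grouping and sorting step can be illustrated with a short sketch. The `SignalingRecord` fields and the sample values are assumptions about what a parser might emit; only the grouping key (interface, protocol, flow identifier) and the timestamp ordering come from the description.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class SignalingRecord:
    interface: str
    protocol: str
    flow_id: str
    timestamp: float
    message_type: str


def build_flows(
    records: Iterable[SignalingRecord],
) -> Dict[Tuple[str, str, str], List[SignalingRecord]]:
    """Group records by (interface, protocol, flow id) and sort each group by time."""
    groups: Dict[Tuple[str, str, str], List[SignalingRecord]] = defaultdict(list)
    for record in records:
        groups[(record.interface, record.protocol, record.flow_id)].append(record)
    return {
        key: sorted(group, key=lambda r: r.timestamp)
        for key, group in groups.items()
    }


records = [
    SignalingRecord("S1-MME", "S1AP", "flow-1", 2.0, "InitialContextSetupRequest"),
    SignalingRecord("Gm", "SIP", "call-7", 1.5, "INVITE"),
    SignalingRecord("S1-MME", "S1AP", "flow-1", 1.0, "InitialUEMessage"),
]
flows = build_flows(records)
```

  • Each value in `flows` is one candidate signaling flow under test, already in timestamp order and ready for feature construction.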
  • An embodiment of the present application provides a training method, which includes: performing feature construction on the message type and information elements included in each piece of signaling in a training signaling flow to obtain a first vector sequence;
  • the feature vectors in the first vector sequence correspond one-to-one to the signaling in the training signaling flow;
  • the R-th to the (R+W)-th feature vectors in the first vector sequence are used as a first training feature sequence,
  • which is input into a first training model for unsupervised learning to obtain the first signaling anomaly detection model;
  • the first training model is a W-gram language model, W is an integer greater than 1, and R and S are both integers greater than 0.
  • The first signaling anomaly detection model may be an NNLM or another sequence model; this application does not limit it.
  • In this implementation, a first signaling anomaly detection model can be trained that predicts the message type and information elements of a piece of signaling from the feature vectors of the W pieces of signaling preceding it, and training is efficient.
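  • One simple way to picture unsupervised training on normal flows is a count-based lookup in which each window of W+1 consecutive words contributes its first W words as context and its last word as an allowed successor. This table is a deliberately simplified stand-in for a W-gram language model or an NNLM; the padding token and function name are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Set, Tuple

PAD = "<pad>"  # hypothetical placeholder so the first words also get a context


def fit_wgram(
    normal_flows: Sequence[Sequence[Hashable]], w: int
) -> Dict[Tuple, Set]:
    """Record, for every context of w words, which word followed it in normal flows."""
    allowed_next: Dict[Tuple, Set] = defaultdict(set)
    for flow in normal_flows:
        padded = [PAD] * w + list(flow)
        for r in range(len(flow)):
            window = padded[r:r + w + 1]  # w context words followed by the target
            allowed_next[tuple(window[:-1])].add(window[-1])
    return dict(allowed_next)


model = fit_wgram([["A", "B", "C"], ["A", "B", "D"]], w=2)
```

  • A real language model would assign probabilities and generalize to unseen contexts; the table above only memorizes what it saw, which is enough to show how the training windows are formed.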
  • An embodiment of the present application provides a training method, which includes: performing feature construction on the message type included in each piece of signaling in a training signaling flow to obtain a second vector sequence, where the feature vectors in the second vector sequence correspond one-to-one to the signaling in the training signaling flow; the R-th to the (R+W)-th feature vectors in the second vector sequence are used as a second training feature
  • sequence, which is input into a second training model for unsupervised learning to obtain a second signaling anomaly detection model;
  • the second training model is a W-gram language model, W is an integer greater than 1, and R and S are both integers greater than 0.
  • The second signaling anomaly detection model may be an NNLM or another sequence model; this application does not limit it.
  • In this implementation, a second signaling anomaly detection model can be trained that predicts the message type of a piece of signaling from the feature vectors of the W pieces of signaling preceding it, and training is efficient.
  • an embodiment of the present application provides a training method, which includes: separately constructing features of the message type and information element included in each signaling in the training signaling process to obtain a first training sample;
  • the feature vectors in the first training sample are in one-to-one correspondence with the signaling in the training signaling process;
  • the first training sample and the first annotation information are input to the third training model for abnormality detection processing to obtain a first abnormality detection processing result;
  • the first abnormality detection processing result indicates that the first training sample is a normal signaling flow or an abnormal signaling flow; the loss corresponding to the first training sample is determined according to the first abnormality detection processing result and the first standard result;
  • the first standard result is the real result of the first training sample indicated by the first annotation information; using the loss corresponding to the first training sample, the parameters of the third training model are updated through an optimization algorithm to obtain the first signaling anomaly detection model.
  • the third training model may be a recurrent neural network (RNN), a long short-term memory (LSTM) model, or another model, which is not limited in this application.
  • the model trained by the third training model is the above-mentioned first signaling anomaly detection model.
  • the training signaling process can be any signaling process. In practical applications, the training device can use the labeled normal signaling process and abnormal signaling process to train to obtain the first signaling anomaly detection model.
  • the normal signaling process and abnormal signaling process with labels are used to train the third training model, so that the first signaling anomaly detection model obtained by training can accurately detect whether each signaling process is abnormal.
  • an embodiment of the present application provides a training method, which includes: separately constructing features of the message type included in each signaling in the training signaling process to obtain a second training sample, where the feature vectors in the second training sample are in one-to-one correspondence with the signaling in the training signaling process; the second training sample and the second annotation information are input to the fourth training model for abnormality detection processing to obtain a second abnormality detection processing result;
  • the second abnormality detection processing result indicates that the second training sample is a normal signaling flow or an abnormal signaling flow; the loss corresponding to the second training sample is determined according to the second abnormality detection processing result and the second standard result;
  • the second standard result is the real result of the second training sample indicated by the second annotation information; the loss corresponding to the second training sample is used to update the parameters of the fourth training model through an optimization algorithm to obtain the second signaling anomaly detection model.
  • the fourth training model may be an RNN, an LSTM model, or other models, which is not limited in this application.
  • the model trained by the fourth training model is the above-mentioned second signaling anomaly detection model.
  • the training signaling process can be any signaling process. In practical applications, the training device can use the marked normal signaling process and abnormal signaling process to train to obtain the second signaling anomaly detection model.
  • the fourth training model is trained using the labeled normal signaling process and abnormal signaling process, so that the second signaling anomaly detection model obtained by training can accurately detect whether each signaling process is abnormal.
  • an embodiment of the present application provides a signaling analysis device, including a processor and a memory, where the memory is used to store program instructions, and the processor is used to call the program instructions to perform the following operations:
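  • obtaining a signaling flow to be tested, where the signaling flow to be tested includes N pieces of signaling and N is an integer greater than 1; performing first feature construction on the message types and information elements included in each of the N pieces of signaling to obtain a first feature sequence, where the first feature sequence includes N first feature vectors in one-to-one correspondence with the N pieces of signaling;
  • inputting the first feature sequence to a first signaling abnormality detection model for abnormality detection processing, and outputting a first abnormality detection result.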
  • the first abnormality detection result indicates that the signaling process to be tested is normal or abnormal.
  • the processor is further configured to perform second feature construction on the message type included in each of the N pieces of signaling to obtain a second feature sequence;
  • the second feature sequence includes N second feature vectors, and the N second feature vectors correspond to the N pieces of signaling one-to-one; the second feature sequence is input to the second signaling anomaly detection model for anomaly detection processing to obtain a second abnormality detection result;
  • the second abnormality detection result indicates that the signaling flow to be tested is normal or abnormal;
  • the processor is specifically configured to, when the second abnormality detection result indicates that the signaling flow to be tested is normal, perform the first feature construction on each of the N pieces of signaling to obtain the first feature sequence.
  • the processor is specifically configured to perform the first feature construction on the N pieces of signaling in sequence, in the order of the timestamps of the N pieces of signaling, to obtain the first feature sequence.
  • the processor is specifically configured to perform feature construction on the combination of the first message type and the target cell as a whole to obtain a first vector;
  • the first message type is the message type included in the first signaling, the target cell is a cell included in the first signaling, the first signaling is any one of the N pieces of signaling, and the first vector is included in the first feature sequence.
  • the target cell includes a cell indicating a reason for sending the first signaling.
  • the processor is specifically configured to treat the M cells in the second signaling as a text of one or more words in a natural language processing (NLP) algorithm and perform feature construction to obtain a second vector;
  • the second signaling is any one of the N pieces of signaling, the second vector is included in the first feature sequence, and M is an integer greater than 1.
  • the processor is specifically configured to perform the second feature construction on the N pieces of signaling in sequence, in the order of the timestamps of the N pieces of signaling, to obtain the second feature sequence.
  • the processor is specifically configured to treat the second message type as a word in a natural language processing (NLP) algorithm and perform feature construction to obtain a third vector; the second message type is the message type included in the third signaling, the third signaling is any one of the N pieces of signaling, and the third vector is included in the second feature sequence.
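  • As an illustration of the feature construction described above, the sketch below treats each combination of message type and cause cell as one "word", builds a vocabulary from training flows, and maps each signaling to a one-hot vector. The one-hot encoding, the `<unk>` entry for unseen combinations, the `|` separator, and all names are assumptions made for this example; the text leaves the concrete NLP encoding open.

```python
from typing import Dict, List, Tuple

Signaling = Tuple[str, str]  # (message type, cause cell); hypothetical layout


def build_vocabulary(flows: List[List[Signaling]]) -> Dict[str, int]:
    """Assign an index to every (message type, cell) combination seen in training."""
    vocab: Dict[str, int] = {"<unk>": 0}  # index 0 is reserved for unseen combinations
    for flow in flows:
        for message_type, cell in flow:
            vocab.setdefault(f"{message_type}|{cell}", len(vocab))
    return vocab


def first_feature_sequence(flow: List[Signaling], vocab: Dict[str, int]) -> List[List[int]]:
    """Return one one-hot vector per signaling, in the order of the flow."""
    sequence = []
    for message_type, cell in flow:
        index = vocab.get(f"{message_type}|{cell}", 0)
        vector = [0] * len(vocab)
        vector[index] = 1
        sequence.append(vector)
    return sequence


training_flows = [[("Request", "none"), ("Reject", "congestion")]]
vocab = build_vocabulary(training_flows)
features = first_feature_sequence([("Request", "none"), ("Reject", "timeout")], vocab)
```

  • Dropping the cell from the token gives the coarser, message-type-only encoding used for the second feature sequence.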
  • the processor is specifically configured to, in the F-th round of abnormality detection processing, input a third feature sequence into the first signaling anomaly detection model for anomaly detection processing to obtain a first set;
  • the first set includes at least one combination of message type and cell, and the feature vectors in the third feature sequence are, in order, the (F-K)th first feature vector to the (F-1)th first feature vector in the first feature sequence;
  • F is an integer greater than 1, and K is an integer greater than 1 and less than F;
  • when the combination of the message type and cell of the F-th signaling in the N pieces of signaling is not included in the first set, the first abnormality detection result is output; the first abnormality detection result indicates that the signaling process to be tested is abnormal.
  • the processor is further configured to: when the combination of the message type and the cell of the F-th signaling is included in the first set and F is less than N, execute the (F+1)th round of abnormality detection processing; when the combination of the message type and cell of the F-th signaling is included in the first set and F is equal to N, output the first abnormality detection result; the first abnormality detection result indicates that the signaling process to be tested is normal.
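  • The round-by-round check described above can be sketched as follows. The predictor here is a toy lookup table built from normal flows; it merely stands in for the trained sequence model (for example an NNLM) and is an assumption of this example, as are the function names and the message names.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Predictor = Callable[[Sequence[str]], Set[str]]


def fit_toy_predictor(normal_flows: List[List[str]], k: int) -> Predictor:
    """Record which token follows each K-token context in normal flows."""
    table: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for flow in normal_flows:
        for i in range(k, len(flow)):
            table[tuple(flow[i - k:i])].add(flow[i])
    return lambda context: table.get(tuple(context), set())


def first_unexpected(flow: List[str], predict: Predictor, k: int) -> Optional[int]:
    """Return the 1-based position F of the first signaling outside the
    predicted set, or None when every round passes (flow judged normal)."""
    for f in range(k + 1, len(flow) + 1):   # f is the round number F
        context = flow[f - k - 1:f - 1]     # signaling (F-K) .. (F-1)
        if flow[f - 1] not in predict(context):
            return f
    return None


normal = [["Setup", "Auth", "Accept", "Release"]]
predict = fit_toy_predictor(normal, k=2)
ok = first_unexpected(["Setup", "Auth", "Accept", "Release"], predict, k=2)
bad = first_unexpected(["Setup", "Auth", "Release", "Accept"], predict, k=2)
```

  • In the swapped flow the third signaling is not in the set predicted from the first two, so the check stops at round 3; the same position can also serve as the located abnormal signaling.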
  • the F-th second feature vector in the second feature sequence corresponds to the F-th signaling in the N pieces of signaling; the processor is specifically configured to, in the F-th round of abnormality detection processing, input a fourth feature sequence to the second signaling anomaly detection model for abnormality detection processing to obtain a second set; the second set includes at least one message type;
  • the feature vectors in the fourth feature sequence are, in order, the (F-K)th second feature vector to the (F-1)th second feature vector in the second feature sequence; F is an integer greater than 1, and K is an integer greater than 1 and less than F; when the message type of the F-th signaling in the N pieces of signaling is not included in the second set, the second abnormality detection result is obtained; the second abnormality detection result indicates that the signaling process to be tested is abnormal.
  • the processor is further configured to: when the message type of the F-th signaling is included in the second set and F is less than N, execute the (F+1)th round of abnormality detection processing; when the message type of the F-th signaling is included in the second set and F is equal to N, obtain the second abnormality detection result; the second abnormality detection result indicates that the signaling process to be tested is normal.
  • the processor is further configured to, when the first abnormality detection result indicates that the signaling process to be tested is abnormal, determine the position at which the abnormality occurs in the signaling process to be tested.
  • the H-th first feature vector in the first feature sequence corresponds to the H-th signaling in the N pieces of signaling; the processor is specifically configured to, in the H-th round of signaling anomaly location, input a fifth feature sequence to the first signaling anomaly detection model for anomaly detection processing to obtain a third set; the third set includes at least one combination of message type and cell;
  • the feature vectors in the fifth feature sequence are, in order, the (H-L)th first feature vector to the (H-1)th first feature vector in the first feature sequence; H is an integer greater than 1;
  • L is an integer greater than 1 and less than H; when the combination of the message type and the information element of the H-th signaling in the N pieces of signaling is not included in the third set, it is determined that the H-th signaling is abnormal.
  • the H-th first feature vector in the first feature sequence corresponds to the H-th signaling in the N pieces of signaling; the processor is specifically configured to obtain the abnormal probability sequence corresponding to the N pieces of signaling, where the G-th probability in the abnormal probability sequence represents the probability that the first (D+G) pieces of signaling in the N pieces of signaling include abnormal signaling, and G and D are both integers greater than 0; and to determine, according to the abnormal probability sequence, the position at which the abnormality occurs in the signaling flow to be tested.
  • the processor is specifically configured to input a sixth feature sequence into the first signaling anomaly detection model for anomaly detection processing to obtain the G-th probability in the abnormal probability sequence;
  • the feature vectors included in the sixth feature sequence are, in order, the first (D+G) first feature vectors in the first feature sequence.
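  • A minimal sketch of how the abnormal probability sequence described above could be assembled: the G-th entry is the model's score for the prefix made of the first (D+G) feature vectors. The callable `score_prefix` is a hypothetical stand-in for the trained supervised model, and the toy scorer and message names are assumptions for this example only.

```python
from typing import Callable, List, Sequence


def abnormal_probability_sequence(
    features: Sequence[str],
    score_prefix: Callable[[Sequence[str]], float],
    d: int,
) -> List[float]:
    """Score growing prefixes: entry G (1-based) uses the first D+G items."""
    n = len(features)
    return [score_prefix(features[:d + g]) for g in range(1, n - d + 1)]


def toy_score(prefix: Sequence[str]) -> float:
    """Toy scorer: a prefix looks abnormal once it contains a 'Reject'."""
    return 0.9 if "Reject" in prefix else 0.1


flow = ["Request", "Auth", "Reject", "Release"]
probabilities = abnormal_probability_sequence(flow, toy_score, d=1)
```

  • With N = 4 and D = 1 the sequence has N - D = 3 entries, and the score rises at the prefix that first contains the abnormal signaling.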
  • the processor is specifically configured to: when the difference between the G-th probability and the (G-1)th probability in the abnormal probability sequence is greater than a probability threshold, determine that the signaling in a first signaling interval is abnormal, where the first signaling interval includes the (G+D-1)th to N-th signaling of the N pieces of signaling, and G is greater than 1.
  • the processor is specifically configured to: determine that the signaling in a second signaling interval is abnormal, where the second signaling interval includes the (P+D)th to N-th signaling of the N pieces of signaling, the difference between the P-th probability and the (P+1)th probability in the abnormal probability sequence is not less than the difference between any two adjacent probabilities in the abnormal probability sequence, and P is an integer greater than 0;
  • or, in the case that the probabilities in the abnormal probability sequence decrease from a third value to a first value before increasing from the first value to a second value and remaining at the second value, determine that the signaling in a third signaling interval is abnormal; the first value is less than a first threshold, the second value and the third value are both greater than a second threshold, and the first threshold is smaller than the second threshold; the third signaling interval includes the (Q+D)th to N-th signaling of the N pieces of signaling, the Q-th probability in the abnormal probability sequence is the probability at the starting point of the last ascending segment of the abnormal probability curve, and Q is an integer greater than 0;
  • or, in the case that each probability in the abnormal probability sequence is not less than the probability threshold, determine that the signaling in a fourth signaling interval is abnormal, where the fourth signaling interval includes the D-th to N-th signaling of the N pieces of signaling.
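  • Two of the location rules above can be sketched as follows: the jump rule (a rise between adjacent probabilities above a threshold gives the interval from the (G+D-1)th signaling to the N-th) and the rule for a sequence that is high throughout (interval from the D-th signaling to the N-th). Using separate thresholds for the jump and for the level, checking the level rule first, and the function name are assumptions of this example; the other rules are omitted.

```python
from typing import List, Optional, Tuple


def locate_interval(
    probabilities: List[float],
    d: int,
    n: int,
    jump_threshold: float,
    level_threshold: float,
) -> Optional[Tuple[int, int]]:
    """Return the (first, last) 1-based signaling positions of the abnormal
    interval, or None when neither rule fires."""
    # Rule: every probability is already high -> D-th to N-th signaling.
    if all(p >= level_threshold for p in probabilities):
        return (d, n)
    # Rule: first jump between adjacent probabilities above the threshold.
    for g in range(2, len(probabilities) + 1):      # g is the 1-based index G
        if probabilities[g - 1] - probabilities[g - 2] > jump_threshold:
            return (g + d - 1, n)
    return None


jump = locate_interval([0.1, 0.9, 0.9], d=1, n=4, jump_threshold=0.5, level_threshold=0.5)
always_high = locate_interval([0.9, 0.9, 0.9], d=1, n=4, jump_threshold=0.5, level_threshold=0.5)
flat_low = locate_interval([0.1, 0.2], d=1, n=3, jump_threshold=0.5, level_threshold=0.5)
```

  • In the first call the jump occurs at G = 2, so the reported interval runs from signaling G+D-1 = 2 to signaling N = 4.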
  • the processor is further configured to: collect signaling data, where the signaling data includes the N pieces of signaling; parse each piece of signaling in the signaling data to obtain the interface, timestamp, protocol, and process identifier corresponding to each piece of signaling;
  • group the signaling in the signaling data that has the same interface, protocol, and process identifier into the same group to obtain at least one group of signaling;
  • sort the signaling in a target group of signaling in order of the timestamps of the signaling to obtain the signaling flow to be tested, where the target group of signaling is any group of the at least one group of signaling.
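  • The preprocessing described above (group parsed signaling by interface, protocol, and process identifier, then order each group by timestamp) can be sketched as follows. The record layout `ParsedSignaling`, the function name, and the sample values are hypothetical and only illustrate the grouping and sorting.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ParsedSignaling:
    interface: str
    protocol: str
    process_id: str
    timestamp: float
    message_type: str


def extract_flows(
    records: List[ParsedSignaling],
) -> Dict[Tuple[str, str, str], List[ParsedSignaling]]:
    """Group records by (interface, protocol, process id) and sort by time."""
    groups: Dict[Tuple[str, str, str], List[ParsedSignaling]] = defaultdict(list)
    for record in records:
        groups[(record.interface, record.protocol, record.process_id)].append(record)
    return {
        key: sorted(group, key=lambda record: record.timestamp)
        for key, group in groups.items()
    }


# Illustrative records, deliberately out of time order.
records = [
    ParsedSignaling("S1", "S1AP", "proc-1", 2.0, "B"),
    ParsedSignaling("S1", "S1AP", "proc-2", 1.5, "X"),
    ParsedSignaling("S1", "S1AP", "proc-1", 1.0, "A"),
]
flows = extract_flows(records)
```

  • Each value of `flows` is one candidate signaling flow to be tested, already in timestamp order.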
  • an embodiment of the present application provides a training device, including a processor and a memory, the memory is used to store program instructions, and the processor is used to call the program instructions to perform the following operations:
  • separately constructing features of the message types and cells included in each signaling in the training signaling process to obtain a first vector sequence, where the feature vectors in the first vector sequence are in one-to-one correspondence with the signaling in the training signaling process;
  • inputting the (R)th feature vector to the (R+W)th feature vector in the first vector sequence, as the first training feature sequence, to the first training model for unsupervised learning to obtain the first signaling anomaly detection model;
  • the first training model is a W-gram language model, W is an integer greater than 1, and both R and S are integers greater than 0.
  • an embodiment of the present application provides a training device, including a processor and a memory, the memory is used to store program instructions, and the processor is used to call the program instructions to perform the following operations:
  • separately constructing features of the message type included in each signaling in the training signaling process to obtain a second vector sequence, where the feature vectors in the second vector sequence are in one-to-one correspondence with the signaling in the training signaling process;
  • inputting the (R)th feature vector to the (R+W)th feature vector in the second vector sequence, as the second training feature sequence, to the second training model for unsupervised learning to obtain the second signaling anomaly detection model;
  • the second training model is a W-gram language model, where W is an integer greater than 1, and both R and S are integers greater than 0.
  • an embodiment of the present application provides a training device, including a processor and a memory, the memory is used to store program instructions, and the processor is used to call the program instructions to perform the following operations:
  • separately constructing features of the message type and cell included in each signaling in the training signaling process to obtain a first training sample;
  • the feature vectors in the first training sample are in one-to-one correspondence with the signaling in the training signaling process;
  • inputting the first training sample and the first annotation information to the third training model for abnormality detection processing to obtain a first abnormality detection processing result;
  • the first abnormality detection processing result indicates that the first training sample is a normal signaling process or an abnormal signaling process; determining the loss corresponding to the first training sample according to the first abnormality detection processing result and the first standard result;
  • the first standard result is the real result of the first training sample indicated by the first annotation information; using the loss corresponding to the first training sample to update the parameters of the third training model through an optimization algorithm to obtain the first signaling anomaly detection model.
  • an embodiment of the present application provides a training device, including a processor and a memory, the memory is used to store program instructions, and the processor is used to call the program instructions to perform the following operations:
  • separately constructing features of the message type included in each signaling in the training signaling process to obtain a second training sample;
  • the feature vectors in the second training sample are in one-to-one correspondence with the signaling in the training signaling process;
  • inputting the second training sample and the second annotation information to the fourth training model for anomaly detection processing to obtain a second anomaly detection processing result;
  • the second anomaly detection processing result indicates that the second training sample is a normal signaling process or an abnormal signaling process; determining the loss corresponding to the second training sample according to the second anomaly detection processing result and the second standard result;
  • the second standard result is the real result of the second training sample indicated by the second annotation information; using the loss corresponding to the second training sample to update the parameters of the fourth training model through an optimization algorithm to obtain a second signaling anomaly detection model.
  • an embodiment of the present application provides a computer-readable storage medium that stores a computer program.
  • the computer program includes program instructions that, when executed by a processor, cause the processor to execute the methods of the foregoing first aspect to fifth aspect and any optional implementation thereof.
  • the embodiments of the present application provide a computer program product.
  • the computer program product includes program instructions that, when executed by a processor, cause the processor to execute the methods of the foregoing first aspect to fifth aspect and any optional implementation thereof.
  • FIG. 1A is a schematic diagram of a network architecture of a signaling analysis system provided by an embodiment of this application;
  • FIG. 1B is a schematic diagram of a signaling analysis device provided by an embodiment of this application.
  • Figure 2 is a flowchart of a signaling analysis method provided by an embodiment of the application.
  • FIG. 3 is a flowchart of a method for locating abnormal signaling according to an embodiment of the application
  • FIG. 4 is a flowchart of a method for preprocessing signaling data according to an embodiment of the application
  • FIG. 5 is a flowchart of a method for detecting anomalies in signaling according to an embodiment of the application
  • Fig. 6 is a schematic diagram of a characteristic structure provided by an embodiment of the application.
  • FIG. 7 is a flowchart of another signaling abnormality detection method provided by an embodiment of the application.
  • FIG. 8 is a schematic diagram of another feature structure provided by an embodiment of the application.
  • FIG. 9 is a schematic diagram of an abnormal location process provided by an embodiment of this application.
  • FIG. 10 is a flowchart of a method for locating abnormal signaling according to an embodiment of this application.
  • FIG. 11 is a schematic diagram of constructing an abnormality evaluation curve provided by an embodiment of the application.
  • FIG. 12 is a flowchart of a method for detecting and locating a signaling anomaly according to an embodiment of the application
  • FIG. 13 is an abnormality evaluation curve of a signaling process to be tested provided by an embodiment of this application.
  • FIG. 14 is a flowchart of another signaling anomaly detection and location method provided by an embodiment of the application.
  • FIG. 15 is a schematic structural diagram of a training device provided by an embodiment of the application.
  • FIG. 16 is a schematic structural diagram of a signaling analysis apparatus provided by an embodiment of this application.
  • the control flow composed of signaling messages that control service functions such as calls, bearers, and connections transferred between devices in a communication network is called a signaling process.
  • Domain usually refers to a logical division of information technology (Information Technology, IT) basic resources, used to plan and manage basic resources. Different domains have different services and use different communication protocols.
  • Interface: the boundary between two systems in a communication network, defined by a specific protocol or specification to ensure the compatibility of formats, functions, signals, and interconnections at the boundary.
  • Information element: the information unit carried by a signaling message. Its meaning is defined by the protocol that the specific signaling complies with, and it is encapsulated into a signaling message in a manner defined by the protocol. Its specific content includes, for example, a service type indication, bearer establishment parameters, and a user identification.
  • Coarse-fine: a common hierarchical analysis approach that proceeds from coarse granularity (coarse-level) to fine granularity (fine-level).
  • Control plane signaling usually refers to the control signaling data that establishes services for users in the communication network.
  • the signaling analysis method provided in the embodiments of the present application can be applied to scenarios such as signaling anomaly detection and signaling anomaly location.
  • the application of the signaling analysis method provided in the application embodiment in the signaling anomaly detection scenario and the signaling anomaly location scenario is briefly introduced below.
  • in the signaling anomaly detection scenario, the signaling analysis device performs anomaly detection in real time on the signaling data collected from the communication network. After detecting an abnormality in any signaling process, it outputs the corresponding abnormality detection result so that operation and maintenance personnel learn of the abnormal signaling process in time.
  • for example, the signaling analysis device analyzes in real time whether there is an abnormal signaling process and, when one is detected, sends the abnormality detection result to the operation and maintenance personnel.
  • in the signaling anomaly location scenario, the signaling analysis device performs anomaly detection in real time on the signaling data collected from the communication network. After detecting an abnormality in any signaling process, it further determines the signaling interval in which the abnormality occurs and sends information indicating that the signaling in that interval is abnormal to the operation and maintenance personnel, so that they can resolve the abnormal signaling in a targeted manner.
  • FIG. 1A is a schematic diagram of a network architecture of a signaling analysis system provided by an embodiment of the application.
  • the signaling analysis system includes a signaling collection device and a signaling analysis device.
  • the signaling collection device includes a signaling acquisition module 101 and a signaling preprocessing module 102
  • the signaling analysis device includes a signaling abnormality detection module 103, a signaling abnormality location module 104, and an analysis result output module 105.
  • the signaling abnormality location module 104 is optional. In other words, the signaling analysis device may not include the signaling abnormality location module 104.
  • the signaling collection device is used to collect signaling data from the communication network; extract the signaling flow to be tested from the collected signaling data, and send it to the signaling analysis device.
  • the signaling analysis device is used to realize signaling abnormality detection and signaling abnormality location (that is, to determine the signaling interval where the abnormality occurs). The functions of each module will be detailed later.
  • FIG. 1B is a schematic diagram of a signaling analysis device provided by an embodiment of the application.
  • the signaling analysis device includes a signaling acquisition module 101, a signaling preprocessing module 102, a signaling anomaly detection module 103, a signaling anomaly location module 104, and an analysis result output module 105.
  • the signaling abnormality location module 104 is optional. Comparing FIG. 1A and FIG. 1B, it can be seen that the signaling analysis device can independently realize signaling abnormality detection, or can cooperate with other devices to realize signaling abnormality detection. It should be understood that the signaling analysis apparatus in FIG. 1A and FIG. 1B are two optional implementation manners. The modules in Fig. 1A and Fig. 1B are introduced below.
  • the signaling acquisition module 101 completes the collection of the signaling data to be analyzed in the communication network, that is, collects the signaling data.
  • Signaling preprocessing module 102: for the acquired signaling data, it first parses out the information relevant to signaling analysis, and then extracts the signaling flows to be tested, one signaling flow at a time.
  • Signaling anomaly detection module 103: by analyzing the message type and message content in the signaling, it implements efficient and accurate coarse-fine signaling anomaly detection. Depending on whether the training data is labeled, two sets of anomaly detection methods, supervised and unsupervised, are provided.
  • Signaling anomaly location module 104: it implements interpretable abnormal signaling location for the abnormal signaling process detected by the signaling anomaly detection module 103. Depending on whether the training data is labeled, two sets of anomaly location methods, supervised and unsupervised, are provided.
  • Analysis result output module 105: it organizes and outputs the results of the signaling anomaly detection module 103 and the signaling anomaly location module 104.
  • Fig. 2 is a flowchart of a signaling analysis method provided by an embodiment of the application. As shown in Fig. 2, the method may include:
  • the signaling analysis device obtains the signaling flow to be tested.
  • the above-mentioned signaling flow under test includes N pieces of signaling, any one of the above-mentioned N pieces of signaling includes message type and information element, and the above-mentioned N is an integer greater than 1.
  • the signaling analysis device may be a device with data processing functions, such as a server, a computer, and a cloudized network element.
  • each signaling in the above-mentioned signaling procedure to be tested corresponds to the same protocol and the same interface.
  • the signaling acquisition module 101 and the signaling preprocessing module 102 in the signaling analysis apparatus implement step 201.
  • the signaling analysis apparatus may perform the following operations: collect signaling data, where the signaling data includes the above-mentioned N pieces of signaling; parse each piece of signaling in the signaling data to obtain the interface, timestamp, protocol, and process identifier corresponding to each piece of signaling; group the signaling in the signaling data that has the same interface, protocol, and process identifier into the same group to obtain at least one group of signaling; and sort the signaling in a target group of signaling in order of the timestamps of the signaling to obtain the above-mentioned signaling flow under test, where the target group of signaling is any one of the at least one group of signaling.
  • the first feature sequence includes N first feature vectors, and the N first feature vectors are in one-to-one correspondence with the N pieces of signaling.
  • step 202 may be replaced by: performing a first feature structure on each cell included in each of the N pieces of signaling to obtain the first feature sequence.
  • the foregoing first abnormality detection result indicates that the foregoing signaling process to be tested is normal or abnormal.
  • the first signaling anomaly detection model may be a model obtained through unsupervised learning, or a model obtained through supervised learning.
  • the signaling analysis apparatus uses the signaling abnormality detection module 103 to perform step 202 and step 203.
  • the first abnormality detection result is sent to the operation and maintenance personnel, so that the operation and maintenance personnel can learn the abnormal signaling flow in time.
  • the method provided in the embodiments of the present application can cover exceptions caused by message types and cell errors, and can be multiplexed between different protocols.
  • the signaling analysis device can use the signaling abnormality detection module 103 to perform the first round of coarse-level abnormality detection based on the message type. If no abnormality is detected in the first round of detection, it then performs the second round of fine-level abnormality detection based on message type and cell.
  • the signaling analysis apparatus before performing step 202, performs the following operations: respectively perform second feature construction on the above N pieces of signaling to obtain a second feature sequence; and input the above second feature sequence to The second signaling abnormality detection model performs abnormality detection processing to obtain a second abnormality detection result; the second abnormality detection result indicates that the signaling process to be tested is normal or abnormal.
  • the signaling analysis apparatus may perform step 203 when the foregoing second abnormality detection result indicates that the foregoing signaling process to be tested is normal.
  • the second feature sequence includes N second feature vectors, the N second feature vectors correspond to the N pieces of signaling one-to-one, and a reference feature vector among the N second feature vectors is obtained by performing feature construction on the message type included in the signaling corresponding to the reference feature vector, where the reference feature vector is any one of the N second feature vectors.
  • the first round of coarse-level abnormality detection can improve the detection efficiency while ensuring the recall rate and higher accuracy, while the second round of fine-level abnormality detection can further improve the overall recall rate.
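  • the two-round scheme described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function name `detect_two_rounds` and the two detector callables are hypothetical stand-ins for the second (coarse, message type) and first (fine, message type and cell) signaling anomaly detection models.

```python
from typing import Callable, Sequence

Flow = Sequence[dict]  # hypothetical: one dict per signaling in the flow


def detect_two_rounds(
    flow: Flow,
    coarse_is_abnormal: Callable[[Flow], bool],
    fine_is_abnormal: Callable[[Flow], bool],
) -> str:
    """Return "abnormal" or "normal" for one signaling flow under test."""
    # Round 1: coarse-level check that only looks at message types.
    if coarse_is_abnormal(flow):
        return "abnormal"
    # Round 2: fine-level check on message types plus cells; it only runs
    # when round 1 found nothing.
    if fine_is_abnormal(flow):
        return "abnormal"
    return "normal"
```

  • because the fine-level detector is only called when the coarse-level detector reports nothing, the more expensive check is skipped for flows already flagged in the first round.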
  • Figure 2 only describes the method flow of the signaling analysis device for signaling anomaly detection.
  • the signaling analysis device can also determine the location where the abnormality occurs in the signaling process under test, that is, the abnormal signaling interval.
  • the signaling analysis device may use the signaling abnormality location module 104 to determine the location of the abnormality in the signaling process to be tested.
  • the signaling analysis device may include an unsupervised signaling anomaly detection module 201, a supervised signaling anomaly detection module 202, an unsupervised signaling anomaly location module 203, and a supervised signaling anomaly location module 204.
  • the signaling anomaly detection module 103 may include the unsupervised signaling anomaly detection module 201 and the supervised signaling anomaly detection module 202, and the signaling anomaly location module 104 may include the unsupervised signaling anomaly location module 203 and the supervised signaling anomaly location module 204.
  • the following describes an embodiment in which the signaling analysis device performs signaling abnormality location after detecting an abnormality in the signaling process to be tested.
  • Fig. 3 is a flowchart of a method for locating a signaling anomaly according to an embodiment of the application. As shown in Fig. 3, the method includes:
  • the signaling analysis device acquires signaling data.
  • the signaling acquisition module 101 in the signaling analysis apparatus acquires signaling data to be analyzed in the communication network.
  • the signaling acquisition module 101 uses a signaling collection tool or a network packet capture tool or a dedicated signaling instrument issued by the equipment manufacturer to collect signaling data from the communication network.
  • the collected signaling data can be, but is not limited to, interactive control-type signaling data from the wireless domain, the circuit switched (CS) domain, the packet switched (PS) domain, and the IP Multimedia Subsystem (IMS) domain.
  • the signaling analysis device preprocesses the foregoing signaling data to obtain the signaling process to be tested.
  • the signaling preprocessing module 102 in the signaling analysis device preprocesses the above signaling data to obtain the signaling process to be tested.
  • the implementation of step 302 will be described in detail later.
  • the signaling process to be tested includes N pieces of signaling, the above-mentioned N pieces of signaling belong to the same signaling process, any one of the above-mentioned N pieces of signaling includes message type and information element, and the above-mentioned N is an integer greater than 1.
  • the signaling preprocessing module 102 may first parse the signaling data to obtain the protocol, interface, timestamp, process identifier, message type, and message content related to signaling analysis, and then extract the signaling process to be tested according to the protocol, interface, timestamp, and process identifier.
  • the signaling analysis device determines whether the accuracy with which the supervised signaling abnormality detection module 202 performs abnormality detection on the signaling process to be tested is higher than a target threshold.
  • the signaling analysis device may store the accuracy with which the supervised signaling anomaly detection module 202 performs abnormality detection on the signaling process to be tested. It can be understood that if this accuracy is lower than the target threshold, the unsupervised signaling anomaly detection module 201 is used to perform anomaly detection on the signaling process to be tested; otherwise, the supervised signaling anomaly detection module 202 is used. In practical applications, the signaling analysis device may store the accuracy of the supervised signaling abnormality detection module 202 for each signaling process.
  • the signaling analysis device uses an unsupervised signaling anomaly detection module 201 to perform anomaly detection on the signaling process to be tested.
  • the unsupervised signaling anomaly detection module 201 first performs the first round of anomaly detection based on the message type of each signaling in the signaling process to be tested, and then performs the second round of abnormality detection based on the message type and information element (cell) of each signaling in the signaling process to be tested.
  • step 304 will be described in detail later.
  • the signaling analysis device determines whether the unsupervised abnormality detection module 201 detects an abnormality in the signaling process to be tested.
  • If yes, go to step 306; if not, go to step 310.
  • the signaling analysis device uses the unsupervised signaling abnormality location module 203 to determine the location of the abnormality in the signaling process to be tested.
  • the signaling analysis device uses the supervised signaling anomaly detection module 202 to perform anomaly detection on the signaling process to be tested.
  • the supervised signaling anomaly detection module 202 first performs the first round of anomaly detection based on the message type of each signaling in the signaling process to be tested, and then performs the second round of abnormality detection based on the message content of each signaling in the signaling process to be tested.
  • the message content of each signaling includes at least one cell.
  • the signaling analysis device determines whether the supervised abnormality detection module 202 detects an abnormality in the signaling process to be tested.
  • If yes, go to step 309; if not, go to step 310.
  • the signaling analysis device uses the supervised signaling abnormality location module 204 to determine the location of the abnormal signaling in the signaling process to be tested.
  • the signaling analysis device outputs an abnormality detection result and an abnormality location result.
  • the analysis result output module 105 of the signaling analysis device outputs information indicating that the signaling process under test is normal when the signaling process under test is normal; when the signaling process under test is abnormal, it outputs the abnormal signaling interval or the location of the abnormal signaling in the signaling process under test.
  • the signaling analysis device may include an unsupervised signaling anomaly detection module 201 (i.e., the signaling anomaly detection module 103) and an unsupervised signaling anomaly location module 203 (i.e., the signaling anomaly location module 104), and not include the supervised signaling anomaly detection module 202 and the supervised signaling anomaly location module 204.
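  • the branch structure of FIG. 3 (steps 303 to 310) can be sketched as below. This is a hedged sketch only: `analyze_flow`, the `(detect, locate)` callable pairs, and the shape of the returned dictionary are assumptions made for illustration and do not appear in the application.

```python
from typing import Any, Callable, Sequence, Tuple

Detector = Tuple[Callable[[Sequence], bool], Callable[[Sequence], Any]]


def analyze_flow(
    flow: Sequence,
    supervised_accuracy: float,
    target_threshold: float,
    unsupervised: Detector,
    supervised: Detector,
) -> dict:
    """Run detection and, if needed, location for one flow under test."""
    # Step 303: use the unsupervised path when the stored accuracy of the
    # supervised detector is lower than the target threshold.
    if supervised_accuracy < target_threshold:
        detect, locate = unsupervised      # steps 304 to 306
    else:
        detect, locate = supervised        # steps 307 to 309
    if not detect(flow):
        # Step 310: report a normal flow.
        return {"abnormal": False, "location": None}
    # Step 310: report the abnormality together with its location.
    return {"abnormal": True, "location": locate(flow)}
```

  • a device that contains only the unsupervised modules, or only the supervised modules, corresponds to always taking one of the two branches.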
  • the signaling analysis apparatus can perform step 301, step 302, step 304, step 305, step 306, and step 310 in FIG. 3.
  • the signaling analysis device may include a supervised signaling anomaly detection module 202 (i.e., the signaling anomaly detection module 103) and a supervised signaling anomaly location module 204 (i.e., the signaling anomaly location module 104), and not include the unsupervised signaling anomaly detection module 201 and the unsupervised signaling anomaly location module 203.
  • the signaling analysis apparatus can perform step 301, step 302, step 307, step 308, step 309, and step 310 in FIG. 3.
  • the signaling preprocessing module 102 is used to preprocess the signaling data to obtain the signaling process to be tested. As shown in FIG. 4, the steps performed by the signaling preprocessing module 102 are as follows: 401, signaling analysis; 402, signaling flow extraction.
  • Signaling analysis can be: analyzing information related to signaling analysis for each piece of signaling in the signaling data, such as protocol, interface, timestamp, and process identifier.
  • the extraction of the signaling process may be: the signaling process to be tested is extracted in the unit of the signaling process for subsequent signaling analysis.
  • the signaling preprocessing module can use the signaling analysis tool to complete the signaling analysis, and complete the signaling process extraction based on information such as the protocol, interface, time stamp, and process identifier of each signaling.
  • Signaling analysis 401: to reduce the differences between signaling data under different protocols (for example, signaling messages under the S1AP protocol are in binary form, while signaling messages under the SIP protocol are in a form similar to HyperText Markup Language (HTML)), and to improve the readability of signaling data, the signaling preprocessing module can parse each signaling to obtain a variety of information related to signaling analysis.
  • Table 1 shows a variety of information related to signaling analysis. It should be understood that the information in Table 1 is only used as an example, and the information related to signaling analysis obtained by the analysis of the signaling preprocessing module is not limited to Table 1.
  • the timestamp can be any time identifier of the sending/receiving of the signaling message; it may be, but is not limited to, an absolute time (such as 2018-07-10 15:38:10.031) or a relative time count value on the communication device (such as 1122867).
  • Signaling process extraction 402: since the analysis of signaling data is context-dependent, the signaling preprocessing module can first group all the parsed signaling according to the protocol, interface, and process identifier corresponding to each signaling, and then sort the signaling in each group by timestamp to obtain the final signaling processes to be tested. The process identifier and protocol corresponding to each signaling in a group are the same, so the signaling in each group belongs to the same signaling process.
  • the signaling preprocessing module groups all the parsed signaling according to the protocol, interface, and process identifier corresponding to each signaling to obtain 5 groups of signaling; then, it sorts the signaling in each group according to the order of the timestamps included in the signaling, so that 5 signaling processes to be tested can be obtained.
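  • the grouping and sorting of step 402 can be illustrated with a short sketch. The record layout (dictionaries with the keys `protocol`, `interface`, `process_id`, and `timestamp`) and the function name `extract_flows` are assumptions chosen for illustration; the application does not prescribe a data format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def extract_flows(parsed: List[dict]) -> Dict[Tuple, List[dict]]:
    """Group parsed signaling into flows and order each flow by time."""
    groups: Dict[Tuple, List[dict]] = defaultdict(list)
    for signaling in parsed:
        # Signaling with the same protocol, interface, and process
        # identifier is treated as belonging to the same flow.
        key = (
            signaling["protocol"],
            signaling["interface"],
            signaling["process_id"],
        )
        groups[key].append(signaling)
    # Within each group, restore the context by sorting on the timestamp.
    return {
        key: sorted(members, key=lambda s: s["timestamp"])
        for key, members in groups.items()
    }
```

  • the sketch assumes that all timestamps within one group are mutually comparable, for example all absolute times or all counter values from the same device.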
  • the unsupervised signaling abnormality detection module 201 is used to detect whether the signaling process to be tested is abnormal.
  • the signaling analysis device can use an unsupervised anomaly detection module for model training and anomaly detection. As shown in Figure 5, the unsupervised anomaly detection module can perform the following operations:
  • the unsupervised signaling anomaly detection module performs the first round of coarse-level anomaly detection based on the message type on the signaling process to be tested.
  • the unsupervised signaling abnormality detection module determines whether the signaling process to be tested is abnormal.
  • the unsupervised signaling abnormality detection module determines whether the signaling process to be tested is abnormal according to the abnormality detection result obtained in step 501.
  • the unsupervised signaling anomaly detection module performs a second round of fine-level anomaly detection based on the message type and key information elements in the message content for the signaling process to be tested.
  • the unsupervised signaling abnormality detection module determines whether the signaling process to be tested is abnormal.
  • the unsupervised signaling abnormality detection module determines whether the signaling process to be tested is abnormal according to the abnormality detection result obtained in step 503.
  • the unsupervised signaling abnormality detection module outputs the analysis result without abnormality to the analysis result output module.
  • a non-abnormal analysis result indicates that the signaling process to be tested is normal.
  • the unsupervised signaling anomaly detection module outputs the signaling process to be tested to the unsupervised anomaly location module.
  • the unsupervised signaling anomaly detection module 201 can first perform the first round of coarse-level anomaly detection based on message type information. If no abnormality is detected in the first round, it then performs the second round of fine-level abnormality detection based on the message type and the key cell (corresponding to the target cell) in the message content.
  • the key information element may include the information element indicating the reason for sending the signaling in the message content.
  • the abnormal signaling process to be tested can be directly output to the unsupervised abnormality location module 203 for abnormality location. Otherwise, the analysis result without abnormality can be directly output to the analysis result output module 105.
  • the first round of anomaly detection of the unsupervised signaling anomaly detection module 201 can improve the detection efficiency while ensuring a certain recall rate and higher accuracy, while the second round of fine-level anomaly detection can further improve the overall recall rate.
  • an optional solution for the unsupervised anomaly detection module is to use NLP technology to establish a signaling anomaly detection model based on normal signaling data (corresponding to the first signaling anomaly detection model and the second signaling anomaly detection model), and to perform abnormality detection by judging whether the signaling message to be tested is within the predicted signaling range.
  • Normal signaling data refers to a signaling process in which no abnormality occurs.
  • the unsupervised anomaly detection module can first perform feature construction on the message type of each signaling in the process to be tested, in order, to obtain the message type feature sequence corresponding to the process to be tested (corresponding to the second feature sequence).
  • the signaling anomaly detection model A (corresponding to the second signaling anomaly detection model) based on the message type is used to perform coarse-level anomaly detection on the signaling characteristic sequence to be tested.
  • the first round of feature construction: take the message type of each signaling as a word in an NLP algorithm, and use a word feature construction method from NLP to construct the feature.
  • the feature construction methods of words include but are not limited to one-hot encoding and bag-of-words encoding.
  • the unsupervised anomaly detection module sequentially constructs the features of the message types of each signaling in the process to be tested, and can obtain a feature sequence. Each feature vector in the feature sequence corresponds to a message type.
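  • one-hot encoding, one of the word feature construction methods named above, can be sketched as follows. The helper names are hypothetical, and mapping an unseen message type to an all-zero vector is an assumption of this sketch rather than something the application specifies.

```python
from typing import Dict, List, Sequence


def build_vocabulary(flows: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign an index to every message type seen in the given flows."""
    vocab: Dict[str, int] = {}
    for flow in flows:
        for message_type in flow:
            vocab.setdefault(message_type, len(vocab))
    return vocab


def one_hot_sequence(flow: Sequence[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Turn a flow of message types into one feature vector per signaling."""
    sequence = []
    for message_type in flow:
        vector = [0] * len(vocab)
        index = vocab.get(message_type)
        if index is not None:
            vector[index] = 1
        # Assumption: an unknown message type keeps the all-zero vector.
        sequence.append(vector)
    return sequence
```

  • each vector in the returned list corresponds to one message type, in the order in which the signaling appears in the flow.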
  • the first round of anomaly detection: for the feature sequence obtained after feature construction (that is, the message type feature sequence above), detection proceeds one position at a time from the first feature vector (corresponding to the second feature vector) of the feature sequence, using a sliding window with window length w and step length 1. w placeholders can be filled in before the first feature vector of the feature sequence so that the first feature vector of the sequence can be predicted.
  • when the unsupervised anomaly detection module performs anomaly detection, it inputs the (t-w)th feature vector, the (t-w+1)th feature vector, ..., and the (t-1)th feature vector of the feature sequence into the signaling anomaly detection model A based on the message type for anomaly detection processing, which yields the range of possible message types for the t-th signaling.
  • w and t are both integers greater than 1. If the message type of the t-th signaling in the process to be tested is not within the predicted message type range (corresponding to the second set), the signaling process to be tested can be considered abnormal, and the abnormal signaling process is output directly to the unsupervised anomaly location module 203 for anomaly location; otherwise, the second round of anomaly detection based on message types and key cells is entered.
  • the sequence model used in the signaling anomaly detection model A based on the message type can be, but is not limited to, an N-Gram model or a neural network language model (NNLM).
  • Anomaly detection based on message type and key cell: because some signaling faults are difficult or even impossible to see in the signaling message type, when the first round of message type-based anomaly detection does not detect an anomaly, the unsupervised anomaly detection module can perform the second round of abnormality detection based on the message type and key cells in the signaling process to be tested. For the message content of a signaling, which is composed of several cells, the unsupervised anomaly detection module can treat cause-type cells as key cells and, on top of the message type, use the cell values of the cause-type cells in the message content for anomaly detection. A cause-type cell is a cell in the signaling message that explicitly indicates the reason for sending this signaling.
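  • the sliding-window check of the first round can be sketched as follows. For brevity the sketch works on message type strings rather than feature vectors, and it represents model A as a plain lookup table from a window of w preceding message types to the set of message types considered possible next; both simplifications, and the name `first_abnormal_index`, are assumptions of this illustration.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

BOS = "<bos>"  # placeholder filled in before the first signaling


def first_abnormal_index(
    message_types: Sequence[str],
    allowed_next: Dict[Tuple[str, ...], Set[str]],
    w: int,
) -> Optional[int]:
    """Return the 0-based index of the first unexpected signaling, or None."""
    padded = [BOS] * w + list(message_types)
    for t, actual in enumerate(message_types):
        # padded[t : t + w] holds the w items that precede position t,
        # so the window advances with step length 1.
        context = tuple(padded[t : t + w])
        if actual not in allowed_next.get(context, set()):
            return t
    return None
```

  • a return value of None corresponds to passing the flow on to the second round; any other value marks the flow as abnormal.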
  • the unsupervised anomaly detection module can first perform feature construction on the message types and key information elements of each signaling in the process to be tested in sequence, so as to obtain the feature sequence corresponding to the signaling process to be tested. Subsequently, a signaling anomaly detection model B (corresponding to the first signaling anomaly detection model) based on the message type and key cells is used to perform fine-level anomaly detection on the characteristic sequence corresponding to the signaling process to be tested.
  • the second round of feature construction: the combination of the message type and key cell of each signaling is regarded as a word in an NLP algorithm, and a word feature construction method from NLP is used to construct the feature.
  • an optional method for combining the message type and the key cell is to splice the message type and the cell value of the key cell, and treat the splicing result as a new word, as shown in Figure 6.
  • Word feature construction methods include but are not limited to One-hot coding and BoW coding.
  • the unsupervised anomaly detection module can sequentially perform feature construction on the message type and key information element of each signaling in the process to be tested to obtain a feature sequence (corresponding to the first feature sequence). Each feature vector in the feature sequence corresponds to the combination of message type and key information element of one signaling in the signaling process to be tested.
  • the second round of anomaly detection: for the feature sequence obtained after the second round of feature construction, the unsupervised anomaly detection module detects one position at a time using a sliding window with a window length of w′ and a step length of 1.
  • when the unsupervised anomaly detection module performs the second round of anomaly detection, the (t-w)th feature vector, the (t-w+1)th feature vector, ..., and the (t-1)th feature vector of the first feature sequence are input into the signaling anomaly detection model B based on message types and key cells for anomaly detection processing, which yields the possible combinations of message type and key cell for the t-th signaling.
  • if the combination of message type and key cell of the t-th signaling is not among the predicted combinations, the signaling process under test can be determined to be abnormal, and the abnormal signaling process can be output directly to the unsupervised abnormal location module for abnormal location; otherwise, the analysis result without abnormality can be output directly to the analysis result output module 105.
  • the sequence model adopted by the signaling anomaly detection model B based on the message type and the key cell may be, but is not limited to, the N-Gram model and the NNLM model.
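  • splicing a message type with the value of its key cell into a single word can be sketched as below. The record layout, the cell name `cause`, the separator, and the message names used in the comment are all hypothetical; using the bare message type when no key cell is present is likewise an assumption of this sketch.

```python
from typing import List, Sequence


def to_word(signaling: dict, key_cell: str = "cause", sep: str = "|") -> str:
    """Combine message type and key cell value into one NLP "word"."""
    value = signaling.get("cells", {}).get(key_cell)
    if value is None:
        # Assumption: signaling without the key cell keeps its message type.
        return signaling["message_type"]
    return f"{signaling['message_type']}{sep}{value}"


def flow_to_words(flow: Sequence[dict]) -> List[str]:
    """Map a flow to words, e.g. ["Setup", "Release|timeout"]."""
    return [to_word(signaling) for signaling in flow]
```

  • the resulting words can then be encoded with the same word feature construction method as in the first round.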
  • the signaling anomaly detection model A and the signaling anomaly detection model B used in the above-mentioned anomaly detection need to be obtained through a machine learning method on a normal signaling training data set.
  • the following describes how to train the signaling anomaly detection model A and the signaling anomaly detection model B.
  • the training method for obtaining the signaling anomaly detection model A is as follows: feature construction is performed on the message type included in each signaling in the training signaling process to obtain the second vector sequence; the feature vectors in the second vector sequence correspond one-to-one to the signaling in the training signaling process; the R-th to (R+W)-th feature vectors in the second vector sequence are input, as the second training feature sequence, to the second training model for unsupervised learning to obtain the second signaling anomaly detection model; the second training model is a W-gram language model, where W is an integer greater than 1, and R and S are both integers greater than 0.
  • multiple signaling processes ie normal signaling processes
  • the message type in each signaling process is characterized to obtain the characteristic sequence of each signaling process.
  • the signaling process can obtain the following feature sequences (corresponding to the second feature sequence) for training signaling anomaly detection model A: [<bos>, <bos>, <bos>, a], [<bos>, <bos>, a, b], [<bos>, a, b, c], [a, b, c, d], [b, c, d, <eos>].
  • <bos> and <eos> are the vectors corresponding to the placeholder and the terminator, respectively, after feature construction.
  • the model will count the conditional probability P in all feature sequences (that is, the fourth message in the feature sequence
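  • the window construction shown above, together with a count-based conditional probability, can be sketched as follows. The sketch assumes the usual relative-frequency estimate for an N-gram style model (count of a window divided by the count of its context); the application does not spell out the estimator, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

BOS, EOS = "<bos>", "<eos>"


def training_windows(flow: Sequence[str], w: int) -> List[List[str]]:
    """Slide a window of w context items plus one target over the flow."""
    padded = [BOS] * w + list(flow) + [EOS]
    return [padded[i : i + w + 1] for i in range(len(padded) - w)]


def fit_conditional_probabilities(
    flows: Sequence[Sequence[str]], w: int
) -> Dict[Tuple[str, ...], Dict[str, float]]:
    """Estimate P(next item | previous w items) from normal flows."""
    counts: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for flow in flows:
        for window in training_windows(flow, w):
            counts[tuple(window[:-1])][window[-1]] += 1
    return {
        context: {
            item: count / sum(followers.values())
            for item, count in followers.items()
        }
        for context, followers in counts.items()
    }
```

  • with w = 3 and the flow a, b, c, d, `training_windows` reproduces the five windows listed above.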
  • the training method for obtaining the signaling anomaly detection model B is as follows: feature construction is performed on the message type and information element included in each signaling in the training signaling process to obtain the first vector sequence; the feature vectors in the first vector sequence correspond one-to-one to the signaling in the training signaling process; the R-th to (R+W)-th feature vectors in the first vector sequence are input, as the first training feature sequence, to the first training model for unsupervised learning to obtain the first signaling anomaly detection model; the first training model is a W-gram language model, where W is an integer greater than 1, and R and S are both integers greater than 0.
  • multiple signaling processes ie, normal signaling processes
  • the message types and key information elements in each signaling process are characterized to obtain the characteristic sequence of each signaling process.
  • the characteristic sequences obtained by this process are [<bos>, <bos>, <bos>, a], [<bos>, <bos>, a, b], [<bos>, a, b, c], [a, b, c, d], [b, c, d, <eos>].
  • <bos> and <eos> are the vectors corresponding to the placeholder and the terminator, respectively, after feature construction.
  • the model will count the conditional probability P in all normal feature sequences (the fourth message in the feature sequence
  • the supervised signaling abnormality detection module 202 is used to detect whether the signaling process to be tested is abnormal. As labeled signaling data gradually accumulates and a large amount of it becomes available for model training, the signaling analysis device can use a supervised anomaly detection module for model training and anomaly detection. Compared with the unsupervised signaling anomaly detection module, the supervised signaling anomaly detection module has a higher accuracy and recall rate for anomaly detection. As shown in Figure 7, the supervised anomaly detection module can perform the following operations:
  • the supervised signaling anomaly detection module performs the first round of coarse-level anomaly detection based on the message type on the signaling process to be tested.
  • the supervised signaling abnormality detection module determines whether the signaling process to be tested is abnormal.
  • If yes, go to step 706; if not, go to step 703.
  • the supervised signaling anomaly detection module performs a second round of fine-level anomaly detection based on message content on the signaling process to be tested.
  • the supervised signaling abnormality detection module determines whether the signaling process to be tested is abnormal.
  • If yes, go to step 706; if not, go to step 705.
  • the supervised signaling abnormality detection module outputs the analysis result without abnormality to the analysis result output module.
  • a non-abnormal analysis result indicates that the signaling process to be tested is normal.
  • the supervised signaling abnormality detection module outputs the signaling process to be tested to the supervised abnormal location module.
  • the anomaly detection process of the supervised signaling anomaly detection module in FIG. 7 is similar to that of the unsupervised signaling anomaly detection module in FIG. 5.
  • the main difference lies in the different implementations of step 701 and step 501, and of step 703 and step 503. Since the anomaly detection process in FIG. 7 covers all the cells in the message content, instead of only the key cells, the method in FIG. 7 has a wider anomaly detection range.
  • an optional solution for the supervised anomaly detection module is to use NLP technology to establish a signaling process classification model (normal, abnormal) based on labeled signaling data, and to complete abnormality detection by classifying the signaling process to be tested.
  • the following describes in detail how the supervised anomaly detection module performs anomaly detection, that is, the implementation of step 701 and step 703.
  • after the supervised anomaly detection module receives the parsed signaling process to be tested, it can first perform feature construction on the message type of each signaling message in the process to be tested, in order, to obtain the message type feature sequence corresponding to the signaling process to be tested (corresponding to the second feature sequence). Subsequently, the signaling flow classification model C (corresponding to the second signaling anomaly detection model) based on the message type is used to perform coarse-level anomaly detection on the signaling feature sequence to be tested.
  • the first round of feature construction: take each message type as a word in an NLP algorithm, and use a word feature construction method from NLP to construct the feature.
  • the feature construction methods of words include but are not limited to One-hot coding and BoW coding.
  • the supervised anomaly detection module sequentially constructs the features of the message types of each signaling in the process to be tested, and can obtain a feature sequence. Each feature vector in the feature sequence corresponds to a message type.
  • the first round of anomaly detection: for the feature sequence obtained after feature construction, the supervised anomaly detection module classifies the feature sequence with the signaling flow classification model C based on the message type, so as to obtain the classification result of whether the signaling flow to be tested is abnormal. If the signaling flow classification model C classifies the flow to be tested as abnormal, the signaling flow to be tested is abnormal, and the abnormal signaling flow is output directly to the supervised abnormal location module 204 for abnormal location; otherwise, the second round of anomaly detection based on message content is entered.
  • the signaling process classification model C based on the message type can be, but is not limited to, a Recurrent Neural Network (RNN) model or a Long Short-Term Memory (LSTM) model.
  • Anomaly detection based on message content: because some signaling failures are difficult or even impossible to see in the message type of the signaling, when the first round of anomaly detection based on message type does not detect an anomaly, the supervised anomaly detection module performs the second round of abnormality detection on the signaling process to be tested based on the message content. For the message content of a signaling message, which is composed of several cells, the supervised anomaly detection module uses all or part of the cells in the message content of each signaling message to perform anomaly detection. In other words, the supervised anomaly detection module can first perform feature construction on all or part of the cells of each signaling message in the process to be tested, so as to obtain a feature sequence corresponding to the process to be tested. Subsequently, a signaling flow classification model D (corresponding to the first signaling anomaly detection model) based on the message content is used to perform fine-level anomaly detection on the feature sequence corresponding to the signaling process to be tested.
  • the second round of feature construction: treat each message content composed of several cells as a text paragraph composed of several words, and use a text paragraph feature construction method from NLP to carry out feature construction.
  • an optional message content processing method is to extract the cell values in the order in which the cells appear in the message content, thereby converting the message content into a semantic text paragraph, as shown in Figure 8.
  • the feature construction methods of text paragraphs adopted by the supervised anomaly detection module include but are not limited to autoencoders and BoW encoding.
  • the supervised anomaly detection module can obtain a feature sequence by sequentially performing feature construction on the message content of each signaling in the process to be tested. Each feature vector in the feature sequence corresponds to the message content of one signaling message in the signaling process to be tested.
  • the second round of anomaly detection: for the feature sequence obtained after feature construction, the supervised anomaly detection module can classify the feature sequence with the signaling process classification model D based on the message content, so as to obtain the classification result of whether the signaling process to be tested is abnormal. If the signaling flow classification model D classifies the flow to be tested as abnormal, it is determined that the signaling flow to be tested is abnormal, and the abnormal signaling flow is output directly to the supervised abnormal location module for abnormal location; otherwise, the analysis result without abnormality is output directly to the analysis result output module 105.
  • the signaling flow classification model D based on the message content can be, but is not limited to, an RNN or LSTM model.
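  • converting message content into a text paragraph and encoding it with BoW can be sketched as follows. Representing the message content as an ordered list of (cell name, cell value) pairs, the space-separated paragraph format, and the function names are assumptions made for this illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def content_to_paragraph(cells: Sequence[Tuple[str, object]]) -> str:
    """Join the cell values in their order of appearance in the message."""
    return " ".join(str(value) for _name, value in cells)


def bag_of_words(paragraph: str, vocabulary: Sequence[str]) -> List[int]:
    """Count how often each vocabulary word occurs in the paragraph."""
    counts = Counter(paragraph.split())
    return [counts[word] for word in vocabulary]
```

  • applying these two steps to every signaling in a flow yields one feature vector per message content, which together form the feature sequence passed to classification model D.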
  • the signaling flow classification model C and the signaling flow classification model D used in the above-mentioned anomaly detection need to be obtained through a machine learning method on the labeled signaling training data set.
  • the following describes the implementation of the signaling process classification model C and the signaling process classification model D obtained by training.
  • The signaling flow classification model C is trained as follows: feature construction is performed on the message type of each signaling in the training signaling flow to obtain the second training sample; the feature vectors in the second training sample correspond one-to-one to the signaling in the training signaling flow; the second training sample and the second annotation information are input to the fourth training model (corresponding to the signaling flow classification model C) for anomaly detection processing to obtain a second anomaly detection processing result; the second anomaly detection processing result indicates whether the second training sample is a normal signaling flow or an abnormal signaling flow; the loss corresponding to the second training sample is determined from the second anomaly detection processing result and the second standard result; the second standard result is the true result of the second training sample indicated by the second annotation information; the loss corresponding to the second training sample is used to update the parameters of the fourth training model through an optimization algorithm to obtain the second signaling anomaly detection model (corresponding to the signaling flow classification model C).
  • Before training the signaling flow classification model C, the training device may first obtain labeled normal and abnormal signaling flows as the data set for training the signaling flow classification model C; for each signaling flow in the data set, feature construction is performed on the message type of each signaling in that flow to obtain the feature sequence of each signaling flow (including the second training sample).
  • For example, the feature sequence obtained for a signaling flow is [a, b, c, d, <bos>, <bos>, <bos>, <bos>, <bos>, <bos>].
  • Here <bos> is the vector corresponding to the placeholder after feature construction.
  • The training device can then build a model from the feature sequence and the annotation information of each signaling flow, learning the differences between normal and abnormal signaling flows and their respective characteristics, so as to obtain the signaling flow classification model C that can identify abnormal signaling flows.
  • The input data of the signaling flow classification model C is the feature sequence of each signaling flow and its corresponding label (abnormal or normal), and the loss function used during training is cross entropy.
  • When the loss value of the signaling flow classification model C falls below the specified loss threshold, a classification model that can distinguish normal signaling flows from abnormal ones is obtained.
  • The loss value of the signaling flow classification model C is the value calculated by the loss function.
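  • The padding and the loss described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names, the binary form of the cross entropy, and the threshold value 0.2 are hypothetical, and the classifier itself (for example an LSTM) is deliberately left out.

```python
import math

PAD = "<bos>"  # placeholder token, as in the example feature sequence above

def pad_sequence(seq, length, pad=PAD):
    """Truncate or right-pad a feature sequence to a fixed length."""
    return list(seq[:length]) + [pad] * max(0, length - len(seq))

def binary_cross_entropy(p_abnormal, label, eps=1e-12):
    """Cross entropy for one flow; label is 1 for abnormal and 0 for normal."""
    p = min(max(p_abnormal, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def mean_loss(predictions, labels):
    """Average cross entropy over a batch of flows."""
    total = sum(binary_cross_entropy(p, y) for p, y in zip(predictions, labels))
    return total / len(labels)

padded = pad_sequence(["a", "b", "c", "d"], 10)

# Hypothetical model outputs for one abnormal and one normal flow.
loss = mean_loss([0.9, 0.1], [1, 0])
LOSS_THRESHOLD = 0.2  # assumed value; the application only says "specified"
training_done = loss < LOSS_THRESHOLD
```

  • In a real training loop the predictions would come from the model being trained, and the parameters would be updated by an optimizer until the loss condition is met.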
  • The signaling flow classification model D is trained as follows: feature construction is performed on the message types and information elements of each signaling in the training signaling flow to obtain the first training sample; the feature vectors in the first training sample correspond one-to-one to the signaling in the training signaling flow; the first training sample and the first annotation information are input to the third training model (corresponding to the signaling flow classification model D) for anomaly detection processing to obtain a first anomaly detection processing result; the first anomaly detection processing result indicates whether the first training sample is a normal signaling flow or an abnormal signaling flow; the loss corresponding to the first training sample is determined from the first anomaly detection processing result and the first standard result; the first standard result is the true result of the first training sample indicated by the first annotation information; the loss corresponding to the first training sample is used to update the parameters of the third training model through an optimization algorithm to obtain the first signaling anomaly detection model (corresponding to the signaling flow classification model D).
  • The training device may first obtain labeled normal and abnormal signaling flows as the data set for training the signaling flow classification model D; for each signaling flow in the data set, feature construction is performed on the message content of each signaling in that flow to obtain the feature sequence of each signaling flow (including the first training sample).
  • For example, if the feature sequence of the message content in a signaling flow after feature construction is [a, b, c, d],
  • the feature sequence obtained for the signaling flow is [a, b, c, d, null, null, null, null, null, null].
  • Here null is the vector corresponding to the placeholder after feature construction.
  • The training device can then build a model from the feature sequence and the annotation information of each signaling flow, learning the differences between normal and abnormal signaling flows and their respective characteristics, so as to obtain the signaling flow classification model D that can identify abnormal signaling flows.
  • Taking an LSTM classification model with a sequence length of 10 as an example, the input data of the signaling flow classification model D is the feature sequence of each signaling flow and its corresponding label (abnormal or normal), and the loss function used during training can be cross entropy. When the loss value of the signaling flow classification model D falls below the specified loss threshold, a classification model that can distinguish normal signaling flows from abnormal ones is obtained.
  • The loss value of the signaling flow classification model D is the value calculated by the loss function.
  • the unsupervised abnormal location module 203 is used to determine the location where the abnormality occurs in the signaling process to be tested, that is, to determine the location of the abnormal signaling in the process to be tested.
  • the unsupervised abnormality location module 203 after the unsupervised abnormality detection module 201 detects the abnormality of the signaling process to be tested, further analyzes and obtains the position of the abnormal signaling in the process to be tested.
  • The unsupervised abnormality location module 203 may perform abnormality location based on the incoming feature sequence.
  • An optional solution for the unsupervised abnormal location module is to use the signaling anomaly detection model A to locate anomalies in the signaling process to be tested.
  • An implementation of this optional solution is as follows: the unsupervised abnormal location module starts from the first feature vector of the feature sequence to be tested, and uses a sliding window with a window length of w and a step length of 1 to perform detection one by one.
  • the characteristic sequence to be tested may be a characteristic sequence corresponding to the signaling process to be tested.
  • the feature sequence to be tested can be from an unsupervised anomaly detection module or from a supervised anomaly detection module.
  • Anomaly detection processing is performed by inputting the (t-w)-th, (t-w+1)-th, ..., (t-1)-th feature vectors of the feature sequence to be tested into the signaling anomaly detection model, which yields the possible range of the t-th signaling. If the t-th signaling in the flow to be tested is not within the predicted range, the t-th signaling in the signaling flow to be tested is determined to be abnormal, and the abnormal result of the flow to be tested, together with the abnormality position, is output to the analysis result output module 105. In addition, w placeholders can be padded before the first feature vector of the feature sequence to be tested, so that the first signaling of the signaling flow to be tested can be predicted.
  • FIG. 9 is a schematic diagram of an abnormal location process provided by an embodiment of this application.
  • In FIG. 9, the feature sequence X represents the feature sequence to be tested;
  • X(t) represents the t-th feature vector in the feature sequence X to be tested;
  • Y represents the possible range of the t-th signaling (that is, all possible t-th signalings);
  • the input of the signaling anomaly detection model is the (t-w)-th to (t-1)-th feature vectors in the feature sequence to be tested;
  • the output of the signaling anomaly detection model is the predicted t-th signaling.
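  • The sliding-window location procedure can be illustrated with the sketch below. Note the assumption: instead of the trained signaling anomaly detection model, a toy lookup table built from normal flows stands in as the predictor of the "possible range" Y; the function names and the SIP-like message types are hypothetical.

```python
from collections import defaultdict

PAD = "<bos>"  # placeholder padded before the first signaling

def fit_window_model(normal_flows, w):
    """Toy stand-in for the detection model: remember which signaling
    followed each length-w context in the normal flows."""
    allowed = defaultdict(set)
    for flow in normal_flows:
        padded = [PAD] * w + list(flow)
        for t in range(w, len(padded)):
            allowed[tuple(padded[t - w:t])].add(padded[t])
    return allowed

def locate_anomalies(flow, allowed, w):
    """Slide a window of length w with step 1 and return the 1-based
    positions of signalings that fall outside the predicted range."""
    padded = [PAD] * w + list(flow)
    positions = []
    for t in range(w, len(padded)):
        context = tuple(padded[t - w:t])
        if padded[t] not in allowed.get(context, set()):
            positions.append(t - w + 1)
    return positions

w = 2
model = fit_window_model([["INVITE", "100", "180", "200", "ACK"]], w)
normal_positions = locate_anomalies(["INVITE", "100", "180", "200", "ACK"], model, w)
abnormal_positions = locate_anomalies(["INVITE", "100", "486", "ACK"], model, w)
```

  • In this toy run the third signaling is the first one flagged; the signaling after it is flagged as well because its context already contains the abnormal signaling, which is why reporting only the first flagged position is a common simplification.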
  • the aforementioned characteristic sequence to be tested may be a characteristic sequence obtained based on the message type, or a characteristic sequence obtained based on the message type and key information elements.
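  • When the feature sequence to be tested is obtained based on the message type, the unsupervised anomaly location module uses the signaling anomaly detection model A to perform anomaly location.
  • When the feature sequence to be tested is obtained based on the message type and key information elements, the unsupervised anomaly location module uses the signaling anomaly detection model B to perform anomaly location.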
  • the supervised abnormal location module 204 is used to determine the location where the abnormality occurs in the signaling process to be tested, that is, to determine the location of the abnormal signaling in the process to be tested.
  • the supervised abnormality location module 204 after the supervised abnormality detection module 202 detects the abnormality of the signaling process to be tested, further analyzes and obtains the position of the abnormal signaling in the process to be tested.
  • The supervised anomaly location module 204 may perform anomaly location based on the incoming feature sequence.
  • The signaling analysis device inputs the signaling flow to be tested into the signaling flow classification model for anomaly detection processing, obtaining the probability that it is an abnormal flow. This probability can be regarded as the degree of abnormality of the signaling flow under test. The abnormal signaling interval, that is, the interval in which the abnormal signaling lies within the signaling flow under test, can therefore be located by analyzing how the degree of abnormality changes along the flow.
  • FIG. 10 is a flowchart of a method for locating a signaling abnormality provided by an embodiment of the application. As shown in Fig. 10, the method includes: 1001. A supervised abnormal location module constructs an abnormality evaluation curve of the signaling process to be tested. 1002. The supervised abnormal location module uses the abnormal evaluation curve to locate the abnormal signaling interval, that is, to determine the location of the abnormality in the signaling process to be tested.
  • the abnormal evaluation curve can be obtained.
  • the variation range of the abnormality evaluation curve is between 0 and 1, and the value of the abnormality evaluation curve will increase as the degree of abnormality of the signaling segment increases.
  • The signaling flow to be tested, which contains N signalings, corresponds to an abnormal probability sequence;
  • the G-th probability in the abnormal probability sequence represents the probability that the first (D+G) signalings of the N signalings include an abnormal signaling.
  • G and D are integers greater than zero.
  • For example, D is 2,
  • the probabilities in the abnormal probability sequence are, in order, P1, P2, ..., Pm, and
  • P1 is the abnormality evaluation probability of the segment formed by the first three signalings of the signaling flow to be tested.
  • When the feature sequence is obtained based on the message type, the signaling flow classification model C is used for abnormality location, that is, the feature sequence is input to the signaling flow classification model C for anomaly detection processing.
  • When the feature sequence is obtained based on the message content, the signaling flow classification model D is used for abnormality location, that is, the feature sequence is input to the signaling flow classification model D for anomaly detection processing.
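  • A minimal sketch of building the abnormal probability sequence follows. The `classify` callable is a hypothetical stand-in for the signaling flow classification model C or D, and the sequence length m = N - D is inferred from the definition of the G-th probability rather than stated in the application.

```python
def anomaly_curve(flow, classify, d=2):
    """Return [P_1, ..., P_m], where P_G is the abnormality probability that
    classify() assigns to the first (d + G) signalings of the flow."""
    n = len(flow)
    return [classify(flow[:d + g]) for g in range(1, n - d + 1)]

def toy_classify(prefix):
    """Hypothetical classifier: high probability once an ERROR has been seen."""
    return 0.9 if "ERROR" in prefix else 0.1

flow = ["A", "B", "C", "ERROR", "D", "E"]
curve = anomaly_curve(flow, toy_classify, d=2)
```

  • With D = 2 the first point of the curve covers the first three signalings, consistent with the description of P1 above.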
  • the signaling process to be tested can often be further divided into several sub-processes.
  • In a normal flow, the abnormality evaluation probability rises when a sub-process begins and falls when that sub-process ends normally. An abnormality in any signaling of the flow, such as a conflict with the preceding signaling messages or the failure of a sub-process to end normally, leads to high abnormality evaluation probabilities afterwards. Therefore, the fluctuations in the abnormality evaluation curve can provide users with an interpretable abnormality detection method.
  • Abnormal signaling interval location: because the abnormality evaluation curves generated in different abnormal situations have different characteristics, the supervised anomaly location module can first classify a curve according to its characteristics to obtain the curve type, and then locate the abnormal interval according to that curve type.
  • Table 2 shows an example of anomaly evaluation interval classification.
  • the abnormality evaluation curve can be divided into four categories according to Table 2, namely, steep increase, slow increase, fluctuation and continuous, and each category can be located in the abnormal interval according to the method in Table 3.
  • Table 3 shows some methods of locating abnormal intervals. Since the anomaly evaluation curve is still a time series in nature, curve classification can be done using time series classification algorithms, and abnormal interval positioning can be done using time series analysis methods.
  • the low value (corresponding to the first threshold) can be 0.2, 0.25, etc.
  • the high value (corresponding to the second threshold) can be 0.75, 0.8, 0.9, etc., which is not limited in this application.
  • an optional judgment method is to calculate the difference between two adjacent points in the probability sequence. If the difference is greater than the sharp increase judgment threshold (corresponding to the probability threshold), it is considered to be a sharp increase segment.
  • the sharp increase judgment threshold may be 0.3, 0.4, 0.5, and so on.
  • an optional judgment method is: calculate the difference between each pair of adjacent points in the sequence, and select the segment with the largest difference, provided that this difference is greater than the increase threshold. The two methods for identifying the sharp increase segment and the maximum increase segment in the abnormality evaluation curve are described above, but the identification is not limited to these two methods.
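  • The two judgment methods above can be sketched as follows; the function names and the sample curve are illustrative assumptions, while the default threshold 0.3 is one of the example values given above.

```python
def sharp_increase_segments(probs, threshold=0.3):
    """Indices i where the rise from probs[i] to probs[i + 1] exceeds the
    sharp increase judgment threshold."""
    return [i for i in range(len(probs) - 1)
            if probs[i + 1] - probs[i] > threshold]

def max_increase_segment(probs, increase_threshold=0.0):
    """Index i of the largest adjacent rise, or None when no rise exceeds
    the increase threshold."""
    best, best_diff = None, increase_threshold
    for i in range(len(probs) - 1):
        diff = probs[i + 1] - probs[i]
        if diff > best_diff:
            best, best_diff = i, diff
    return best

curve = [0.1, 0.15, 0.2, 0.8, 0.85]
sharp = sharp_increase_segments(curve)
largest = max_increase_segment(curve)
```

  • Both functions return 0-based indices into the probability sequence; mapping an index back to signaling positions depends on the value of D used when the curve was built.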
  • the analysis result output module 105 is configured to complete the following analysis result output according to the received signaling analysis result.
  • the signaling analysis result may be the abnormality detection result output by the abnormality detection module, or the position of the abnormal signaling output by the abnormal positioning module or the position of the abnormal signaling interval.
  • Specifically, if the signaling flow to be tested is normal, the signaling flow to be tested and its analysis result "normal" are output; if the signaling flow to be tested is abnormal and the analysis result is generated by the unsupervised abnormal location module 203, the signaling flow to be tested, the analysis result "abnormal", and the location of the abnormal signaling are output; if the signaling flow to be tested is abnormal and the analysis result is generated by the supervised abnormal location module 204, the signaling flow to be tested, the analysis result "abnormal", and the location of the abnormal signaling interval are output.
  • FIG. 12 is a flowchart of a method for detecting and locating a signaling anomaly according to an embodiment of the application.
  • Fig. 12 is a further refinement and improvement of the method flow in Fig. 2. As shown in Fig. 12, the method includes:
  • the signaling analysis device collects signaling data under the SIP protocol from the IMS domain of the communication network.
  • the signaling analysis device analyzes the collected signaling data, and extracts the signaling flow to be tested.
  • the signaling analysis device first uses the SIP signaling analysis tool to analyze the protocol, interface, time stamp, process identifier, message type, and message content of each signaling in the signaling data. Then complete the extraction of the signaling process according to the protocol, interface, time stamp, and process identifier.
  • The steps for extracting the signaling flow are as follows: the signaling analysis device may first put the signalings that have the same protocol, interface, and flow identifier in the signaling data into the same group, and then sort the signalings in each group by the order of the timestamps they contain.
  • each group of signaling corresponds to a signaling process to be tested after sorting.
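  • The grouping and sorting steps can be sketched as below. The record layout (dictionaries with `protocol`, `interface`, `flow_id`, `timestamp`, and `message_type` keys) is an assumption made for illustration; the application does not prescribe a data format.

```python
from collections import defaultdict

def extract_flows(signalings):
    """Group signalings by (protocol, interface, flow identifier) and sort
    each group by timestamp; each sorted group is one flow to be tested."""
    groups = defaultdict(list)
    for s in signalings:
        groups[(s["protocol"], s["interface"], s["flow_id"])].append(s)
    return {key: sorted(group, key=lambda s: s["timestamp"])
            for key, group in groups.items()}

# Hypothetical parsed records, deliberately out of order.
records = [
    {"protocol": "SIP", "interface": "Gm", "flow_id": "f1", "timestamp": 3, "message_type": "ACK"},
    {"protocol": "SIP", "interface": "Gm", "flow_id": "f2", "timestamp": 2, "message_type": "INVITE"},
    {"protocol": "SIP", "interface": "Gm", "flow_id": "f1", "timestamp": 1, "message_type": "INVITE"},
]
flows = extract_flows(records)
f1_types = [s["message_type"] for s in flows[("SIP", "Gm", "f1")]]
```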
  • the signaling analysis device can perform abnormality detection and location on each signaling process to be tested.
  • the signaling analysis device performs abnormality detection on the signaling process to be tested based on the message type of each signaling in the signaling process to be tested.
  • The signaling analysis device first performs one-hot encoding, in sequence, on the message type of each signaling in the signaling flow to be tested to obtain the message type feature sequence corresponding to the flow to be tested (corresponding to the second feature sequence), and then uses the trained message-type-based LSTM classification model (corresponding to the signaling flow classification model C) to classify that message type feature sequence.
  • If the classification result is abnormal, step 1207 is executed, that is, the signaling flow to be tested is output directly to the supervised abnormal location module; otherwise, step 1205 is executed.
  • the signaling analysis device performs abnormality detection on the signaling process to be tested based on the message content of each signaling in the signaling process to be tested.
  • The signaling analysis device may first perform feature construction based on the message content of each signaling in the signaling flow under test to obtain a message content feature sequence (corresponding to the first feature sequence), and then use the trained message-content-based LSTM classification model (corresponding to the signaling flow classification model D) to classify that message content feature sequence.
  • An example of obtaining the message content coding sequence by feature construction is as follows: (1) For the message content of each signaling message, identify whether each cell is of the noun, numeric, or enumerated type.
  • The cell type of each cell can be obtained with a statistical classification method in the training phase of the signaling flow classification model D.
  • The value of a noun cell is almost always different in different signaling flows; the values of an enumerated cell in different flows come from a limited set of discrete values; and the values of a numeric cell come from a continuous value space.
  • (2) The numeric cells in the message content are discretized over their continuous value space, following the interval-based judgment that engineers use. Assuming that the maximum value of the value space of a certain cell is Vmax, the minimum value is Vmin, and the number of discrete intervals is n, the length dl of each discrete interval is (Vmax-Vmin)/n. When the cell value x lies within the value space, between Vmin and Vmax, its discretized value is determined by the interval into which x falls. When the value of the cell is not in the value space, the discretized value is 0.
  • (3) The message content in each signaling can then be regarded as a text paragraph composed of a string of enumerated cells and discretized numeric cells, which can be converted into a fixed-length encoding using the autoencoding method.
  • (4) Use the autoencoding method to convert the message content, as a text paragraph composed of a string of enumerated cells and discretized numeric cells, into a fixed-length message content coding sequence.
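  • The discretization of numeric cells described in the steps above can be sketched as follows. The exact index formula is not recoverable from the text, so the mapping used here (intervals numbered 1 to n, with 0 reserved for out-of-range values) is an assumption chosen only to be consistent with the stated interval length dl and the out-of-range value 0.

```python
def discretize(x, v_min, v_max, n):
    """Map a numeric cell value to an interval index in 1..n, or to 0 when
    x lies outside the value space [v_min, v_max]."""
    if not (v_min <= x <= v_max):
        return 0
    dl = (v_max - v_min) / n          # length of one discrete interval
    index = int((x - v_min) // dl) + 1
    return min(index, n)              # x == v_max belongs to the last interval

# Hypothetical value space 0..100 split into 10 intervals.
samples = [discretize(x, 0, 100, 10) for x in (0, 25, 100, -5, 150)]
```

  • Reserving 0 for out-of-range values keeps them distinguishable from every valid interval, so the downstream text encoding can treat them as a separate token.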
  • If the classification result of the classification model is abnormal, step 1207 is executed, that is, the abnormal flow to be tested is sent directly to the supervised abnormal location module; if the classification result of the classification model is normal, step 1209 is executed, that is, the normal result is sent to the analysis result output module.
  • the signaling analysis device constructs an abnormality evaluation curve of the signaling process to be tested.
  • If the abnormality was detected based on the message type, the signaling analysis device can construct the abnormality evaluation curve of the signaling flow to be tested using the message type feature sequence; if the abnormality was detected in step 1205, the signaling analysis device can use the message content feature sequence above to construct the abnormality evaluation curve of the signaling flow to be tested.
  • the signaling analysis device determines the abnormal signaling interval in the signaling process to be tested according to the abnormal evaluation curve of the signaling process to be tested.
  • the signaling analysis device may first use the time series classification algorithm to identify the type of each curve, and then obtain the abnormal signaling interval in the process to be tested according to the abnormal interval locating method in Table 3.
  • Figure 13 is an abnormality evaluation curve of the signaling flow to be tested, where coordinate 0 on the horizontal axis corresponds to the first 3 signalings of the signaling flow to be tested, coordinate x corresponds to the first (x+3) signalings, and the dashed box marks the abnormal signaling interval.
  • the signaling analysis device outputs the abnormality detection result and the position of the abnormal signaling interval.
  • the analysis result output module outputs in the format "abnormal, [location of abnormal signaling interval]".
  • the output corresponding to Figure 13 is "abnormal, the 7th to 12th signaling is abnormal”.
  • the analysis result output module outputs "normal".
  • In this embodiment, a sequence classification model is applied to the message type information and message content information in the signaling flow under test, which provides more comprehensive signaling flow anomaly detection and locates the interval in which the abnormal signaling lies.
  • FIG. 14 is a flowchart of a method for detecting and locating a signaling anomaly according to an embodiment of the application.
  • Fig. 14 is a further refinement and improvement of the method flow in Fig. 2. As shown in Fig. 14, the method includes:
  • the signaling analysis device collects signaling data under the S1AP protocol from the wireless domain of the communication network.
  • the signaling analysis device analyzes the collected signaling data and extracts the signaling process to be tested.
  • The signaling analysis device first uses an S1AP signaling analysis tool to parse out the protocol, interface, timestamp, flow identifier, message type, and message content of each signaling in the signaling data. It then extracts the signaling flows according to the protocol, interface, timestamp, and flow identifier.
  • The steps for extracting the signaling flow are as follows: the signaling analysis device may first put the signalings that have the same protocol, interface, and flow identifier in the signaling data into the same group, and then sort the signalings in each group by the order of the timestamps they contain.
  • the protocols, interfaces, and process identifiers of each signaling in each group of signaling are the same, and the sorted signaling of each group of signaling corresponds to a signaling process to be tested.
  • the signaling analysis device can perform abnormality detection and location on each signaling process to be tested.
  • the signaling analysis device performs abnormality detection on the signaling process to be tested based on the message type of each signaling in the signaling process to be tested.
  • The signaling analysis device first pads w placeholder signalings before the first signaling of the flow to be tested (their message type and message content are replaced by the placeholder <bos>), and then performs one-hot encoding, in sequence, on the message types of the padded flow to obtain the message type feature sequence corresponding to the flow to be tested.
  • The trained message-type-based NNLM anomaly detection model (corresponding to the signaling anomaly detection model A) then makes predictions over the message type feature sequence one signaling at a time with a sliding window (window length w), yielding the anomaly detection result of the signaling flow to be tested.
  • If any signaling in the signaling flow to be tested is not within its prediction range, the flow to be tested is determined to be abnormal, and an anomaly detection result indicating that the signaling flow to be tested is abnormal is obtained; if every signaling in the signaling flow to be tested is within its prediction range, the signaling flow to be tested is determined to be normal, that is, an anomaly detection result indicating that the signaling flow to be tested is normal is obtained.
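  • The padding and one-hot encoding that precede the sliding-window prediction can be sketched as follows. The function name, the choice of index 0 for the placeholder, and the sample S1AP-like message types are assumptions for illustration only.

```python
PAD = "<bos>"  # placeholder for the w padded signalings

def one_hot_sequence(message_types, vocabulary, w):
    """Prepend w placeholders, then one-hot encode every message type.
    Index 0 is reserved for the placeholder; a message type missing from
    the vocabulary raises KeyError."""
    index = {PAD: 0}
    for i, name in enumerate(vocabulary, start=1):
        index[name] = i
    padded = [PAD] * w + list(message_types)
    size = len(index)
    return [[1 if index[m] == k else 0 for k in range(size)] for m in padded]

vocabulary = ["InitialUEMessage", "DownlinkNASTransport"]
features = one_hot_sequence(["InitialUEMessage", "DownlinkNASTransport"], vocabulary, w=2)
```

  • The resulting sequence has w more entries than the flow, so the window ending just before the first real signaling consists entirely of placeholders.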
  • If the signaling flow to be tested is abnormal, step 1407 is performed, that is, the signaling flow to be tested is sent directly to the unsupervised abnormal location module; otherwise, step 1405 is performed, that is, the flow to be tested is sent to the second round of fine-grained anomaly detection.
  • the signaling analysis device performs abnormality detection on the signaling process to be tested based on the message type and key information elements of each signaling in the signaling process to be tested.
  • The signaling analysis device treats, in sequence, the combination of the message type and key information element of each signaling in the signaling flow to be tested as a word for feature construction to obtain a feature sequence, and computes the prediction result for each signaling in a sliding-window manner, thereby obtaining the anomaly detection result of the flow.
  • the signaling analysis device may first extract a cell named cause-result from the message content of each signaling as a key cell, and then use the cell value of the key cell and the message type "
  • a characteristic sequence with key cell information added to the signaling process to be tested can be obtained.
  • the trained NNLM anomaly detection model based on message types and key cells (corresponding to signaling anomaly detection model B) is used to predict the obtained feature sequences one by one in the form of sliding windows.
  • step 1405 is similar to step 1403, and the difference lies in the way of constructing the feature sequence.
  • the signaling analysis device can determine whether the signaling process to be tested is abnormal according to the abnormality detection result obtained in step 1405. If the signaling process to be tested is abnormal, step 1407 is performed; if the signaling process to be tested is normal, step 1408 is performed.
  • the signaling analysis device determines the location of the abnormal signaling in the signaling flow to be tested.
  • the signaling analysis device starts from the first signaling of the signaling process to be tested, and performs detection one by one in the form of a sliding window.
  • If a signaling is not within its predicted range, the signaling can be considered abnormal, and the position of the signaling in the flow to be tested is output to the analysis result output module as the final abnormality location result.
  • One way to omit this step is, during the anomaly detection of step 1403 and step 1405, to directly output the position of the first abnormal signaling of the abnormal signaling flow to be tested to the analysis result output module as the final abnormality location result.
  • the signaling analysis device outputs the abnormality detection result and the location of the abnormal signaling.
  • the analysis result output module outputs in the format "abnormal, [location of abnormal signaling]". For example, "abnormal, the 7th signaling is abnormal”. For each signaling process whose signaling analysis result is normal, the analysis result output module outputs "normal".
  • For each signaling protocol on the control plane of the communication network, this data-driven embodiment extracts the signaling information supported by every protocol and uses a common feature construction method, which effectively eliminates format differences between signaling protocols and avoids the high cost and poor self-updating ability that come with summarizing expert rules.
  • In addition, this embodiment uses a sequence model to handle the long-range dependencies between the signaling messages in a signaling flow, and performs feature encoding and anomaly detection on both the message type information and the key cell information in the flow. It can therefore cover most abnormal signaling problems beyond those involving message types alone, effectively avoiding incomplete analysis of key cell abnormalities.
  • this embodiment is easier to start.
  • FIG. 15 is a schematic diagram of the hardware structure of a training device provided by an embodiment of the present application.
  • the training device 1500 of the convolutional neural network shown in FIG. 15 includes a memory 1501, a processor 1502, a communication interface 1503, and a bus 1504. Among them, the memory 1501, the processor 1502, and the communication interface 1503 communicate with each other through the bus 1504.
  • the memory 1501 may be a read only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device, or a random access memory (Random Access Memory, RAM).
  • the memory 1501 may store programs and training data. When the programs stored in the memory 1501 are executed by the processor 1502, the processor 1502 is used to execute the training method of the embodiment of the present application.
  • The processor 1502 may be a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a GPU, or one or more integrated circuits, and is used to execute related programs so as to realize the functions required by the units in the training device of the embodiment of the present application, or to execute the training method of the method embodiment of the present application.
  • the communication interface 1503 uses a transceiver device such as but not limited to a transceiver to implement communication between the device 1500 and other devices or a communication network.
  • training data (such as the first training set mentioned in the embodiment of the present application) can be obtained through the communication interface 1503.
  • the bus 1504 may include a path for transferring information between various components of the device 1500 (for example, the memory 1501, the processor 1502, and the communication interface 1503).
  • FIG. 16 is a schematic diagram of the hardware structure of a signaling analysis device provided by an embodiment of the present application.
  • the signaling analysis apparatus 1600 shown in FIG. 16 includes a memory 1601, a processor 1602, a communication interface 1603, and a bus 1604.
  • the memory 1601, the processor 1602, and the communication interface 1603 implement communication connections between each other through the bus 1604.
  • the memory 1601 may be a read-only memory, a static storage device, a dynamic storage device, or a random access memory.
  • the memory 1601 may store a program.
  • the processor 1602 is configured to execute each step of the signaling analysis method in the embodiment of the present application.
  • The processor 1602 may be a general-purpose central processing unit, a microprocessor, an application-specific integrated circuit, a graphics processing unit (GPU), or one or more integrated circuits, and executes the relevant programs to implement the functions required of the units in the signaling analysis apparatus of the embodiments of the present application.
  • The processor can implement the functions of each module in FIG. 1B.
  • The processor 1602 may also be an integrated circuit chip with signal processing capability. In implementation, each step of the signaling analysis method of the present application can be completed by an integrated logic circuit of hardware in the processor 1602 or by instructions in the form of software.
  • The processor 1602 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • The general-purpose processor may be a microprocessor or any conventional processor.
  • The steps of the method disclosed in the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor.
  • The software module may be located in a storage medium that is well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register.
  • The storage medium is located in the memory 1601. The processor 1602 reads the information in the memory 1601 and, in combination with its hardware, completes the functions required of the units included in the signaling analysis apparatus of the embodiments of the present application, or executes the signaling analysis method of the method embodiments of the present application.
  • the communication interface 1603 uses a transceiving device such as but not limited to a transceiver to implement communication between the device 1600 and other devices or a communication network.
  • the signaling data can be acquired through the communication interface 1603.
  • the bus 1604 may include a path for transferring information between various components of the device 1600 (for example, the memory 1601, the processor 1602, and the communication interface 1603).
  • Although the training device 1500 and the signaling analysis device 1600 shown in FIG. 15 and FIG. 16 show only a memory, a processor, and a communication interface, those skilled in the art should understand that, in a specific implementation, the training device 1500 and the signaling analysis device 1600 also include other components necessary for normal operation. Depending on specific needs, they may also include hardware components that implement additional functions. Moreover, they may include only the components necessary to implement the embodiments of the present application, and not necessarily all the components shown in FIG. 15 or FIG. 16.
  • the embodiments of the present application also provide a computer-readable storage medium.
  • the above-mentioned computer-readable storage medium stores instructions, which when run on a computer, cause the computer to execute the method provided in the foregoing embodiment.
  • An embodiment of the present application also provides a computer-readable storage medium, and the above-mentioned computer-readable storage medium stores instructions, which when run on a computer, cause the computer to execute the training method provided in the foregoing embodiment.
  • the embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the methods provided in the foregoing embodiments.


Abstract

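Embodiments of this application disclose a signaling analysis method and a related apparatus, relating to the field of artificial intelligence and in particular to signaling anomaly detection. The method includes: obtaining a signaling flow under test, where the flow includes N signaling messages and N is an integer greater than 1; performing first feature construction on the message type and information elements included in each of the N signaling messages to obtain a first feature sequence, where the first feature sequence includes N first feature vectors in one-to-one correspondence with the N signaling messages; and inputting the first feature sequence into a first signaling anomaly detection model for anomaly detection processing, and outputting a first anomaly detection result that indicates whether the signaling flow under test is normal or abnormal. The method can cover anomalies caused by information element errors and can be reused across different protocols.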

Description

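Signaling analysis method and related apparatus
This application claims priority to Chinese Patent Application No. 201911168167.8, filed with the China National Intellectual Property Administration on November 25, 2019 and entitled "Signaling analysis method and related apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a signaling analysis method and a related apparatus.
Background
To let the various devices in a communication network work in coordination, signaling serves as the control instructions passed between devices: it can report a device's own operating status and can also request connections from related devices. For example, when a user makes a call, an instruction must be encapsulated according to a specific protocol and sent to the other party so that the other party can recognize and process it. The instruction carries information such as the calling party, the called party, and the audio format, so that the other party can fulfil the request; the called party must respond to the instruction and indicate how it was completed, so that both parties know each other's operating status. Because signaling is the finest-grained record of a service flow, many operations and maintenance faults can only be identified and delimited by analyzing signaling data.
Signaling data analysis in practice has three characteristics. First, the scale is large: an engineer often has to analyze thousands or even tens of thousands of signaling messages at a time. Second, the content is complex: signaling formats differ between protocols, and each message contains, besides its message type, at least dozens of information elements with different service meanings. Third, the differing service logic behind each protocol makes analysis harder still, and usually requires rich service knowledge and experience.
Current signaling analysis methods and apparatuses each serve only one specific signaling protocol and are hard to reuse across protocols. Because fault analysis in a communication network often involves several protocols in several domains, the usual solution is to build a dedicated signaling analysis apparatus for each protocol, which wastes resources and complicates maintenance. A signaling analysis method that can be reused across protocols is therefore needed.
Summary
Embodiments of this application disclose a signaling analysis method and a related apparatus that can cover anomalies caused by information element errors and can be reused across different protocols.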
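According to a first aspect, an embodiment of this application provides a signaling analysis method, including: obtaining a signaling flow under test, where the flow includes N signaling messages and N is an integer greater than 1; performing first feature construction on the message type and information elements included in each of the N signaling messages to obtain a first feature sequence, where the first feature sequence includes N first feature vectors in one-to-one correspondence with the N signaling messages; and inputting the first feature sequence into a first signaling anomaly detection model for anomaly detection processing, and outputting a first anomaly detection result that indicates whether the signaling flow under test is normal or abnormal.
The method may be executed by any device with data processing capability, such as a server, a computer, or a cloud-based network element. Optionally, all signaling messages in the flow under test correspond to the same protocol and the same interface. The interface corresponding to a signaling message is the interface that generated it. An interface is the boundary between two systems in a communication network; it is defined by a specific protocol or specification and ensures compatibility of formats, functions, signals, and interconnection at the boundary. The messages in the flow under test may use the Session Initiation Protocol (SIP), the S1 Application Protocol (S1AP), or another signaling protocol; this application does not limit this. In other words, the method applies to different protocols and can be reused across them. Because each first feature vector is constructed from the message type and information elements of its signaling message, feeding the first feature sequence into the first signaling anomaly detection model makes it possible to determine whether a message type or an information element is abnormal. The method can therefore cover anomalies caused by information element errors and can be reused across different protocols.
In an optional implementation, before the first feature construction is performed, the method further includes: performing second feature construction on the message type included in each of the N signaling messages to obtain a second feature sequence, where the second feature sequence includes N second feature vectors in one-to-one correspondence with the N signaling messages; and inputting the second feature sequence into a second signaling anomaly detection model for anomaly detection processing to obtain a second anomaly detection result that indicates whether the signaling flow under test is normal or abnormal. Performing the first feature construction then includes: when the second anomaly detection result indicates that the flow is normal, performing the first feature construction on each of the N signaling messages to obtain the first feature sequence.
Detection with the second feature sequence and the second model can be understood as coarse-level anomaly detection based on message types; detection with the first feature sequence and the first model can be understood as fine-level anomaly detection based on message types and information elements. The first round (second model) takes less time than the second round (first model) but is less accurate. Using only the first model gives higher accuracy but lower efficiency; using only the second model gives higher efficiency but lower accuracy. A single model therefore cannot deliver both efficiency and accuracy. Because the second model is fast and accurately detects most anomalies, it can be applied first; the first model is applied only if the second model finds no anomaly in the flow, and is skipped if the second model does find one. Running the two models in this order balances detection efficiency and accuracy.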
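The coarse-then-fine ordering described above can be sketched as a short cascade. This is only an illustrative sketch: the function name `coarse_to_fine`, the two detector callables, and the representation of a flow as `(message_type, information_element)` pairs are assumptions made for the example, not part of the application.

```python
def coarse_to_fine(flow, coarse_is_abnormal, fine_is_abnormal):
    """Run the cheap message-type check first; run the fine check only if it passes.

    flow: list of (message_type, information_element) pairs in timestamp order.
    coarse_is_abnormal: hypothetical callable over the list of message types.
    fine_is_abnormal: hypothetical callable over the full (type, element) pairs.
    """
    message_types = [msg_type for msg_type, _ in flow]
    if coarse_is_abnormal(message_types):
        return "abnormal"  # the fine-level round is skipped entirely
    if fine_is_abnormal(flow):
        return "abnormal"
    return "normal"
```

Because the fine-level detector is never invoked once the coarse round reports an anomaly, the more expensive model only runs on flows that look normal at the message-type level.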
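In an optional implementation, performing the first feature construction on the message type and information elements included in each of the N signaling messages to obtain the first feature sequence includes: performing the first feature construction on the N signaling messages one by one in timestamp order to obtain the first feature sequence.
In an optional implementation, performing the first feature construction in timestamp order includes: performing feature construction on the combination of a first message type and a target information element as a whole to obtain a first vector, where the first message type is the message type included in a first signaling message, the target information element is an information element included in the first signaling message, the first signaling message is any one of the N signaling messages, and the first vector is included in the first feature sequence.
The signaling analysis apparatus may treat the combination of message type and information element in each signaling message as one word and perform feature construction on it to obtain the first feature vector for that message. That is, each combination of message type and information element is regarded as a word and encoded with a word-level feature construction method from natural language processing (NLP). Such methods include but are not limited to one-hot encoding and bag-of-words (BoW) encoding. Treating the combination as a single word allows any signaling message to be converted into a feature vector, so the approach applies to signaling under different protocols.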
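As a rough illustration of treating each combination as one word, the sketch below one-hot encodes a flow. The `|` separator, the helper names, and the sample message types are hypothetical choices for the example; the application only states that the combination is treated as a single word.

```python
def build_vocab(words):
    """Assign each distinct word an index in first-seen order."""
    vocab = {}
    for word in words:
        vocab.setdefault(word, len(vocab))
    return vocab

def one_hot(word, vocab):
    """Return a one-hot list with a 1 at the word's vocabulary index."""
    vector = [0] * len(vocab)
    vector[vocab[word]] = 1
    return vector

def encode_flow(flow, vocab):
    """flow: list of (message_type, information_element_value) pairs in timestamp order.

    Each pair is joined into one word (the separator is an assumption) and one-hot encoded.
    """
    return [one_hot(f"{msg_type}|{ie_value}", vocab) for msg_type, ie_value in flow]
```

A vocabulary built this way from training flows would need an extra slot for unseen words in practice; that detail is left out of the sketch.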
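In an optional implementation, the target information element includes an information element indicating the reason the first signaling message was sent.
Optionally, the target information element includes only the information element indicating the reason the first signaling message was sent. Such an information element may be called a cause-type information element, for example the cause information element in the GTPv2-C protocol, which indicates why a service succeeded or failed, or the eMM-cause information element in the Diameter protocol, which indicates the eMM service status. Diameter is a new-generation AAA protocol developed by the Internet Engineering Task Force (IETF). Performing feature construction on the combination of the message type and the target information element of the first signaling message as one word improves the accuracy of anomaly detection.
In an optional implementation, performing the first feature construction in timestamp order includes: performing feature construction on M information elements in a second signaling message as a text of one or more words in an NLP algorithm to obtain a second vector, where the second signaling message is any one of the N signaling messages, the second vector is included in the first feature sequence, and M is an integer greater than 1.
The M information elements are all or some of the information elements in the message content of the second signaling message. Constructing the second vector from the M information elements as a text of one or more words means that anomaly detection on a first feature sequence containing the second vector can detect abnormal information elements.
In an optional implementation, performing the second feature construction on the message type included in each of the N signaling messages to obtain the second feature sequence includes: performing the second feature construction on the N signaling messages one by one in timestamp order to obtain the second feature sequence.
In an optional implementation, performing the second feature construction in timestamp order includes: performing feature construction on a second message type as one word in an NLP algorithm to obtain a third vector, where the second message type is the message type included in a third signaling message, the third signaling message is any one of the N signaling messages, and the third vector is included in the second feature sequence.
In this implementation, treating the message type of a signaling message as one word allows any signaling message to be converted into a feature vector, so that anomaly detection on a second feature sequence containing the third vector can detect abnormal message types.
In an optional implementation, the F-th first feature vector in the first feature sequence corresponds to the F-th of the N signaling messages, and inputting the first feature sequence into the first signaling anomaly detection model and outputting the first anomaly detection result includes: in the F-th round of anomaly detection processing, inputting a third feature sequence into the first signaling anomaly detection model to obtain a first set, where the first set includes at least one combination of message type and information element, the feature vectors in the third feature sequence are, in order, the (F-K)-th to the (F-1)-th first feature vectors of the first feature sequence, F is an integer greater than 1, and K is an integer greater than 1 and less than F; and when the combination of message type and information element of the F-th signaling message is not in the first set, outputting the first anomaly detection result, which indicates that the signaling flow under test is abnormal.
It should be understood that every combination in the first set is a predicted combination of message type and information element for the F-th signaling message. Optionally, starting from the first feature vector of the first feature sequence, the signaling analysis apparatus checks the messages one by one with a sliding window of length w and stride 1, where w is an integer greater than 1. The apparatus may pad w placeholders before the first feature vector of the sequence so that this first vector can itself be predicted. During detection, the (t-w)-th to (t-1)-th first feature vectors are input into the first signaling anomaly detection model, which yields at least one possible combination of message type and information element for the t-th signaling message (the first set). If the actual combination of the t-th message in the flow under test is not among the predicted combinations, the flow is determined to be abnormal. In other words, the apparatus uses the w messages preceding the t-th message to check whether the t-th message is abnormal. The first signaling anomaly detection model may be an N-gram model, a neural network language model (NNLM), or the like, and may be obtained by unsupervised learning on normal signaling flows, that is, flows in which no anomaly occurred.
In this implementation, the signaling analysis apparatus can detect abnormal signaling accurately and quickly.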
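The sliding-window check described above can be sketched as follows. For illustration, the trained model is stood in for by a plain dictionary `allowed_next` that maps a context of w preceding words to the set of words the model considers possible next; this dictionary, the `<bos>` placeholder string, and the function name are assumptions of the sketch rather than details given in the application.

```python
PAD = "<bos>"  # placeholder padded in front of the flow (assumed spelling)

def detect_flow(words, allowed_next, w):
    """Return the 1-based index of the first abnormal message, or None if none is found.

    words: one word per signaling message, in timestamp order.
    allowed_next: dict mapping a tuple of w context words to a set of predicted next words.
    """
    padded = [PAD] * w + list(words)
    for t, word in enumerate(words):
        context = tuple(padded[t:t + w])  # the w words preceding message t
        if word not in allowed_next.get(context, set()):
            return t + 1
    return None
```

With `w = 2`, a flow whose second message is not among the predictions for its context is flagged at index 2, and detection stops there.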
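In an optional implementation, the method further includes: when the combination of message type and information element of the F-th signaling message is in the first set and F is less than N, performing the (F+1)-th round of anomaly detection processing; and when the combination is in the first set and F equals N, outputting the first anomaly detection result, which indicates that the signaling flow under test is normal.
In this implementation, the anomaly detection result is output promptly and the detection procedure stops.
In an optional implementation, the F-th second feature vector in the second feature sequence corresponds to the F-th of the N signaling messages, and inputting the second feature sequence into the second signaling anomaly detection model to obtain the second anomaly detection result includes: in the F-th round of anomaly detection processing, inputting a fourth feature sequence into the second signaling anomaly detection model to obtain a second set, where the second set includes at least one message type, the feature vectors in the fourth feature sequence are, in order, the (F-K)-th to the (F-1)-th second feature vectors of the second feature sequence, F is an integer greater than 1, and K is an integer greater than 1 and less than F; and when the message type of the F-th signaling message is not in the second set, obtaining the second anomaly detection result, which indicates that the signaling flow under test is abnormal.
In this implementation, the apparatus checks one signaling message per round, so abnormal signaling is detected quickly and no message is missed.
In an optional implementation, the method further includes: when the message type of the F-th signaling message is in the second set and F is less than N, performing the (F+1)-th round of anomaly detection processing; and when the message type of the F-th signaling message is in the second set and F equals N, obtaining the second anomaly detection result, which indicates that the signaling flow under test is normal.
In this implementation, the anomaly detection result is output promptly and the detection procedure stops.
In an optional implementation, after the first anomaly detection result is output, the method further includes: when the first anomaly detection result indicates that the signaling flow under test is abnormal, determining the position in the flow where the anomaly occurred.
In this implementation, when the flow under test is found to be abnormal, the position of the anomaly in the flow can be further determined.
In an optional implementation, the H-th first feature vector in the first feature sequence corresponds to the H-th of the N signaling messages, and determining the position of the anomaly includes: in the H-th round of signaling anomaly localization, inputting a fifth feature sequence into the first signaling anomaly detection model to obtain a third set, where the third set includes at least one combination of message type and information element, the feature vectors in the fifth feature sequence are, in order, the (H-L)-th to the (H-1)-th first feature vectors of the first feature sequence, H is an integer greater than 1, and L is an integer greater than 1 and less than H; and when the combination of message type and information element of the H-th signaling message is not in the third set, determining that the H-th signaling message is abnormal.
In this implementation, the abnormal signaling message can be accurately identified within the flow under test.
In an optional implementation, the H-th first feature vector in the first feature sequence corresponds to the H-th of the N signaling messages, and determining the position of the anomaly includes: obtaining an anomaly probability sequence for the N signaling messages, where the G-th probability in the sequence represents the probability that the first (D+G) of the N signaling messages include an abnormal message, and G and D are both integers greater than 0; and determining, according to the anomaly probability sequence, the position in the flow under test where the anomaly occurred.
Because the G-th probability represents the probability that the first (D+G) messages include an abnormal message, the anomaly probability sequence reflects how the degree of abnormality of the flow changes. The interval containing the abnormal signaling can therefore be located by analyzing that change.
In this implementation, determining the position of the anomaly from the anomaly probability sequence of the N signaling messages allows the position to be determined accurately.
In an optional implementation, obtaining the anomaly probability sequence for the N signaling messages includes: inputting a sixth feature sequence into the first signaling anomaly detection model for anomaly detection processing to obtain the G-th probability in the anomaly probability sequence, where the feature vectors in the sixth feature sequence are, in order, the first (D+G) first feature vectors of the first feature sequence.
In this implementation, the anomaly probability sequence for the N signaling messages can be obtained quickly and accurately.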
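The way the anomaly probability sequence is assembled from growing prefixes can be sketched as below. The scoring callable `score_prefix` stands in for the first signaling anomaly detection model and is an assumption of the example, as is the function name.

```python
def anomaly_probability_sequence(features, score_prefix, d):
    """Return [p_1, ..., p_(N-D)], where p_G is the model's score for the first D+G vectors.

    features: the first feature sequence (one entry per signaling message).
    score_prefix: hypothetical callable returning the probability that a prefix is abnormal.
    d: the offset D, so the shortest scored prefix has D+1 entries.
    """
    n = len(features)
    return [score_prefix(features[:d + g]) for g in range(1, n - d + 1)]
```

Each entry is computed from a prefix one message longer than the previous one, which is what lets the resulting sequence show where the abnormality first appears.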
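In an optional implementation, determining the position of the anomaly according to the anomaly probability sequence includes: when the difference between the G-th probability and the (G-1)-th probability in the sequence is greater than a probability threshold, determining that the signaling in a first signaling interval is abnormal, where the first signaling interval includes the (G+D-1)-th to the N-th of the N signaling messages and G is an integer greater than 1; when every probability in the sequence is not less than the probability preceding it, determining that the signaling in a second signaling interval is abnormal, where the second signaling interval includes the (P+D)-th to the N-th signaling messages, the difference between the P-th and the (P+1)-th probabilities is not less than the difference between any two adjacent probabilities in the sequence, and P is an integer greater than 0; when the probabilities in the sequence decrease from a third value to a first value before increasing from the first value to a second value and remaining at the second value, determining that the signaling in a third signaling interval is abnormal, where the first value is less than a first threshold, the second value and the third value are both greater than a second threshold, the first threshold is less than the second threshold, the third signaling interval includes the (Q+D)-th to the N-th signaling messages, the Q-th probability is the probability at the start of the last rising segment of the anomaly probability curve, and Q is an integer greater than 0; and when every probability in the sequence is not less than the probability threshold, determining that the signaling in a fourth signaling interval is abnormal, where the fourth signaling interval includes the D-th to the N-th of the N signaling messages. The first threshold may be 0.2, 0.25, 0.3, or the like, and the second threshold may be 0.6, 0.75, 0.8, or the like.
In this implementation, the position in the flow under test where the anomaly occurred can be determined accurately.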
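Of the four cases above, the first (a jump between adjacent probabilities that exceeds the probability threshold) is the simplest to illustrate. The sketch below covers only that case; the function name and the choice to return the first qualifying jump are assumptions, since the application does not say how multiple jumps or overlapping cases are prioritized.

```python
def locate_by_jump(probs, d, n, jump_threshold):
    """Return the 1-based interval (G+D-1, N) for the first jump above the threshold.

    probs: anomaly probability sequence p_1..p_(N-D) as a Python list.
    d, n: the offset D and the number of signaling messages N.
    Returns None if no adjacent pair differs by more than jump_threshold.
    """
    for i in range(1, len(probs)):
        if probs[i] - probs[i - 1] > jump_threshold:
            g = i + 1  # 1-based position of the later probability in the pair
            return (g + d - 1, n)
    return None
```

For example, with `probs = [0.1, 0.12, 0.7, 0.9]`, `d = 3`, and `n = 7`, the jump occurs at G = 3, so the reported interval runs from message 5 to message 7.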
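In an optional implementation, before the signaling flow under test is obtained, the method further includes: collecting signaling data that includes the N signaling messages; parsing each signaling message in the signaling data to obtain its interface, timestamp, protocol, and flow identifier; placing signaling messages with the same interface, protocol, and flow identifier in the same group to obtain at least one group of signaling; and sorting the signaling messages in a target group by timestamp to obtain the signaling flow under test, where the target group is any one of the at least one group.
In this implementation, the signaling messages belonging to the same signaling flow can be selected accurately and quickly, which yields the signaling flow under test.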
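The grouping and ordering step can be sketched in a few lines. The record layout (a dict with `interface`, `protocol`, `flow_id`, and `timestamp` keys) is an assumed representation of the parsed fields, chosen only for this example.

```python
from collections import defaultdict

def extract_flows(records):
    """Group parsed signaling records into flows and order each flow by timestamp.

    records: iterable of dicts with at least the (assumed) keys
             "interface", "protocol", "flow_id", and "timestamp".
    Returns a dict mapping (interface, protocol, flow_id) to a sorted list of records.
    """
    groups = defaultdict(list)
    for record in records:
        key = (record["interface"], record["protocol"], record["flow_id"])
        groups[key].append(record)
    return {key: sorted(group, key=lambda r: r["timestamp"])
            for key, group in groups.items()}
```

Each value in the returned dictionary is one candidate signaling flow under test.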
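According to a second aspect, an embodiment of this application provides a training method, including: performing feature construction on the message type and information elements included in each signaling message of a training signaling flow to obtain a first vector sequence, where the feature vectors in the first vector sequence are in one-to-one correspondence with the signaling messages in the training signaling flow; and inputting the R-th to the (R+W)-th feature vectors of the first vector sequence, as a first training feature sequence, into a first training model for unsupervised learning to obtain a first signaling anomaly detection model, where the first training model is a W-gram language model, W is an integer greater than 1, and R and S are both integers greater than 0. The first signaling anomaly detection model may be an NNLM or another sequence model; this application does not limit this.
In this embodiment, training yields a first signaling anomaly detection model that predicts the message type and information element of a signaling message from the feature vectors of the W messages preceding it, and training is efficient.
According to a third aspect, an embodiment of this application provides a training method, including: performing feature construction on the message type included in each signaling message of a training signaling flow to obtain a second vector sequence, where the feature vectors in the second vector sequence are in one-to-one correspondence with the signaling messages in the training signaling flow; and inputting the R-th to the (R+W)-th feature vectors of the second vector sequence, as a second training feature sequence, into a second training model for unsupervised learning to obtain a second signaling anomaly detection model, where the second training model is a W-gram language model, W is an integer greater than 1, and R and S are both integers greater than 0. The second signaling anomaly detection model may be an NNLM or another sequence model; this application does not limit this.
In this embodiment, training yields a second signaling anomaly detection model that predicts the message type of a signaling message from the feature vectors of the W messages preceding it, and training is efficient.
According to a fourth aspect, an embodiment of this application provides a training method, including: performing feature construction on the message type and information elements included in each signaling message of a training signaling flow to obtain a first training sample, where the feature vectors in the first training sample are in one-to-one correspondence with the signaling messages in the training signaling flow; inputting the first training sample and first annotation information into a third training model for anomaly detection processing to obtain a first anomaly detection processing result, which indicates whether the first training sample is a normal or an abnormal signaling flow; determining the loss for the first training sample according to the first anomaly detection processing result and a first standard result, where the first standard result is the true result of the first training sample as indicated by the first annotation information; and updating the parameters of the third training model with an optimization algorithm, using the loss for the first training sample, to obtain a first signaling anomaly detection model. The third training model may be a recurrent neural network (RNN), a long short-term memory (LSTM) model, or another model; this application does not limit this. The model obtained by training the third training model is the first signaling anomaly detection model described above. The training signaling flow may be any signaling flow. In practice, the training apparatus may train the first signaling anomaly detection model on annotated normal and abnormal signaling flows.
In this embodiment, the third training model is trained on annotated normal and abnormal signaling flows so that the resulting first signaling anomaly detection model can accurately detect whether a signaling flow is abnormal.
According to a fifth aspect, an embodiment of this application provides a training method, including: performing feature construction on the message type included in each signaling message of a training signaling flow to obtain a second training sample, where the feature vectors in the second training sample are in one-to-one correspondence with the signaling messages in the training signaling flow; inputting the second training sample and second annotation information into a fourth training model for anomaly detection processing to obtain a second anomaly detection processing result, which indicates whether the second training sample is a normal or an abnormal signaling flow; determining the loss for the second training sample according to the second anomaly detection processing result and a second standard result, where the second standard result is the true result of the second training sample as indicated by the second annotation information; and updating the parameters of the fourth training model with an optimization algorithm, using the loss for the second training sample, to obtain a second signaling anomaly detection model.
The fourth training model may be an RNN, an LSTM model, or another model; this application does not limit this. The model obtained by training the fourth training model is the second signaling anomaly detection model described above. The training signaling flow may be any signaling flow. In practice, the training apparatus may train the second signaling anomaly detection model on annotated normal and abnormal signaling flows.
In this embodiment, the fourth training model is trained on annotated normal and abnormal signaling flows so that the resulting second signaling anomaly detection model can accurately detect whether a signaling flow is abnormal.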
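According to a sixth aspect, an embodiment of this application provides a signaling analysis apparatus including a processor and a memory, where the memory stores program instructions and the processor invokes the program instructions to perform the following operations: obtaining a signaling flow under test, where the flow includes N signaling messages and N is an integer greater than 1; performing first feature construction on the message type and information elements included in each of the N signaling messages to obtain a first feature sequence, where the first feature sequence includes N first feature vectors in one-to-one correspondence with the N signaling messages; and inputting the first feature sequence into a first signaling anomaly detection model for anomaly detection processing, and outputting a first anomaly detection result that indicates whether the signaling flow under test is normal or abnormal.
In an optional implementation, the processor is further configured to: perform second feature construction on the message type included in each of the N signaling messages to obtain a second feature sequence, where the second feature sequence includes N second feature vectors in one-to-one correspondence with the N signaling messages; and input the second feature sequence into a second signaling anomaly detection model for anomaly detection processing to obtain a second anomaly detection result that indicates whether the signaling flow under test is normal or abnormal. The processor is specifically configured to perform, when the second anomaly detection result indicates that the flow is normal, the first feature construction on each of the N signaling messages to obtain the first feature sequence.
In an optional implementation, the processor is specifically configured to perform the first feature construction on the N signaling messages one by one in timestamp order to obtain the first feature sequence.
In an optional implementation, the processor is specifically configured to perform feature construction on the combination of a first message type and a target information element as a whole to obtain a first vector, where the first message type is the message type included in a first signaling message, the target information element is an information element included in the first signaling message, the first signaling message is any one of the N signaling messages, and the first vector is included in the first feature sequence.
In an optional implementation, the target information element includes an information element indicating the reason the first signaling message was sent.
In an optional implementation, the processor is specifically configured to perform feature construction on M information elements in a second signaling message as a text of one or more words in an NLP algorithm to obtain a second vector, where the second signaling message is any one of the N signaling messages, the second vector is included in the first feature sequence, and M is an integer greater than 1.
In an optional implementation, the processor is specifically configured to perform the second feature construction on the N signaling messages one by one in timestamp order to obtain the second feature sequence.
In an optional implementation, the processor is specifically configured to perform feature construction on a second message type as one word in an NLP algorithm to obtain a third vector, where the second message type is the message type included in a third signaling message, the third signaling message is any one of the N signaling messages, and the third vector is included in the second feature sequence.
In an optional implementation, the processor is specifically configured to: in the F-th round of anomaly detection processing, input a third feature sequence into the first signaling anomaly detection model to obtain a first set, where the first set includes at least one combination of message type and information element, the feature vectors in the third feature sequence are, in order, the (F-K)-th to the (F-1)-th first feature vectors of the first feature sequence, F is an integer greater than 1, and K is an integer greater than 1 and less than F; and when the combination of message type and information element of the F-th signaling message is not in the first set, output the first anomaly detection result, which indicates that the signaling flow under test is abnormal.
In an optional implementation, the processor is further configured to: when the combination of message type and information element of the F-th signaling message is in the first set and F is less than N, perform the (F+1)-th round of anomaly detection processing; and when the combination is in the first set and F equals N, output the first anomaly detection result, which indicates that the signaling flow under test is normal.
In an optional implementation, the F-th second feature vector in the second feature sequence corresponds to the F-th of the N signaling messages, and the processor is specifically configured to: in the F-th round of anomaly detection processing, input a fourth feature sequence into the second signaling anomaly detection model to obtain a second set, where the second set includes at least one message type, the feature vectors in the fourth feature sequence are, in order, the (F-K)-th to the (F-1)-th second feature vectors of the second feature sequence, F is an integer greater than 1, and K is an integer greater than 1 and less than F; and when the message type of the F-th signaling message is not in the second set, obtain the second anomaly detection result, which indicates that the signaling flow under test is abnormal.
In an optional implementation, the processor is further configured to: when the message type of the F-th signaling message is in the second set and F is less than N, perform the (F+1)-th round of anomaly detection processing; and when the message type of the F-th signaling message is in the second set and F equals N, obtain the second anomaly detection result, which indicates that the signaling flow under test is normal.
In an optional implementation, the processor is further configured to determine, when the first anomaly detection result indicates that the signaling flow under test is abnormal, the position in the flow where the anomaly occurred.
In an optional implementation, the H-th first feature vector in the first feature sequence corresponds to the H-th of the N signaling messages, and the processor is specifically configured to: in the H-th round of signaling anomaly localization, input a fifth feature sequence into the first signaling anomaly detection model to obtain a third set, where the third set includes at least one combination of message type and information element, the feature vectors in the fifth feature sequence are, in order, the (H-L)-th to the (H-1)-th first feature vectors of the first feature sequence, H is an integer greater than 1, and L is an integer greater than 1 and less than H; and when the combination of message type and information element of the H-th signaling message is not in the third set, determine that the H-th signaling message is abnormal.
In an optional implementation, the H-th first feature vector in the first feature sequence corresponds to the H-th of the N signaling messages, and the processor is specifically configured to: obtain an anomaly probability sequence for the N signaling messages, where the G-th probability in the sequence represents the probability that the first (D+G) of the N signaling messages include an abnormal message, and G and D are both integers greater than 0; and determine, according to the anomaly probability sequence, the position in the flow under test where the anomaly occurred.
In an optional implementation, the processor is specifically configured to input a sixth feature sequence into the first signaling anomaly detection model for anomaly detection processing to obtain the G-th probability in the anomaly probability sequence, where the feature vectors in the sixth feature sequence are, in order, the first (D+G) first feature vectors of the first feature sequence.
In an optional implementation, the processor is specifically configured to: when the difference between the G-th probability and the (G-1)-th probability in the anomaly probability sequence is greater than a probability threshold, determine that the signaling in a first signaling interval is abnormal, where the first signaling interval includes the (G+D-1)-th to the N-th of the N signaling messages and G is an integer greater than 1; when every probability in the sequence is not less than the probability preceding it, determine that the signaling in a second signaling interval is abnormal, where the second signaling interval includes the (P+D)-th to the N-th signaling messages, the difference between the P-th and the (P+1)-th probabilities is not less than the difference between any two adjacent probabilities in the sequence, and P is an integer greater than 0; when the probabilities in the sequence decrease from a third value to a first value before increasing from the first value to a second value and remaining at the second value, determine that the signaling in a third signaling interval is abnormal, where the first value is less than a first threshold, the second value and the third value are both greater than a second threshold, the first threshold is less than the second threshold, the third signaling interval includes the (Q+D)-th to the N-th signaling messages, the Q-th probability is the probability at the start of the last rising segment of the anomaly probability curve, and Q is an integer greater than 0; and when every probability in the sequence is not less than the probability threshold, determine that the signaling in a fourth signaling interval is abnormal, where the fourth signaling interval includes the D-th to the N-th of the N signaling messages.
In an optional implementation, the processor is further configured to: collect signaling data that includes the N signaling messages; parse each signaling message in the signaling data to obtain its interface, timestamp, protocol, and flow identifier; place signaling messages with the same interface, protocol, and flow identifier in the same group to obtain at least one group of signaling; and sort the signaling messages in a target group by timestamp to obtain the signaling flow under test, where the target group is any one of the at least one group.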
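According to a seventh aspect, an embodiment of this application provides a training apparatus including a processor and a memory, where the memory stores program instructions and the processor invokes the program instructions to perform the following operations: performing feature construction on the message type and information elements included in each signaling message of a training signaling flow to obtain a first vector sequence, where the feature vectors in the first vector sequence are in one-to-one correspondence with the signaling messages in the training signaling flow; and inputting the R-th to the (R+W)-th feature vectors of the first vector sequence, as a first training feature sequence, into a first training model for unsupervised learning to obtain a first signaling anomaly detection model, where the first training model is a W-gram language model, W is an integer greater than 1, and R and S are both integers greater than 0.
According to an eighth aspect, an embodiment of this application provides a training apparatus including a processor and a memory, where the memory stores program instructions and the processor invokes the program instructions to perform the following operations: performing feature construction on the message type included in each signaling message of a training signaling flow to obtain a second vector sequence, where the feature vectors in the second vector sequence are in one-to-one correspondence with the signaling messages in the training signaling flow; and inputting the R-th to the (R+W)-th feature vectors of the second vector sequence, as a second training feature sequence, into a second training model for unsupervised learning to obtain a second signaling anomaly detection model, where the second training model is a W-gram language model, W is an integer greater than 1, and R and S are both integers greater than 0.
According to a ninth aspect, an embodiment of this application provides a training apparatus including a processor and a memory, where the memory stores program instructions and the processor invokes the program instructions to perform the following operations: performing feature construction on the message type and information elements included in each signaling message of a training signaling flow to obtain a first training sample, where the feature vectors in the first training sample are in one-to-one correspondence with the signaling messages in the training signaling flow; inputting the first training sample and first annotation information into a third training model for anomaly detection processing to obtain a first anomaly detection processing result, which indicates whether the first training sample is a normal or an abnormal signaling flow; determining the loss for the first training sample according to the first anomaly detection processing result and a first standard result, where the first standard result is the true result of the first training sample as indicated by the first annotation information; and updating the parameters of the third training model with an optimization algorithm, using the loss for the first training sample, to obtain a first signaling anomaly detection model.
According to a tenth aspect, an embodiment of this application provides a training apparatus including a processor and a memory, where the memory stores program instructions and the processor invokes the program instructions to perform the following operations: performing feature construction on the message type included in each signaling message of a training signaling flow to obtain a second training sample, where the feature vectors in the second training sample are in one-to-one correspondence with the signaling messages in the training signaling flow; inputting the second training sample and second annotation information into a fourth training model for anomaly detection processing to obtain a second anomaly detection processing result, which indicates whether the second training sample is a normal or an abnormal signaling flow; determining the loss for the second training sample according to the second anomaly detection processing result and a second standard result, where the second standard result is the true result of the second training sample as indicated by the second annotation information; and updating the parameters of the fourth training model with an optimization algorithm, using the loss for the second training sample, to obtain a second signaling anomaly detection model.
According to an eleventh aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program that includes program instructions which, when executed by a processor, cause the processor to perform the method of any one of the first to fifth aspects or any optional implementation thereof.
According to a twelfth aspect, an embodiment of this application provides a computer program product including program instructions which, when executed by a processor, cause the processor to perform the method of any one of the first to fifth aspects or any optional implementation thereof.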
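Brief Description of the Drawings
FIG. 1A is a schematic diagram of the network architecture of a signaling analysis system according to an embodiment of this application;
FIG. 1B is a schematic diagram of a signaling analysis apparatus according to an embodiment of this application;
FIG. 2 is a flowchart of a signaling analysis method according to an embodiment of this application;
FIG. 3 is a flowchart of a signaling anomaly localization method according to an embodiment of this application;
FIG. 4 is a flowchart of a signaling data preprocessing method according to an embodiment of this application;
FIG. 5 is a flowchart of a signaling anomaly detection method according to an embodiment of this application;
FIG. 6 is a schematic diagram of feature construction according to an embodiment of this application;
FIG. 7 is a flowchart of another signaling anomaly detection method according to an embodiment of this application;
FIG. 8 is a schematic diagram of another form of feature construction according to an embodiment of this application;
FIG. 9 is a schematic diagram of an anomaly localization process according to an embodiment of this application;
FIG. 10 is a flowchart of a signaling anomaly localization method according to an embodiment of this application;
FIG. 11 is a schematic diagram of constructing an anomaly evaluation curve according to an embodiment of this application;
FIG. 12 is a flowchart of a signaling anomaly detection and localization method according to an embodiment of this application;
FIG. 13 is an anomaly evaluation curve of a signaling flow under test according to an embodiment of this application;
FIG. 14 is a flowchart of another signaling anomaly detection and localization method according to an embodiment of this application;
FIG. 15 is a schematic structural diagram of a training apparatus according to an embodiment of this application;
FIG. 16 is a schematic structural diagram of a signaling analysis apparatus according to an embodiment of this application.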
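Detailed Description
The terms "first", "second", "third", and so on in the description, claims, and drawings of this application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. The terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion, for example of a series of steps or units. A method, system, product, or device is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device. "And/or" indicates that one or both of the two connected objects may be selected; for example, "A and/or B" means A, B, or A+B. Some terms used in this application are introduced first.
Flow: the control flow passed between devices in a communication network, made up of signaling messages that control service functions such as calls, bearers, and connections, is called a signaling flow.
Domain: generally a logical partition of basic information technology (IT) resources, used to plan and manage those resources; different domains carry different services and use different communication protocols.
Interface: the boundary between two systems in a communication network, defined by a specific protocol or specification and used to ensure compatibility of formats, functions, signals, and interconnection at the boundary.
Information element: a unit of information carried in a signaling message. Its meaning is defined by the protocol the message follows, and it is encapsulated into the message in the way the protocol defines. Examples include service type indications, bearer setup parameters, and user identifiers.
Coarse-Fine: a common hierarchical analysis approach that works first at a coarse granularity (coarse-level) and then at a fine granularity (fine-level).
Control-plane signaling: generally the control-type signaling data used to set up services for users in a communication network.
Current signaling analysis methods and apparatuses each serve only one specific signaling protocol and are hard to reuse across protocols. Because signaling fault analysis in a communication network often involves several protocols in several domains, the usual solution is to build a dedicated signaling analysis apparatus for each protocol, which wastes resources and complicates maintenance. In addition, these methods and apparatuses struggle to fully cover anomalies caused by information element errors, so their coverage of signaling anomalies is incomplete. Against this background, this application proposes a data-driven, intelligent signaling analysis method and apparatus that applies to all signaling protocols on the control plane of a communication network. It can greatly improve the efficiency of signaling data analysis and reduce its cost, and has substantial value in the operations and maintenance field.
The signaling analysis method provided in the embodiments of this application can be applied in scenarios such as signaling anomaly detection and signaling anomaly localization. Its application in each of these two scenarios is briefly described below.
Signaling anomaly detection scenario: the signaling analysis apparatus performs anomaly detection in real time on signaling data collected from the communication network and, after detecting that any signaling flow is abnormal, outputs the corresponding anomaly detection result. In other words, the apparatus analyzes in real time whether any signaling flow is abnormal and, when it detects one, sends the anomaly detection result to operations and maintenance personnel so that they learn of the abnormal flow promptly.
Signaling anomaly localization scenario: the signaling analysis apparatus performs anomaly detection in real time on signaling data collected from the communication network and, after detecting that any signaling flow is abnormal, further determines the signaling interval within the flow where the anomaly occurred. It then sends operations and maintenance personnel information indicating that the signaling in that interval is abnormal, so that they can resolve the signaling anomaly in a targeted way.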
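FIG. 1A is a schematic diagram of the network architecture of a signaling analysis system according to an embodiment of this application. As shown in FIG. 1A, the system includes a signaling collection apparatus and a signaling analysis apparatus. The signaling collection apparatus includes a signaling obtaining module 101 and a signaling preprocessing module 102; the signaling analysis apparatus includes a signaling anomaly detection module 103, a signaling anomaly localization module 104, and an analysis result output module 105. The signaling anomaly localization module 104 is optional; that is, the signaling analysis apparatus may omit it. The signaling collection apparatus collects signaling data from the communication network, extracts the signaling flow under test from the collected data, and sends it to the signaling analysis apparatus. The signaling analysis apparatus performs signaling anomaly detection and signaling anomaly localization (that is, determining the signaling interval in which the anomaly occurred). The function of each module is detailed later.
FIG. 1B is a schematic diagram of a signaling analysis apparatus according to an embodiment of this application. As shown in FIG. 1B, the apparatus includes a signaling obtaining module 101, a signaling preprocessing module 102, a signaling anomaly detection module 103, a signaling anomaly localization module 104, and an analysis result output module 105. The signaling anomaly localization module 104 is optional. Comparing FIG. 1A and FIG. 1B shows that the signaling analysis apparatus can perform signaling anomaly detection on its own or in cooperation with other apparatuses. It should be understood that the signaling analysis apparatuses in FIG. 1A and FIG. 1B are two optional implementations. The modules in FIG. 1A and FIG. 1B are introduced below.
Signaling obtaining module 101: collects the signaling data to be analyzed from the communication network.
Signaling preprocessing module 102: for the obtained signaling data, first parses out the information relevant to signaling analysis and then extracts signaling flows under test, one signaling flow at a time.
Signaling anomaly detection module 103: analyzes the message type and message content of the signaling to implement efficient and accurate coarse-fine signaling anomaly detection. Depending on whether the training data is annotated, it provides a supervised and an unsupervised detection approach.
Signaling anomaly localization module 104: for abnormal signaling flows detected by the signaling anomaly detection module 103, implements an interpretable way of locating the abnormal signaling. Depending on whether the training data is annotated, it provides a supervised and an unsupervised localization approach.
Analysis result output module 105: collates and outputs the results of the signaling anomaly detection module 103 and the signaling anomaly localization module 104.
The implementation of each module is detailed later.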
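FIG. 2 is a flowchart of a signaling analysis method according to an embodiment of this application. As shown in FIG. 2, the method may include the following steps.
201. The signaling analysis apparatus obtains a signaling flow under test.
The signaling flow under test includes N signaling messages, each of which includes a message type and information elements, and N is an integer greater than 1. The signaling analysis apparatus may be any device with data processing capability, such as a server, a computer, or a cloud-based network element. Optionally, all signaling messages in the flow under test correspond to the same protocol and the same interface. Optionally, step 201 is implemented by the signaling obtaining module 101 and the signaling preprocessing module 102 of the signaling analysis apparatus.
In some embodiments, before performing step 201 the signaling analysis apparatus may: collect signaling data that includes the N signaling messages; parse each signaling message in the signaling data to obtain its interface, timestamp, protocol, and flow identifier; place signaling messages with the same interface, protocol, and flow identifier in the same group to obtain at least one group of signaling; and sort the signaling messages in a target group by timestamp to obtain the signaling flow under test, where the target group is any one of the at least one group.
202. Perform first feature construction on the message type and information elements included in each of the N signaling messages to obtain a first feature sequence.
The first feature sequence includes N first feature vectors in one-to-one correspondence with the N signaling messages. In some embodiments, step 202 may be replaced by: performing first feature construction on the information elements included in each of the N signaling messages to obtain the first feature sequence. How a feature vector is constructed from the message type and information elements of a signaling message is detailed later.
203. Input the first feature sequence into a first signaling anomaly detection model for anomaly detection processing, and output a first anomaly detection result.
The first anomaly detection result indicates whether the signaling flow under test is normal or abnormal. The first signaling anomaly detection model may be obtained by unsupervised learning or by supervised learning. Optionally, the signaling analysis apparatus uses the signaling anomaly detection module 103 to perform steps 202 and 203. Optionally, when the first anomaly detection result indicates that the flow is abnormal, the result is sent to operations and maintenance personnel so that they learn of the abnormal flow promptly.
The method provided in this embodiment can cover anomalies caused by errors in message types and in information elements, and can be reused across different protocols.
To improve detection efficiency while preserving detection accuracy, the signaling analysis apparatus may use the signaling anomaly detection module 103 to first perform a first round of coarse-level anomaly detection based on message types and, if that round finds no anomaly, a second round of fine-level anomaly detection based on message types and information elements. In an optional implementation, before performing step 202 the apparatus performs second feature construction on each of the N signaling messages to obtain a second feature sequence, and inputs the second feature sequence into a second signaling anomaly detection model to obtain a second anomaly detection result indicating whether the flow under test is normal or abnormal. When the second anomaly detection result indicates that the flow is normal, the apparatus may go on to perform steps 202 and 203. The second feature sequence includes N second feature vectors in one-to-one correspondence with the N signaling messages, and each second feature vector is obtained by performing feature construction on the message type included in its corresponding signaling message.
In this implementation, the coarse-level first round improves detection efficiency while maintaining recall and relatively high accuracy, and the fine-level second round further improves overall recall.
FIG. 2 describes only the procedure by which the signaling analysis apparatus performs signaling anomaly detection. In some embodiments, after detecting that the flow under test is abnormal, the apparatus may also determine the position in the flow where the anomaly occurred, that is, the abnormal signaling interval. Optionally, the apparatus uses the signaling anomaly localization module 104 for this purpose. In some embodiments, the apparatus may include an unsupervised signaling anomaly detection module 201, a supervised signaling anomaly detection module 202, an unsupervised signaling anomaly localization module 203, and a supervised signaling anomaly localization module 204. That is, in some embodiments the signaling anomaly detection module 103 may include the unsupervised signaling anomaly detection module 201 and the supervised signaling anomaly detection module 202, and the signaling anomaly localization module 104 may include the unsupervised signaling anomaly localization module 203 and the supervised signaling anomaly localization module 204.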
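The following describes an embodiment in which the signaling analysis apparatus performs signaling anomaly localization after detecting that the signaling flow under test is abnormal.
FIG. 3 is a flowchart of a signaling anomaly localization method according to an embodiment of this application. As shown in FIG. 3, the method includes the following steps.
301. The signaling analysis apparatus obtains signaling data.
Optionally, the signaling obtaining module 101 in the signaling analysis apparatus obtains the signaling data to be analyzed from the communication network. In some embodiments, the signaling obtaining module 101 collects signaling data from the communication network using a signaling collection tool released by the equipment vendor, a network packet capture tool, or a dedicated signaling analyzer. The collected data may come from, but is not limited to, interactive control-type signaling in the radio domain, the circuit switched domain, the packet switched domain, and the IP Multimedia Subsystem (IMS) domain.
302. The signaling analysis apparatus preprocesses the signaling data to obtain a signaling flow under test.
Optionally, the signaling preprocessing module 102 in the signaling analysis apparatus preprocesses the signaling data to obtain the signaling flow under test. The implementation of step 302 is detailed later. The flow under test includes N signaling messages that all belong to the same signaling flow, each of which includes a message type and information elements, and N is an integer greater than 1. In some embodiments, the signaling preprocessing module 102 first parses the signaling data to obtain the protocol, interface, timestamp, flow identifier, message type, and message content relevant to signaling analysis, and then extracts the flow under test according to the protocol, interface, timestamp, and flow identifier.
303. The signaling analysis apparatus determines whether the accuracy with which the supervised signaling anomaly detection module 202 detects anomalies in the flow under test is higher than a target threshold.
If yes, step 307 is performed; if no, step 304 is performed. The target threshold may be 80%, 90%, 95%, or the like. In some embodiments, the apparatus stores the accuracy with which the supervised signaling anomaly detection module 202 detects anomalies in the flow under test. If that accuracy is below the target threshold, the unsupervised signaling anomaly detection module 201 is used for the flow; otherwise, the supervised signaling anomaly detection module 202 is used. In practice, the apparatus may store the accuracy of the supervised signaling anomaly detection module 202 for each signaling flow.
304. The signaling analysis apparatus uses the unsupervised signaling anomaly detection module 201 to perform anomaly detection on the flow under test.
Optionally, the unsupervised signaling anomaly detection module 201 first performs a first round of anomaly detection based on the message type of each signaling message in the flow, and then a second round based on the message type and information elements of each signaling message. The implementation of step 304 is detailed later.
305. The signaling analysis apparatus determines whether the unsupervised anomaly detection module 201 has detected that the flow under test is abnormal.
If yes, step 306 is performed; if no, step 310 is performed.
306. The signaling analysis apparatus uses the unsupervised signaling anomaly localization module 203 to determine the position in the flow under test where the anomaly occurred.
307. The signaling analysis apparatus uses the supervised signaling anomaly detection module 202 to perform anomaly detection on the flow under test.
Optionally, the supervised signaling anomaly detection module 202 first performs a first round of anomaly detection based on the message type of each signaling message in the flow, and then a second round based on the message content of each signaling message. The implementation of step 307 is detailed later. The message content of each signaling message includes at least one information element.
308. The signaling analysis apparatus determines whether the supervised anomaly detection module 202 has detected that the flow under test is abnormal.
If yes, step 309 is performed; if no, step 310 is performed.
309. The signaling analysis apparatus uses the supervised signaling anomaly localization module 204 to determine the position in the flow under test where the anomaly occurred.
310. The signaling analysis apparatus outputs the anomaly detection result and the anomaly localization result.
Optionally, when the flow under test is normal, the analysis result output module 105 of the signaling analysis apparatus outputs information indicating that the flow is normal; when the flow is abnormal, it outputs the signaling interval in which the anomaly occurred or the position of the abnormal signaling.
In some embodiments, the signaling analysis apparatus may include the unsupervised signaling anomaly detection module 201 (as the signaling anomaly detection module 103) and the unsupervised signaling anomaly localization module 203 (as the signaling anomaly localization module 104), without the supervised signaling anomaly detection module 202 and the supervised signaling anomaly localization module 204. In this embodiment, the apparatus may perform steps 301, 302, 304, 305, 306, and 310 of FIG. 3.
In some embodiments, the signaling analysis apparatus may include the supervised signaling anomaly detection module 202 (as the signaling anomaly detection module 103) and the supervised signaling anomaly localization module 204 (as the signaling anomaly localization module 104), without the unsupervised signaling anomaly detection module 201 and the unsupervised signaling anomaly localization module 203. In this embodiment, the apparatus may perform steps 301, 302, 307, 308, 309, and 310 of FIG. 3.
The functions and implementations of the modules mentioned above are described in more detail below.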
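The signaling preprocessing module 102 preprocesses signaling data to obtain the signaling flow under test. As shown in FIG. 4, it performs the following steps: 401, signaling parsing; 402, signaling flow extraction. Signaling parsing may consist of parsing out, for each signaling message in the signaling data, the information relevant to signaling analysis, such as the protocol, interface, timestamp, and flow identifier. Signaling flow extraction may consist of extracting signaling flows under test, one signaling flow at a time, for subsequent signaling analysis. For example, the signaling preprocessing module may use a signaling parsing tool for signaling parsing, and use the protocol, interface, timestamp, flow identifier, and similar information of each signaling message for signaling flow extraction.
Signaling parsing 401: to reduce the differences between signaling data under different protocols (for example, signaling messages under S1AP are binary, whereas signaling messages under SIP resemble Hyper Text Markup Language (HTML)) and to improve the readability of the signaling data, the signaling preprocessing module may parse each signaling message to obtain several kinds of information relevant to signaling analysis, as shown in Table 1. It should be understood that the information in Table 1 is only an example, and the information obtained by parsing is not limited to Table 1. The timestamp may be the time identifier at which the signaling message was sent or received, and is not limited to an absolute time (such as 2018-07-10 15:38:10.031) or a relative time counter on the communication device (such as 1122867).
Table 1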
Figure PCTCN2020102680-appb-000001
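Signaling flow extraction 402: because the analysis of signaling data is context-dependent, the signaling preprocessing module may first group all parsed signaling messages according to their protocol, interface, flow identifier, and similar information, and then sort the signaling messages within each group by timestamp to obtain the final signaling flows under test. The signaling messages within a group share the same flow identifier and protocol, so each group belongs to a single signaling flow. For example, the module groups all parsed signaling messages by protocol, interface, and flow identifier and obtains 5 groups; it then sorts the messages within each group by timestamp, which yields 5 signaling flows under test.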
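The unsupervised signaling anomaly detection module 201 detects whether the signaling flow under test is abnormal. When there is not enough annotated signaling data for model training, the signaling analysis apparatus may use the unsupervised anomaly detection module for model training and anomaly detection. As shown in FIG. 5, the unsupervised anomaly detection module may perform the following operations:
501. The unsupervised signaling anomaly detection module performs a first round of coarse-level anomaly detection, based on message types, on the flow under test.
502. The unsupervised signaling anomaly detection module determines whether the flow under test is abnormal.
If yes, step 506 is performed; if no, step 503 is performed. Optionally, the module makes this determination from the anomaly detection result obtained in step 501.
503. The unsupervised signaling anomaly detection module performs a second round of fine-level anomaly detection, based on message types and the key information elements in the message content, on the flow under test.
504. The unsupervised signaling anomaly detection module determines whether the flow under test is abnormal.
If yes, step 506 is performed; if no, step 505 is performed. Optionally, the module makes this determination from the anomaly detection result obtained in step 503.
505. The unsupervised signaling anomaly detection module outputs a no-anomaly analysis result to the analysis result output module.
The no-anomaly analysis result indicates that the flow under test is normal.
506. The unsupervised signaling anomaly detection module outputs the flow under test to the unsupervised anomaly localization module.
To improve detection efficiency while preserving detection accuracy, the unsupervised signaling anomaly detection module 201 may first perform a first round of coarse-level anomaly detection based on message type information and, if that round finds no anomaly, a second round of fine-level anomaly detection based on message types and the key information elements in the message content (corresponding to the target information element). The key information elements may include the information element in the message content that indicates the reason the signaling message was sent. When either round detects an anomaly, the abnormal flow may be output directly to the unsupervised anomaly localization module 203 for localization; otherwise, the no-anomaly analysis result may be output directly to the analysis result output module 105. The first round improves detection efficiency while maintaining a certain recall and relatively high accuracy, and the fine-level second round further improves overall recall.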
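Signaling exchanges between network elements in a communication network resemble conversations between people: the signaling produced at the latest moment is based on the signaling produced before it, and can be regarded as a response to the signaling history. One optional scheme for the unsupervised anomaly detection module is therefore to use NLP techniques to build signaling anomaly detection models (corresponding to the first and second signaling anomaly detection models) from normal signaling data, and to perform anomaly detection by checking whether each signaling message under test falls within the predicted range. Normal signaling data means signaling flows in which no anomaly occurred. The following describes in detail how the unsupervised anomaly detection module performs anomaly detection, that is, how steps 501 and 503 are implemented.
Anomaly detection based on message types: after receiving a parsed signaling flow under test, the unsupervised anomaly detection module may first perform feature construction on the message type of each signaling message in the flow, in order, to obtain the message type feature sequence of the flow (corresponding to the second feature sequence). It then uses a message-type-based signaling anomaly detection model A (corresponding to the second signaling anomaly detection model) to perform coarse-level anomaly detection on that feature sequence.
First-round feature construction: the message type of each signaling message is treated as one word in an NLP algorithm and encoded with a word-level feature construction method from NLP. Such methods include but are not limited to one-hot encoding and bag-of-words encoding. By performing feature construction on the message type of each signaling message in the flow in order, the module obtains a feature sequence in which each feature vector corresponds to one message type.
First-round anomaly detection: starting from the first feature vector (corresponding to a second feature vector) of the feature sequence obtained by feature construction (the message type feature sequence above), the messages are checked one by one with a sliding window of length w and stride 1. The module may pad w placeholders before the first feature vector of the sequence so that this first vector can itself be predicted. During detection, the (t-w)-th, (t-w+1)-th, ..., (t-1)-th feature vectors of the sequence are input into the message-type-based signaling anomaly detection model A, which yields the range of possible message types for the t-th signaling message, where w and t are both integers greater than 1. If the message type of the t-th signaling message in the flow is not within the predicted range of message types (corresponding to the second set), the flow under test is considered abnormal and is output directly to the unsupervised anomaly localization module 203 for localization; otherwise, the second round of anomaly detection based on message types and key information elements begins. The sequence model used for the message-type-based signaling anomaly detection model may be, but is not limited to, an N-gram model or an NNLM.
Anomaly detection based on message types and key information elements: because some signaling faults are difficult or impossible to see in the message type, after the first round finds no anomaly the unsupervised anomaly detection module may perform a second round of anomaly detection based on message types and key information elements. For the message content of a signaling message, which is made up of several information elements, the module may treat cause-type information elements as the key information elements and, in addition to the message type, use the values of the cause-type information elements in the message content for anomaly detection. A cause-type information element is an information element in a signaling message that explicitly indicates why the message was sent. The module may first perform feature construction on the message type and key information element of each signaling message in the flow, in order, to obtain the feature sequence of the flow under test. It then uses a signaling anomaly detection model B based on message types and key information elements (corresponding to the first signaling anomaly detection model) to perform fine-level anomaly detection on that feature sequence.
Second-round feature construction: the combination of message type and key information element of each signaling message is treated as one word in an NLP algorithm and encoded with a word-level feature construction method from NLP. One optional way to combine them is to concatenate the message type with the value of the key information element and treat the result as a new word, as shown in FIG. 6. Word-level feature construction methods include but are not limited to one-hot encoding and BoW encoding. By performing feature construction on the message type and key information element of each signaling message in the flow in order, the module obtains a feature sequence (corresponding to the first feature sequence) in which each feature vector corresponds to the combination of message type and key information element of one signaling message in the flow under test.
Second-round anomaly detection: for the feature sequence obtained by second-round feature construction, the unsupervised anomaly detection module checks the messages one by one with a sliding window of length w′ and stride 1. For example, during the second round of anomaly detection, the (t-w′)-th, (t-w′+1)-th, ..., (t-1)-th feature vectors of the first feature sequence are input into the signaling anomaly detection model B based on message types and key information elements, which yields the possible combinations of message type and key information element for the t-th signaling message. When the combination of message type and key information element of the t-th signaling message in the flow is not among the at least one combination predicted by model B (corresponding to the first set), the flow under test is determined to be abnormal and is output directly to the unsupervised anomaly localization module for localization; otherwise, the no-anomaly analysis result may be output directly to the analysis result output module 105. The sequence model used for model B may be, but is not limited to, an N-gram model or an NNLM.
The signaling anomaly detection models A and B used in the detection above must be obtained by machine learning on a training data set of normal signaling. The following describes how models A and B are trained.
Optionally, signaling anomaly detection model A is trained as follows: perform feature construction on the message type included in each signaling message of a training signaling flow to obtain a second vector sequence, where the feature vectors in the second vector sequence are in one-to-one correspondence with the signaling messages in the training signaling flow; and input the R-th to the (R+W)-th feature vectors of the second vector sequence, as a second training feature sequence, into a second training model for unsupervised learning to obtain the second signaling anomaly detection model, where the second training model is a W-gram language model, W is an integer greater than 1, and R and S are both integers greater than 0. Before model A is trained, multiple signaling flows in which no anomaly occurred (normal signaling flows) may be obtained from the communication network as the data set for training model A; for each signaling flow in the data set, feature construction is performed on the message type of each signaling message to obtain the feature sequence of the flow. Taking a trigram language model (3-grams) as an example, if the feature sequence of the message types in a signaling flow after feature construction is [a,b,c,d] (corresponding to the second vector sequence), the flow yields the following feature sequences for training model A (corresponding to the second training feature sequence): [<bos>,<bos>,<bos>,a], [<bos>,<bos>,a,b], [<bos>,a,b,c], [a,b,c,d], [b,c,d,<eos>]. Here <bos> and <eos> are the vectors that the placeholder and the terminator map to after feature construction. Again taking the trigram language model as an example, for the input feature sequences the model computes the conditional probability P(fourth message of the feature sequence | first three messages of the feature sequence) over all feature sequences, which gives the signaling anomaly detection model for normal signaling flows.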
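The trigram example above can be reproduced with a small counting sketch. It follows the example's convention of three context items predicting a fourth, with `<bos>` padding in front and one `<eos>` at the end; the function names and the use of plain strings in place of feature vectors are assumptions made for illustration.

```python
from collections import Counter, defaultdict

BOS, EOS = "<bos>", "<eos>"

def training_windows(flow, w):
    """Slide a window of w context items plus one target over the padded flow."""
    padded = [BOS] * w + list(flow) + [EOS]
    return [tuple(padded[i:i + w + 1]) for i in range(len(padded) - w)]

def fit_conditional_model(flows, w):
    """Estimate P(next | previous w items) by counting over normal flows."""
    counts = defaultdict(Counter)
    for flow in flows:
        for window in training_windows(flow, w):
            counts[window[:-1]][window[-1]] += 1
    return {context: {item: n / sum(nexts.values()) for item, n in nexts.items()}
            for context, nexts in counts.items()}
```

Calling `training_windows(["a", "b", "c", "d"], 3)` yields the same five windows listed in the paragraph above. One possible way to derive a predicted set from the fitted model is to keep every next item whose estimated probability is above a chosen cutoff; the application does not specify such a cutoff.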
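Optionally, signaling anomaly detection model B is trained as follows: perform feature construction on the message type and information elements included in each signaling message of a training signaling flow to obtain a first vector sequence, where the feature vectors in the first vector sequence are in one-to-one correspondence with the signaling messages in the training signaling flow; and input the R-th to the (R+W)-th feature vectors of the first vector sequence, as a first training feature sequence, into a first training model for unsupervised learning to obtain the first signaling anomaly detection model, where the first training model is a W-gram language model, W is an integer greater than 1, and R and S are both integers greater than 0. Before model B is trained, multiple signaling flows in which no anomaly occurred (normal signaling flows) may be obtained from the communication network as the data set for training model B; for each signaling flow in the data set, feature construction is performed on the message type and key information element of each signaling message to obtain the feature sequence of the flow. Taking a trigram language model (3-grams) as an example, if the vector sequence of the message types and key information elements in a signaling flow after feature construction is [a,b,c,d] (corresponding to the first vector sequence), the flow yields the feature sequences [<bos>,<bos>,<bos>,a], [<bos>,<bos>,a,b], [<bos>,a,b,c], [a,b,c,d], [b,c,d,<eos>]. Here <bos> and <eos> are the vectors that the placeholder and the terminator map to after feature construction. Again taking the trigram language model as an example, for the input feature sequences the model computes the conditional probability P(fourth message of the feature sequence | first three messages of the feature sequence) over all normal feature sequences, which gives the signaling anomaly detection model B for normal signaling flows.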
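The supervised signaling anomaly detection module 202 is configured to detect whether a signaling flow under test is abnormal. As labeled signaling data accumulates, once a large amount of labeled signaling data is available for model training, the signaling analysis apparatus can use the supervised anomaly detection module for model training and anomaly detection. Compared with the unsupervised signaling anomaly detection module, the supervised signaling anomaly detection module achieves higher accuracy and recall. As shown in Figure 7, the supervised coarse-fine anomaly detection module can perform the following operations: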
701、有监督的信令异常检测模块对待测信令流程进行第一轮基于消息类型的coarse-level异常检测。
702、有监督的信令异常检测模块判断待测信令流程是否异常。
若是,执行步骤706;若否,执行步骤703。
703、有监督的信令异常检测模块对待测信令流程进行第二轮基于消息内容的fine-level异常检测。
704、有监督的信令异常检测模块判断待测信令流程是否异常。
若是,执行步骤706;若否,执行步骤705。
705、有监督的信令异常检测模块将无异常的分析结果输出至分析结果输出模块。
无异常的分析结果指示待测信令流程正常。
706、有监督的信令异常检测模块将待测信令流程输出至有监督的异常定位模块。
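Comparing Figure 7 with Figure 5 shows that the anomaly detection flow of the supervised signaling anomaly detection module is similar to that of the unsupervised module. The main differences are that step 701 is implemented differently from step 501, and step 703 differently from step 503. Because the flow in Figure 7 covers all information elements in the message content rather than only the key information elements, the anomaly detection method in Figure 7 covers a wider range of anomalies.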
有监督的异常检测模块的一种可选的方案是:利用NLP技术,基于带标注的信令数据建立信令流程的分类模型(正常、异常),通过对待测信令流程进行分类,完成异常检测。下面来详细描述有监督的异常检测模块进行异常检测的实现方式,即步骤701和步骤703的实现方式。
基于消息类型的异常检测:有监督的异常检测模块在接收到解析好的待测信令流程后,可以先依次对待测流程中各信令消息的消息类型信息进行特征构造,从而得到该待测流程对应的消息类型特征序列(对应于第二特征序列)。随后,利用基于消息类型的信令流程分类模型C(对应于第二信令异常检测模型)对待测信令特征序列进行coarse-level的异常检测。
第一轮特征构造:将每种消息类型作为NLP算法中的一个单词,利用NLP中对单词的特征构造方法进行特征构造。其中,对单词的特征构造方法包括但不仅限于One-hot编码、BoW编码。有监督的异常检测模块依次对待测流程中各信令的消息类型进行特征构造,可得到一个特征序列,该特征序列中的每个特征向量对应一个消息类型。
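First-round anomaly detection: the supervised anomaly detection module classifies the feature sequence obtained from feature construction with the message-type-based signaling flow classification model C, which yields a classification result indicating whether the signaling flow under test is abnormal. If model C classifies the flow under test as abnormal, the flow is abnormal and is passed directly to the supervised anomaly localization module 204 for localization; otherwise the flow enters the second round of anomaly detection, which is based on message content. The message-type-based signaling flow classification model can be, but is not limited to, a recurrent neural network (Recurrent Neural Networks, RNN) or a long short-term memory (Long Short-Term Memory, LSTM) model.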
基于消息内容的异常检测:由于个别信令故障问题较难甚至无法在信令的消息类型中体现,在第一轮基于消息类型的异常检测未检测到异常后,有监督的异常检测模块对待测信令流程进行第二轮基于消息内容的异常检测。其中,针对信令消息中由若干信元组成的消息内容,有监督的异常检测模块利用各条信令的消息内容中的所有信元或部分信元进行异常检测。也就是说,有监督的异常检测模块可以先分别对待测流程中各信令消息的所有信元或部分信元进行特征构造,从而得到该待测流程对应的一个特征序列。随后,利用基于消息内容的信令流程分类模型D(对应于第一信令异常检测模型)对待测信令流程对应的特征序列进行fine-level异常检测。
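Second-round feature construction: each message content, which consists of several information elements, is treated as a text passage consisting of several words, and the NLP feature construction methods for text passages are applied to it. Within each signaling message, the information elements are laid out by service function module, and the order in which service functions appear is fixed. One optional way to process the message content is therefore to extract the values of the information elements in the order in which they appear, which converts the message content into a text passage that carries semantics, as shown in Figure 8. The feature construction methods for text passages used by the supervised anomaly detection module include, but are not limited to, autoencoders and BoW encoding. By performing feature construction on the message content of each signaling message in the flow under test in turn, the module obtains a feature sequence in which each feature vector corresponds to the message content of one signaling message in the signaling flow under test.
Second-round anomaly detection: the supervised anomaly detection module can classify the feature sequence obtained from feature construction with the message-content-based signaling flow classification model D, which yields a classification result indicating whether the signaling flow under test is abnormal. If model D classifies the flow under test as abnormal, the flow is determined to be abnormal and is passed directly to the supervised anomaly localization module for localization; otherwise, a no-anomaly analysis result can be passed directly to the analysis result output module 105. The message-content-based signaling flow classification model D can be, but is not limited to, an RNN or LSTM model.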
上述异常检测中所用到的信令流程分类模型C和信令流程分类模型D需要在带标注的信令训练数据集上通过机器学习的方法获得。下面介绍训练得到信令流程分类模型C以及信令流程分类模型D的实现方式。
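Optionally, signaling flow classification model C is trained as follows: feature construction is performed on the message type of each signaling message in a training signaling flow to obtain a second training sample, whose feature vectors correspond one-to-one to the signaling messages in the training flow; the second training sample and second label information are fed into a fourth training model (corresponding to signaling flow classification model C) for anomaly detection processing, which produces a second anomaly detection processing result indicating whether the second training sample is a normal or an abnormal signaling flow; the loss for the second training sample is determined from the second anomaly detection processing result and a second reference result, the second reference result being the true result of the second training sample as indicated by the second label information; using the loss for the second training sample, the parameters of the fourth training model are updated by an optimization algorithm to obtain the second signaling anomaly detection model (corresponding to signaling flow classification model C).
Before training signaling flow classification model C, the training apparatus can first obtain labeled normal and abnormal signaling flows as the training data set; for each flow in the data set, feature construction is performed on the message type of each signaling message to obtain the feature sequence of that flow (including the second training sample). Taking a data set whose longest signaling flow has length 10 as an example, if the feature sequence of the message types in a signaling flow after feature construction is [a,b,c,d], the feature sequence obtained for that flow is [a,b,c,d,<bos>,<bos>,<bos>,<bos>,<bos>,<bos>], where <bos> is the vector that the placeholder maps to after feature construction. The training apparatus can build a model from the feature sequences and label information of the flows, learning the differences between normal and abnormal signaling flows and the characteristics of each, and thereby obtain signaling flow classification model C for identifying abnormal flows. Taking an LSTM classification model with sequence length 10 as an example, the input data of model C are the feature sequences of the signaling flows together with their label information (abnormal or normal), and the loss function used in training is cross-entropy. When the loss value of model C, computed with the loss function, falls below a specified loss threshold, a classification model that can distinguish normal from abnormal signaling flows is obtained.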
可选的,训练得到信令流程分类模型D的训练方法如下:分别对训练信令流程中每一条信令包括的消息类型和信元进行特征构造,得到第一训练样本;所述第一训练样本中的特征向量与所述训练信令流程中的信令一一对应;将第一训练样本和第一标注信息输入至第三训练模型(对应于信令流程分类模型D)进行异常检测处理,得到第一异常检测处理结果;上述第一异常检测处理结果指示上述第一训练样本为正常信令流程或者异常信令流程;根据上述第一异常检测处理结果和第一标准结果,确定上述第一训练样本对应的损失;上述第一标准结果为上述第一标注信息指示的上述第一训练样本的真实结果;利用上述第一训练样本对应的损失,通过优化算法更新上述第三训练模型的参数,得到第一信令异常检测模型(对应于信令流程分类模型D)。
训练装置在训练信令流程分类模型D之前,可先获取带标注的正常信令流程与异常信令流程,作为训练信令流程分类模型D的数据集;针对该数据集中的各信令流程,对各信令流程中的每条信令的消息内容进行特征构造,从而得到各信令流程的特征序列(包括第一训练样本)。以最长流程长度为10的数据集为例,若一信令流程中的消息内容在特征构造后的特征序列为[a,b,c,d],则该信令流程可得到的特征序列为[a,b,c,d,null,null,null,null,null,null]。其中,null为占位符在特征构造后对应的向量。训练装置可基于各信令流程的特征序列以及各信令流程的标注信息进行建模,学习正常信令流程和异常信令流程间的差异性及其各自的特点,从而得到识别异常信令流程的信令流程分类模型D。以序列长度为10的LSTM分类模型为例,信令流程分类模型D的输入数据为各信令流程的特征序列及其对应的标注信息(异常或正常),训练时采用的损失函数可以为交叉熵。当信令流程分类模型D的损失值低于指定损失阈值时,即可得到可区分正常信令流程和异常信令流程的分类模型。信令流程分类模型D的损失值是采用损失函数计算得到的损失值。
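The padding step and the cross-entropy loss mentioned for classification models C and D can be illustrated with the short sketch below. It is only a sketch under stated assumptions: the placeholder token name `<pad>` is hypothetical (the text writes the placeholder as <bos> or null), tokens stand in for feature vectors, and the loss is the binary form with label 1 for an abnormal flow.

```python
import math

PAD = "<pad>"  # hypothetical placeholder token; the text calls it <bos> or null


def pad_flow(features, max_len=10, pad=PAD):
    """Right-pad a flow's feature sequence to the fixed model input length."""
    if len(features) > max_len:
        raise ValueError("flow is longer than the model sequence length")
    return list(features) + [pad] * (max_len - len(features))


def binary_cross_entropy(p_abnormal, label, eps=1e-12):
    """Cross-entropy for one flow; label is 1 for abnormal and 0 for normal."""
    p = min(max(p_abnormal, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
```

For the example in the text, `pad_flow(["a", "b", "c", "d"])` returns the four features followed by six placeholders. Training would stop once the average of `binary_cross_entropy` over the data set falls below the chosen loss threshold.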
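The unsupervised anomaly localization module 203 is configured to determine where the anomaly occurs in the signaling flow under test, that is, the position of the abnormal signaling message within the flow. In some embodiments, after the unsupervised anomaly detection module 201 detects that the flow under test is abnormal, the unsupervised anomaly localization module 203 further analyzes the flow to obtain the position of the abnormal signaling message. Optionally, after receiving the abnormal flow under test and its feature sequence from the unsupervised anomaly detection module 201, the unsupervised anomaly localization module 203 can perform anomaly localization based on the received feature sequence.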
无监督的异常定位模块的一个可选的方案是:利用信令异常检测模型A对待测信令流程进行异常定位。该可选的方案的一种实现方式如下:无监督的异常定位模块从待测特征序列的第1个特征向量开始,利用窗长为w、步长为1的滑动窗进行逐条检测。该待测特征序列可以是待测信令流程对应的特征序列。待测特征序列可以是来自无监督的异常检测模块,也可以是来自有监督的异常检测模块。通过将待测特征序列中的第(t-w)个特征向量、第(t-w+1)个特征向量、…,第(t-1)个特征向量输入到信令异常检测模型中进行异常检测处理,可得到第t条信令的可能范围。如果待测流程中的第t条信令不在预测得到的信令范围之内,即可确定该待测信令流程中的第t条信令异常,并可将待测流程的异常结果与异常位置输出至分析结果输出模块105。其中,可以在待测特征序列的第1个特征向量前填补w个占位符,用于预测待测信令流程的第一条信令。图9为本申请实施例提供的一种异常定位过程示意图。如图9所示,特征序列X表示待测特征序列,X(t)表示待测特征序列X中的第t个特征向量,Y表示第t条信令的可能范围(即所有可能的第t条信令),信令异常检测模型的输入为待测特征序列中的第(t-w)个特征向量至第(t-1)个特征向量,信令异常检测模型的输出为预测的第t条信令。
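The sliding-window localization above can be sketched as follows. The sketch assumes the trained model is available as a function `predict_next` that maps a context of w tokens to the set of allowed next tokens; that interface, the function name `locate_first_anomaly`, and the toy lookup table with SIP-style message names are all illustrative assumptions.

```python
BOS = "<bos>"  # placeholder prepended w times so the first message can be predicted


def locate_first_anomaly(tokens, predict_next, w=3):
    """Return the 1-based position t of the first unexpected message, or None."""
    padded = [BOS] * w + list(tokens)
    for t in range(1, len(tokens) + 1):
        # Context is the w entries immediately before message t.
        context = tuple(padded[t - 1:t - 1 + w])
        if tokens[t - 1] not in predict_next(context):
            return t
    return None


# Toy stand-in for a trained model with window length w = 2.
ALLOWED = {
    (BOS, BOS): {"INVITE"},
    (BOS, "INVITE"): {"100 Trying"},
    ("INVITE", "100 Trying"): {"180 Ringing", "200 OK"},
}


def toy_predict(context):
    """Look up the allowed next messages; unseen contexts allow nothing."""
    return ALLOWED.get(context, set())
```

With the toy table, `locate_first_anomaly(["INVITE", "100 Trying", "BYE"], toy_predict, w=2)` reports position 3, while a flow ending in "200 OK" yields `None`.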
上述待测特征序列可以是基于消息类型得到的特征序列,也可以是基于消息类型和关键信元得到的特征序列。当无监督的异常定位模块接收到基于消息类型的特征序列时,无监督的异常定位模块采用信令异常检测模型A来进行异常定位。当无监督的异常定位模块接收到基于消息类型和关键信元的特征序列时,无监督的异常定位模块采用信令异常检测模型B来进行异常定位。
有监督的异常定位模块204,用于确定待测信令流程中发生异常的位置,即确定异常信令在该待测流程中的位置。在一些实施例中,有监督的异常定位模块204在有监督的异常检测模块202检测出待测信令流程异常之后,进一步分析得到异常信令在该待测流程中的位置。可选的,有监督的异常定位模块204在接收到由有监督的异常检测模块202传入的异常待测流程及其特征序列后,可以基于传入的特征序列进行异常定位,从而完成异常定位。
信令分析装置将待测信令流程输入至信令流程分类模型进行异常检测处理,可得到其属于异常流程的概率。该概率又可看作待测信令流程的异常程度,因此,可通过分析待测信令流程的异常程度变化来定位异常信令区间,即异常信令在待测信令流程中所处的信令区间。
有监督的异常定位模块的一个可选的方案是:先利用信令流程分类模型计算待测信令流程的异常评估曲线,再利用该异常评估曲线中的起伏变换完成异常信令区间的定位。图10为本申请实施例提供的一种信令异常定位方法流程图。如图10所示,该方法包括:1001、有监督的异常定位模块构建待测信令流程的异常评估曲线。1002、有监督的异常定位模块利用异常评估曲线定位异常信令区间,即确定待测信令流程中发生异常的位置。
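As an example, the supervised anomaly localization module can construct the anomaly evaluation curve as follows. As shown in Figure 11, for a signaling flow under test consisting of N signaling messages, the module feeds the first 3 feature vectors (S_1), the first 4 feature vectors (S_2), …, and the first N feature vectors (S_m, m=N-2) of the flow's feature sequence into the signaling flow classification model in turn. This yields the anomaly evaluation probability P_1 of the segment formed by the first 3 messages (corresponding to S_1), the probability P_2 of the segment formed by the first 4 messages (corresponding to S_2), …, and the probability P_m of the segment formed by the first N messages (corresponding to S_m). Ordering these probabilities by subscript gives the anomaly evaluation curve. The curve takes values between 0 and 1, and its value rises as the degree of abnormality of the signaling segment rises. It can be understood that the flow under test corresponds to an anomaly probability sequence, in which the G-th probability represents the probability that the first (D+G) of the N signaling messages include an abnormal message. G and D are both integers greater than 0. For example, with D equal to 2, the probabilities in the sequence are P_1, P_2, …, P_m in order, and P_1 is the anomaly evaluation probability of the segment formed by the first 3 messages of the flow under test.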
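The prefix-by-prefix construction of the curve can be sketched as below. The classifier is passed in as a plain callable returning an abnormality probability; the name `anomaly_curve` and that callable interface are assumptions made for illustration, not the API of any particular library.

```python
def anomaly_curve(features, classify, min_prefix=3):
    """Return [P_1, ..., P_m], where P_k scores the first (k + min_prefix - 1) messages.

    `classify` is any callable mapping a prefix of the feature sequence to the
    probability that the prefix belongs to an abnormal flow.
    """
    n = len(features)
    return [classify(features[:k]) for k in range(min_prefix, n + 1)]
```

For a flow of N messages and `min_prefix=3`, the returned list has N-2 entries, matching m=N-2 in the text. In practice `classify` would wrap the trained classification model C or D, including any padding that model expects.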
应理解,当有监督的异常定位模块接收到基于消息类型进行特征构造得到的特征序列时,采用信令流程分类模型C进行异常定位,即将该特征序列输入至信令流程分类模型C进行异常检测处理。当有监督的异常定位模块接收到基于消息内容进行特征构造得到的特征序列时,采用信令流程分类模型D进行异常定位,即将该特征序列输入至信令流程分类模型D进行异常检测处理。
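In general, a signaling flow under test can be further divided into several sub-flows. In a normal flow, the anomaly evaluation probability rises when a sub-flow starts and falls when the sub-flow ends normally. Any anomaly in any signaling message of the flow, such as a conflict with an earlier signaling message or a sub-flow that does not end normally, causes the anomaly evaluation probability to stay high from that point on. The rises and falls of the anomaly evaluation curve therefore give the user an interpretable anomaly detection method.
Localization of the abnormal signaling interval: because the anomaly evaluation curves produced by different kinds of anomalies have different characteristics, the supervised anomaly localization module can first classify the curve by its characteristics to obtain its curve type, and then localize the abnormal interval according to that type. Table 2 shows an example of anomaly evaluation curve classification. As an example, anomaly evaluation curves can be divided according to Table 2 into four types: steep increase, gradual increase, fluctuation, and sustained; for each type, the abnormal interval can be localized as described in Table 3, which lists some localization methods. Because an anomaly evaluation curve is essentially a time series, curve classification can be done with time series classification algorithms and interval localization with time series analysis methods.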
表2
Figure PCTCN2020102680-appb-000002
Figure PCTCN2020102680-appb-000003
表2中,低值(对应于第一阈值)可以是0.2、0.25等,高值(对应于第二阈值)可以是0.75、0.8、0.9等,本申请不作限定。
表3
Figure PCTCN2020102680-appb-000004
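For the steep-increase segment in Table 3, one optional way to identify it is to compute the difference between each pair of adjacent points in the probability sequence; if a difference is greater than the steep-increase threshold (corresponding to the probability threshold), that pair is considered a steep-increase segment. The steep-increase threshold can be 0.3, 0.4, 0.5, and so on. For the maximum-increase segment in Table 3, one optional way is to compute the differences between all adjacent points in the sequence and select the segment whose difference is the largest and also greater than the increase threshold. These are two ways of identifying the steep-increase segment and the maximum-increase segment of an anomaly evaluation curve; other methods can also be used.
The analysis result output module 105 is configured to output analysis results according to the signaling analysis result it receives. The signaling analysis result can be an anomaly detection result from an anomaly detection module, or the position of an abnormal signaling message or of an abnormal signaling interval from an anomaly localization module. In some embodiments, if the signaling flow under test is normal, the module outputs the flow and the analysis result "normal"; if the flow is abnormal and the analysis result comes from the unsupervised anomaly localization module 203, the module outputs the flow, the analysis result "abnormal", and the position of the abnormal signaling message; if the flow is abnormal and the analysis result comes from the supervised anomaly localization module 204, the module outputs the flow, the analysis result "abnormal", and the position of the abnormal signaling interval.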
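The two adjacent-difference checks can be sketched in a few lines. The function names and the default threshold values are illustrative assumptions; the indices returned are 0-based positions in the probability sequence.

```python
def steep_rises(probs, threshold=0.4):
    """Indices i where probs[i] - probs[i - 1] exceeds the steep-increase threshold."""
    return [i for i in range(1, len(probs)) if probs[i] - probs[i - 1] > threshold]


def max_increase_segment(probs, min_gain=0.0):
    """Return (start, end) of the adjacent pair with the largest gain above min_gain."""
    best_end, best_gain = None, min_gain
    for i in range(1, len(probs)):
        gain = probs[i] - probs[i - 1]
        if gain > best_gain:
            best_end, best_gain = i, gain
    return None if best_end is None else (best_end - 1, best_end)
```

To map an index in the probability sequence back to a signaling message, recall that the G-th probability (1-based) covers the first (D+G) messages of the flow.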
前面介绍了前述实施例所涉及的模块的功能以及实现方式,下面结合应用场景来介绍两种信令异常检测和定位的实施例。
图12为本申请实施例提供的一种信令异常检测和定位方法流程图。图12为图2中的方法流程的进一步细化和完善,如图12所示,该方法包括:
1201、信令分析装置从通信网络的IMS域中采集SIP协议下的信令数据。
1202、信令分析装置解析采集的信令数据,并提取出待测信令流程。
可选的,针对采集到的信令数据,信令分析装置先利用SIP信令解析工具解析出信令数据中每条信令的协议、接口、时间戳、流程标识、消息类型和消息内容,再根据协议、接口、时间戳和流程标识完成信令流程提取。示例性的,信令流程提取的步骤如下:信令分析装置可先将信令数据中对应的协议、接口以及流程标识均相同的信令分到相同组,再按照每组信令中各信令包括的时间戳的先后顺序对每组信令中的信令进行先后排序。应理解,每组信令中各信令的协议、接口以及流程标识均相同,每组信令排序后对应一个待测信令流程。在实际应用中,信令分析装置可分别对各待测信令流程进行异常检测和定位。
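The flow extraction step (group by protocol, interface and flow identifier, then sort by timestamp) can be sketched as follows. The `Signaling` record type and its field names are hypothetical; a real parser would supply its own structure, and the message content field is omitted here for brevity.

```python
from collections import defaultdict
from typing import NamedTuple


class Signaling(NamedTuple):
    """One parsed signaling message (hypothetical field names)."""
    protocol: str
    interface: str
    flow_id: str
    timestamp: float
    msg_type: str


def extract_flows(records):
    """Group messages by (protocol, interface, flow_id) and sort each group by time."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec.protocol, rec.interface, rec.flow_id)].append(rec)
    return {key: sorted(msgs, key=lambda m: m.timestamp) for key, msgs in groups.items()}
```

Each value of the returned dictionary is one signaling flow under test, ready for feature construction.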
1203、信令分析装置基于待测信令流程中各信令的消息类型对该待测信令流程进行异常检测。
示例性的,信令分析装置先依次对待测信令流程中各信令的消息类型进行One-hot编码,得到该待测流程对应的消息类型特征序列(对应于第二特征序列),再利用训练好的基于消息类型的LSTM分类模型(对应于信令流程分类模型C)对上述消息类型特征序列进行分类。
1204、判断待测信令流程是否异常。
若LSTM分类模型的分类结果为异常,则执行步骤1207,即直接将该待测信令流程输出到有监督的异常定位模块中;否则,执行步骤1205。
1205、信令分析装置基于待测信令流程中各信令的消息内容对该待测信令流程进行异常检测。
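For a signaling flow under test in which step 1203 detected no anomaly, the signaling analysis apparatus can first perform feature construction on the message content of each signaling message in the flow to obtain a message content feature sequence (corresponding to the first feature sequence), and then classify that sequence with the trained message-content-based LSTM classification model (corresponding to signaling flow classification model D). One example of constructing the message content feature sequence is as follows. (1) For the message content of each signaling message, identify whether each information element is of noun type, numeric type, or enumerated type. The type of each information element can be obtained by statistical classification during the training phase of signaling flow classification model D. The value of a noun-type information element is almost always different from one signaling flow to another; the value of an enumerated information element always comes from a finite set of discrete values; the value of a numeric information element comes from a continuous value space. (2) Filter out the noun-type information elements in the message content. Because noun-type information elements have too little discriminative power in signaling analysis, they are filtered out and do not enter the subsequent analysis. (3) Discretize the numeric information elements in the message content. Because numeric information elements are hard to handle with NLP techniques, the continuous value space is discretized following the interval-based judgment that engineers use. Suppose the value space of an information element has maximum Vmax and minimum Vmin, and the number of intervals after discretization is n; the unit interval length dl is then (Vmax-Vmin)/n. When the information element takes the value x (Vmin≤x≤Vmax), the corresponding discretized value is
Figure PCTCN2020102680-appb-000005
When the value of the information element is outside the value space, the corresponding discretized value is 0. (4) Treat the message content as a text passage made up of a string of enumerated information elements and discretized numeric information elements, and use an autoencoder-based method to convert it into a fixed-length feature vector. After this processing, the message content of every signaling message has been turned into such a fixed-length vector, which completes the feature construction of the message content.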
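Steps (2) and (3) can be sketched as below. The formula for in-range values is an image in the source and is not recoverable here, so the mapping used in `discretize` (interval index counted from 1, with the top edge clamped to n) is an assumption chosen only so that 0 stays reserved for out-of-range values, as the text requires. The function names, the `kinds` and `ranges` lookup tables, and the "name=value" token format are likewise hypothetical.

```python
import math


def discretize(x, vmin, vmax, n):
    """Map x to an interval index; 0 means x lies outside [vmin, vmax].

    ASSUMPTION: in-range values map to 1..n via floor((x - vmin) / dl) + 1.
    The source gives this formula only as a figure.
    """
    if vmax <= vmin or n < 1:
        raise ValueError("need vmax > vmin and n >= 1")
    if not (vmin <= x <= vmax):
        return 0
    dl = (vmax - vmin) / n  # unit interval length, as defined in the text
    return min(n, math.floor((x - vmin) / dl) + 1)


def message_to_tokens(ies, kinds, ranges, n=10):
    """Drop noun-type IEs, discretize numeric IEs, keep enumerated IEs as they are."""
    tokens = []
    for name, value in ies:  # IEs are visited in their order of appearance
        kind = kinds.get(name, "enum")
        if kind == "noun":
            continue
        if kind == "numeric":
            vmin, vmax = ranges[name]
            value = discretize(value, vmin, vmax, n)
        tokens.append(f"{name}={value}")
    return tokens
```

The resulting token list is the "text passage" that step (4) would hand to a text encoder such as an autoencoder or BoW encoder.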
1206、判断待测信令流程是否异常。
若分类模型的分类结果为异常,则执行步骤1207,即直接将该异常待测流程送到有监督的异常定位模块中;若分类模型的分类结果为正常,则执行步骤1209,即将正常结果送到分析结果输出模块。
1207、信令分析装置构建待测信令流程的异常评估曲线。
若步骤1203检测到待测信令流程异常,信令分析装置可利用上述消息类型特征序列构建待测信令流程的异常评估曲线;若步骤1205检测到待测信令流程异常,信令分析装置可利用上述消息内容特征序列构建待测信令流程的异常评估曲线。
1208、信令分析装置根据待测信令流程的异常评估曲线,确定该待测流程中的异常信令区间。
针对上述得到的异常评估曲线,信令分析装置可先利用时间序列分类算法识别各曲线的类型,再根据表3中的异常区间定位方法,得到该待测流程中的异常信令区间。举例来说,图13为待测信令流程的异常评估曲线,其中,横轴的0坐标对应该待测信令流程的前3条信令,x坐标对应该待测信令流程中的前(x+3)条信令,虚线框为异常信令区间。
1209、信令分析装置输出异常检测结果以及异常信令区间的位置。
示例性的,针对信令分析结果为异常的各信令流程,分析结果输出模块分别按格式“异常,[异常信令区间的位置]”输出。例如,图13对应的输出为“异常,第7条至第12条信令异常”。针对信令分析结果为正常的各信令流程,分析结果输出模块输出“正常”。
基于数据驱动的本实施例通过利用各信令协议都支持的信令信息、各信令协议都适用的特征构建方法,使用序列分类模型对待测信令流程中的消息类型信息和消息内容信息进行了更全面的信令流程异常检测,并定位了异常信令的区间位置。
图14为本申请实施例提供的一种信令异常检测和定位方法流程图。图14为图2中的方法流程的进一步细化和完善,如图14所示,该方法包括:
1401、信令分析装置从通信网络的无线域中采集S1AP协议下的信令数据。
1402、信令分析装置解析采集的信令数据,并提取出待测信令流程。
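Optionally, for the collected signaling data, the signaling analysis apparatus first uses an S1AP signaling parsing tool to parse out the protocol, interface, timestamp, flow identifier, message type, and message content of each signaling message, and then extracts the signaling flows according to protocol, interface, timestamp, and flow identifier. As an example, signaling flow extraction proceeds as follows: the apparatus first places the signaling messages that share the same protocol, interface, and flow identifier into the same group, and then sorts the messages within each group by the order of their timestamps. It should be understood that all messages in a group share the same protocol, interface, and flow identifier, and that each sorted group corresponds to one signaling flow under test. In practical applications, the signaling analysis apparatus can perform anomaly detection and localization on each flow under test separately.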
1403、信令分析装置基于待测信令流程中各信令的消息类型对待测信令流程进行异常检测。
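As an example, the signaling analysis apparatus first pads w empty placeholder messages before the first signaling message of the flow under test (both message type and message content are replaced by the placeholder <bos>), and then applies One-hot encoding to the padded message types in turn to obtain the message type feature sequence of the flow. It then uses the trained message-type-based NNLM anomaly detection model (corresponding to signaling anomaly detection model A) to predict over this feature sequence message by message with a sliding window (window length w), which yields the anomaly detection result of the flow. If any signaling message in the flow under test is outside its predicted range, the flow is determined to be abnormal, that is, an anomaly detection result indicating that the flow is abnormal is obtained; if every signaling message in the flow is within its predicted range, the flow is determined to be normal, that is, an anomaly detection result indicating that the flow is normal is obtained.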
1404、判断待测信令流程是否异常。
若异常检测结果指示待测信令流程异常,则执行步骤1407,即直接将该待测信令流程送到无监督的异常定位模块中;否则,执行步骤1405,即将该待测流程送到第二轮的细粒度异常检测中。
1405、信令分析装置基于待测信令流程中各信令的消息类型和关键信元对该待测信令流程进行异常检测。
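Optionally, the signaling analysis apparatus treats the combination of message type and key information element of each signaling message in the flow under test as one word, performs feature construction to obtain a feature sequence, and computes the prediction for each message with a sliding window to obtain the anomaly detection result of the flow. As an example, the apparatus first extracts the information element named cause-result from the message content of each signaling message as the key information element, and then joins the value of that information element to its message type with the "|" character. If a kind of signaling message has no information element named cause-result, its message type is used directly as the joined result. Next, One-hot encoding is applied to the joined result of each message in the flow in turn, which gives a feature sequence of the flow that includes key information element information. Finally, the trained NNLM anomaly detection model based on message type and key information element (corresponding to signaling anomaly detection model B) predicts over this feature sequence message by message with a sliding window. If any signaling message in the flow under test is outside its predicted range, the flow is determined to be abnormal, that is, an anomaly detection result indicating that the flow is abnormal is obtained; if every signaling message in the flow is within its predicted range, the flow is determined to be normal, that is, an anomaly detection result indicating that the flow is normal is obtained. It should be understood that step 1405 is similar to step 1403; the difference lies in how the feature sequence is constructed.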
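The word construction and One-hot encoding in step 1405 can be sketched as follows. The function names, the dictionary representation of a message's information elements, and the choice to encode an out-of-vocabulary word as an all-zero vector are assumptions made for illustration; the message names in the usage note are examples only.

```python
def to_word(msg_type, ies, key_ie="cause-result"):
    """Join message type and key IE value with '|'; fall back to the type alone."""
    value = ies.get(key_ie)
    return msg_type if value is None else f"{msg_type}|{value}"


def one_hot(words, vocab):
    """Encode each word as a one-hot list over vocab (all zeros if unseen)."""
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for word in words:
        vec = [0] * len(vocab)
        if word in index:
            vec[index[word]] = 1
        vectors.append(vec)
    return vectors
```

For instance, `to_word("InitialContextSetupFailure", {"cause-result": "radio-failure"})` gives `"InitialContextSetupFailure|radio-failure"`, while a message without the key information element keeps its bare message type. The vocabulary would normally be collected from the normal training flows.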
1406、判断待测信令流程是否异常。
信令分析装置可根据步骤1405得到的异常检测结果判断待测信令流程是否异常。若待测信令流程异常,则执行步骤1407;若待测信令流程正常,执行步骤1408。
1407、信令分析装置确定待测信令流程中异常信令的位置。
可选的,信令分析装置从待测信令流程第一条信令开始,以滑窗形式进行逐条检测。当一条信令不在其预测范围之内时,即可认为该信令异常,并将该信令在该待测流程中的位置作为最终的异常定位结果输出至分析结果输出模块。一种可省略该步骤的方法是:在步骤1403和步骤1405的异常检测过程中,针对异常的待测信令流程,直接将第一条异常信令出现的位置作为最终的异常定位结果输出至分析结果输出模块。
1408、信令分析装置输出异常检测结果以及异常信令的位置。
可选的,针对信令分析结果为异常的各信令流程,分析结果输出模块分别按格式“异常,[异常信令的位置]”输出。例如,“异常,第7条信令异常”。针对信令分析结果为正常的各信令流程,分析结果输出模块输出“正常”。
针对通信网络控制面的各信令协议,以数据驱动的本实施例通过提取各协议都支持的信令信息、利用通用的特征构造方法,有效消除了各信令协议间的格式差异,避免了由专家规则总结导致的成本高、自更新能力差问题。信令分析过程中,本实施例利用序列模型处理了信令流程中各信令消息间的长依赖关系,通过对信令流程中的消息类型信息和关键信元信息进行特征编码、异常检测,可覆盖大多数消息类型之外的信令异常问题,有效避免了对异常关键信元分析不全面问题。与图12的实施例一相比,由于本实施例的模型训练只需采集正常情况下的信令数据,本实施例更易启动。
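Figure 15 is a schematic diagram of the hardware structure of a training apparatus according to an embodiment of this application. The training apparatus 1500 shown in Figure 15 (the apparatus 1500 may specifically be a computer device) includes a memory 1501, a processor 1502, a communication interface 1503, and a bus 1504. The memory 1501, the processor 1502, and the communication interface 1503 are communicatively connected to one another through the bus 1504.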
存储器1501可以是只读存储器(Read Only Memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(Random Access Memory,RAM)。存储器1501可以存储程序以及训练数据,当存储器1501中存储的程序被处理器1502执行时,处理器1502用于执行本申请实施例的训练方法。
处理器1502可以采用通用的中央处理器(Central Processing Unit,CPU),微处理器,应用专用集成电路(Application Specific Integrated Circuit,ASIC),GPU或者一个或多个集成电路,用于执行相关程序,以实现本申请实施例的训练装置中的单元所需执行的功能,或者执行本申请方法实施例的训练方法。
通信接口1503使用例如但不限于收发器一类的收发装置,来实现装置1500与其他设备或通信网络之间的通信。例如,可以通过通信接口1503获取训练数据(例如本申请实施例上述的第一训练集)。
总线1504可包括在装置1500各个部件(例如,存储器1501、处理器1502、通信接口1503)之间传送信息的通路。
图16是本申请实施例提供的信令分析装置的硬件结构示意图。图16所示的信令分析装置1600(该装置1600具体可以是一种计算机设备)包括存储器1601、处理器1602、通信接口1603以及总线1604。其中,存储器1601、处理器1602、通信接口1603通过总线1604实现彼此之间的通信连接。
存储器1601可以是只读存储器,静态存储设备,动态存储设备或者随机存取存储器。存储器1601可以存储程序,当存储器1601中存储的程序被处理器1602执行时,处理器1602用于执行本申请实施例的信令分析方法的各个步骤。
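The processor 1602 can be a general-purpose central processing unit, a microprocessor, an application-specific integrated circuit, a graphics processing unit (GPU), or one or more integrated circuits, and is configured to execute the relevant programs so as to implement the functions required of the units in the signaling analysis apparatus of the embodiments of this application, or to perform the signaling analysis method of the method embodiments of this application. The processor can implement the functions of the modules in Figure 1B.
The processor 1602 can also be an integrated circuit chip with signal processing capability. In implementation, the steps of the signaling analysis method of this application can be completed by integrated logic circuits of hardware in the processor 1602 or by instructions in the form of software. The processor 1602 can also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor can be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application can be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules can reside in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 1601; the processor 1602 reads the information in the memory 1601 and, in combination with its hardware, implements the functions required of the units included in the signaling analysis apparatus of the embodiments of this application, or performs the signaling analysis method of the method embodiments of this application.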
通信接口1603使用例如但不限于收发器一类的收发装置,来实现装置1600与其他设备或通信网络之间的通信。例如,可以通过通信接口1603获取信令数据。
总线1604可包括在装置1600各个部件(例如,存储器1601、处理器1602、通信接口1603)之间传送信息的通路。
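It should be noted that although the training apparatus 1500 and the signaling analysis apparatus 1600 shown in Figure 15 and Figure 16 show only a memory, a processor, and a communication interface, a person skilled in the art should understand that in a specific implementation the training apparatus 1500 and the signaling analysis apparatus 1600 also include other components necessary for normal operation. A person skilled in the art should also understand that, depending on specific needs, the training apparatus 1500 and the signaling analysis apparatus 1600 can further include hardware components that implement other additional functions. In addition, a person skilled in the art should understand that the training apparatus 1500 and the signaling analysis apparatus 1600 can also include only the components necessary to implement the embodiments of this application, and need not include all the components shown in Figure 15 or Figure 16.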
本申请实施例还提供一种计算机可读存储介质,上述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行前述实施例所提供的方法。
本申请实施例还提供一种计算机可读存储介质,上述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行前述实施例所提供的训练方法。
本申请实施例提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行前述实施例所提供方法。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (45)

  1. 一种信令分析方法,其特征在于,包括:
    获取待测信令流程,所述待测信令流程包括N条信令,所述N为大于1的整数;
    分别对所述N条信令中每一条信令包括的消息类型和信元进行第一特征构造,得到第一特征序列;所述第一特征序列包括N个第一特征向量,所述N个第一特征向量与所述N条信令一一对应;
    将所述第一特征序列输入至第一信令异常检测模型进行异常检测处理,输出第一异常检测结果,所述第一异常检测结果指示所述待测信令流程正常或者异常。
  2. 根据权利要求1所述的方法,其特征在于,所述分别对所述N条信令中每一条信令包括的消息类型和信元进行第一特征构造,得到第一特征序列之前,所述方法还包括:
    分别对所述N条信令中每一条信令包括的消息类型进行第二特征构造,得到第二特征序列;所述第二特征序列包括N个第二特征向量,所述N个第二特征向量与所述N条信令一一对应;
    将所述第二特征序列输入至第二信令异常检测模型进行异常检测处理,得到第二异常检测结果,所述第二异常检测结果指示所述待测信令流程正常或者异常;
    所述分别对所述N条信令中每一条信令包括的消息类型和信元进行第一特征构造,得到第一特征序列包括:
    在所述第二异常检测结果指示所述待测信令流程正常的情况下,分别对所述N条信令中每一条信令进行所述第一特征构造,得到所述第一特征序列。
  3. 根据权利要求1或2所述的方法,其特征在于,所述分别对所述N条信令中每一条信令包括的消息类型和信元进行第一特征构造,得到第一特征序列包括:
    按照所述N条信令的时间戳先后顺序依次对所述N条信令进行所述第一特征构造,得到所述第一特征序列。
  4. 根据权利要求3所述的方法,其特征在于,所述按照所述N条信令的时间戳先后顺序依次对所述N条信令进行所述第一特征构造,得到所述第一特征序列包括:
    将第一消息类型和目标信元的组合作为一个整体进行特征构造,得到第一向量;所述第一消息类型为第一信令包括的消息类型,所述目标信元为所述第一信令包括的信元,所述第一信令为所述N条信令中的任一条信令,所述第一向量包含于所述第一特征序列。
  5. 根据权利要求4所述的方法,其特征在于,所述目标信元包括指示所述第一信令的发送原因的信元。
  6. 根据权利要求3所述的方法,其特征在于,所述按照所述N条信令的时间戳先后顺序依次对所述N条信令进行所述第一特征构造,得到所述第一特征序列包括:
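    performing feature construction on the M information elements in a second signaling message, treated as a text comprising one or more words in a natural language processing (NLP) algorithm, to obtain a second vector; the second signaling message is any one of the N signaling messages, the second vector is included in the first feature sequence, and M is an integer greater than 1.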
  7. 根据权利要求2至6任一项所述的方法,其特征在于,所述分别对所述N条信令中每一条信令包括的消息类型进行第二特征构造,得到第二特征序列包括:
    按照所述N条信令的时间戳先后顺序依次对所述N条信令进行所述第二特征构造,得到所述第二特征序列。
  8. 根据权利要求7所述的方法,其特征在于,所述按照所述N条信令的时间戳先后顺序依次对所述N条信令进行所述第二特征构造,得到所述第二特征序列包括:
    将第二消息类型作为自然语言处理NLP算法中的一个单词进行特征构造,得到第三向量;所述第二消息类型为第三信令包括的消息类型,所述第三信令为所述N条信令中的任一条信令,所述第三向量包含于所述第二特征序列。
  9. 根据权利要求1至8任一项所述的方法,其特征在于,所述第一特征序列中的第F个第一特征向量与所述N条信令中的第F条信令相对应;所述将所述第一特征序列输入至第一信令异常检测模型进行异常检测处理,输出第一异常检测结果包括:
    在第F轮异常检测处理中,将第三特征序列输入至所述第一信令异常检测模型进行异常检测处理,得到第一集合;所述第一集合包括至少一个消息类型和信元的组合,所述第三特征序列中的特征向量依次为所述第一特征序列中的第(F-K)个第一特征向量至第(F-1)个第一特征向量;所述F为大于1的整数,所述K为大于1且小于所述F的整数;
    在所述N条信令中的第F条信令的消息类型和信元的组合未包含于所述第一集合的情况下,输出所述第一异常检测结果;所述第一异常检测结果指示所述待测信令流程异常。
  10. 根据权利要求9所述的方法,其特征在于,所述方法还包括:
    在所述第F条信令的消息类型和信元的组合包含于所述第一集合,且所述F小于所述N的情况下,执行第(F+1)轮异常检测处理;
    在所述第F条信令的消息类型和信元的组合包含于所述第一集合,且所述F等于所述N的情况下,输出所述第一异常检测结果;所述第一异常检测结果指示所述待测信令流程正常。
  11. 根据权利要求1至8任一项所述的方法,其特征在于,所述第二特征序列中的第F个第二特征向量与所述N条信令中的第F条信令相对应;所述将所述第二特征序列输入至第二信令异常检测模型进行异常检测处理,得到第二异常检测结果包括:
    在第F轮异常检测处理中,将第四特征序列输入至所述第二信令异常检测模型进行异常检测处理,得到第二集合;所述第二集合包括至少一个消息类型,所述第四特征序列中的特征向量依次为所述第二特征序列中的第(F-K)个第二特征向量至第(F-1)个第二特征向量;所述F为大于1的整数,所述K为大于1且小于所述F的整数;
    在所述N条信令中的第F条信令的消息类型未包含于所述第二集合的情况下,得到所述第二异常检测结果;所述第二异常检测结果指示所述待测信令流程异常。
  12. 根据权利要求11所述的方法,其特征在于,所述方法还包括:
    在所述第F条信令的消息类型包含于所述第二集合,且所述F小于所述N的情况下,执行第(F+1)轮异常检测处理;
    在所述第F条信令的消息类型包含于所述第二集合,且所述F等于所述N的情况下,得到所述第二异常检测结果;所述第二异常检测结果指示所述待测信令流程正常。
  13. 根据权利要求1至12任一项所述的方法,其特征在于,所述将所述第一特征序列输入至第一信令异常检测模型进行异常检测处理,输出第一异常检测结果之后,所述方法还包括:
    在所述第一异常检测结果指示所述待测信令流程异常的情况下,确定所述待测信令流程中发生异常的位置。
  14. 根据权利要求13所述的方法,其特征在于,所述第一特征序列中的第H个第一特征向量与所述N条信令中的第H条信令相对应;所述确定所述待测信令流程中发生异常的位置包括:
    在第H轮信令异常定位中,将第五特征序列输入至所述第一信令异常检测模型进行异常检测处理,得到第三集合;所述第三集合包括至少一个消息类型和信元的组合,所述第五特征序列中的特征向量依次为所述第一特征序列中的第(H-L)个第一特征向量至第(H-1)个第一特征向量;所述H为大于1的整数,所述L为大于1且小于所述H的整数;
    在所述N条信令中的第H条信令的消息类型和信元的组合未包含于所述第三集合的情况下,确定所述第H条信令发生异常。
  15. 根据权利要求13所述的方法,其特征在于,所述第一特征序列中的第H个第一特征向量与所述N条信令中的第H条信令相对应;所述确定所述待测信令流程中发生异常的位置包括:
    获得所述N条信令对应的异常概率序列,所述异常概率序列中的第G个概率表示所述N个信令中的前(D+G)个信令中包括异常信令的概率,所述G和所述D均为大于0的整数;
    根据所述异常概率序列,确定所述待测信令流程中发生异常的位置。
  16. 根据权利要求15所述的方法,其特征在于,所述获得所述N条信令对应的异常概率序列包括:
    将第六特征序列输入至所述第一信令异常检测模型进行异常检测处理,得到所述异常概率序列中的所述第G个概率;所述第六特征序列包括的特征向量依次为所述第一特征序列中的前(D+G)个第一特征向量。
  17. 根据权利要求15或16所述的方法,其特征在于,所述根据所述异常概率序列,确定所述待测信令流程中发生异常的位置包括:
    在所述异常概率序列中的所述第G个概率与第(G-1)个概率的差值大于概率阈值的情况下,确定第一信令区间的信令发生异常,所述第一信令区间包括所述N条信令中的第(G+D-1)至第N条信令,所述G为大于1的整数;
    在所述异常概率序列中的各概率均不小于其前面的概率的情况下,确定第二信令区间中的信令发生异常;所述第二信令区间包括所述N条信令中的第(P+D)至第N条信令,所述异常概率序列中第P个概率与第(P+1)个概率的差值不小于所述异常概率序列中任意两个相邻概率的差值,所述P为大于0的整数;
    在所述异常概率序列中的概率存在由第一值递增至第二值且保持所述第二值之前由第三值递减至所述第一值的情况下,确定第三信令区间中的信令发生异常;所述第一值小于第一阈值,所述第二值和所述第三值均大于第二阈值,所述第一阈值小于所述第二阈值,所述第三信令区间包括所述N条信令中的第(Q+D)至第N条信令,所述异常概率序列中的第Q个概率为所述异常概率曲线中最后一个上升段的起点的概率,所述Q为大于0的整数;
    在所述异常概率序列中的各概率均不小于所述概率阈值的情况下,确定第四信令区间中的信令发生异常,所述第四信令区间包括所述N个信令中的第D个信令至第N个信令。
  18. 根据权利要求1至17任一项所述的方法,其特征在于,所述获取待测信令流程之前,所述方法还包括:
    采集信令数据;所述信令数据包括所述N条信令;
    解析所述信令数据中的每条信令,得到每条信令对应的接口、时间戳、协议以及流程标识;
    将所述信令数据中对应的接口、协议以及流程标识均相同的信令分到相同组,得到至少一组信令;
    按照目标组信令中各信令包括的时间戳的先后顺序对所述目标组信令中的各信令进行先后排序,得到所述待测信令流程,所述目标组信令为所述至少一组信令中任一组信令。
  19. 一种训练方法,其特征在于,包括:
    分别对训练信令流程中每一条信令包括的消息类型和信元进行特征构造,得到第一向量序列;所述第一向量序列中的特征向量与所述训练信令流程中的信令一一对应;
    将所述第一向量序列中的第(R)个特征向量至第(R+W)个特征向量作为第一训练特征序列输入至第一训练模型进行无监督学习,得到第一信令异常检测模型;所述第一训练模型为W元语言模型,所述W为大于1的整数,所述R和所述S均为大于0的整数。
  20. 一种训练方法,其特征在于,包括:
    分别对训练信令流程中每一条信令包括的消息类型进行特征构造,得到第二向量序列;所述第二向量序列中的特征向量与所述训练信令流程中的信令一一对应;
    将所述第二向量序列中的第(R)个特征向量至第(R+W)个特征向量作为第二训练特征序列输入至第二训练模型进行无监督学习,得到第二信令异常检测模型;所述第二训练模型为W元语言模型,所述W为大于1的整数,所述R和所述S均为大于0的整数。
  21. 一种训练方法,其特征在于,包括:
    分别对训练信令流程中每一条信令包括的消息类型和信元进行特征构造,得到第一训练样本;所述第一训练样本中的特征向量与所述训练信令流程中的信令一一对应;
    将第一训练样本和第一标注信息输入至第三训练模型进行异常检测处理,得到第一异常检测处理结果;所述第一异常检测处理结果指示所述第一训练样本为正常信令流程或者异常信令流程;
    根据所述第一异常检测处理结果和第一标准结果,确定所述第一训练样本对应的损失;所述第一标准结果为所述第一标注信息指示的所述第一训练样本的真实结果;
    利用所述第一训练样本对应的损失,通过优化算法更新所述第三训练模型的参数,得到第一信令异常检测模型。
  22. 一种训练方法,其特征在于,包括:
    分别对训练信令流程中每一条信令包括的消息类型进行特征构造,得到第二训练样本;所述第二训练样本中的特征向量与所述训练信令流程中的信令一一对应;
    将第二训练样本和第二标注信息输入至第四训练模型进行异常检测处理,得到第二异常检测处理结果;所述第二异常检测处理结果指示所述第二训练样本为正常信令流程或者异常信令流程;
    根据所述第二异常检测处理结果和第二标准结果,确定所述第一训练样本对应的损失;所述第二标准结果为所述第二标注信息指示的所述第二训练样本的真实结果;
    利用所述第二训练样本对应的损失,通过优化算法更新所述第四训练模型的参数,得到第二信令异常检测模型。
  23. 一种信令分析装置,其特征在于,包括处理器和存储器,所述存储器用于存储程序指令,所述处理器用于调用所述程序指令来执行如下操作:获取待测信令流程,所述待测信令流程包括N条信令,所述N为大于1的整数;
    分别对所述N条信令中每一条信令包括的消息类型和信元进行第一特征构造,得到第一特征序列;所述第一特征序列包括N个第一特征向量,所述N个第一特征向量与所述N条信令一一对应;
    将所述第一特征序列输入至第一信令异常检测模型进行异常检测处理,输出第一异常检测结果,所述第一异常检测结果指示所述待测信令流程正常或者异常。
  24. 根据权利要求23所述的装置,其特征在于,所述处理器,还用于分别对所述N条信令中每一条信令包括的消息类型进行第二特征构造,得到第二特征序列;所述第二特征序列包括N个第二特征向量,所述N个第二特征向量与所述N条信令一一对应;
    将所述第二特征序列输入至第二信令异常检测模型进行异常检测处理,得到第二异常检测结果;所述第二异常检测结果指示所述待测信令流程正常或者异常;
    所述处理器,具体用于在所述第二异常检测结果指示所述待测信令流程正常的情况下,分别对所述N条信令中每一条信令进行所述第一特征构造,得到所述第一特征序列。
  25. 根据权利要求23或24所述的装置,其特征在于,
    所述处理器,具体用于按照所述N条信令的时间戳先后顺序依次对所述N条信令进行所述第一特征构造,得到所述第一特征序列。
  26. 根据权利要求25所述的装置,其特征在于,
    所述处理器,具体用于将第一消息类型和目标信元的组合作为一个整体进行特征构造,得到第一向量;所述第一消息类型为第一信令包括的消息类型,所述目标信元为所述第一信令包括的信元,所述第一信令为所述N条信令中的任一条信令,所述第一向量包含于所述第一特征序列。
  27. 根据权利要求26所述的装置,其特征在于,所述目标信元包括指示所述第一信令的发送原因的信元。
  28. 根据权利要求25所述的装置,其特征在于,
    所述处理器,具体用于将第二信令中的M个信元作为自然语言处理NLP算法中包括一个或多个单词的文本进行特征构造,得到第二向量;所述第二信令为所述N条信令中的任一条信令,所述第二向量包含于所述第一特征序列,所述M为大于1的整数。
  29. 根据权利要求24至28任一项所述的装置,其特征在于,
    所述处理器,具体用于按照所述N条信令的时间戳先后顺序依次对所述N条信令进行所述第二特征构造,得到所述第二特征序列。
  30. 根据权利要求29所述的装置,其特征在于,
    所述处理器,具体用于将第二消息类型作为自然语言处理NLP算法中的一个单词进行特征构造,得到第三向量;所述第二消息类型为第三信令包括的消息类型,所述第三信令为所述N条信令中的任一条信令,所述第三向量包含于所述第二特征序列。
  31. 根据权利要求23至30任一项所述的装置,其特征在于,
    所述处理器,具体用于在第F轮异常检测处理中,将第三特征序列输入至所述第一信令异常检测模型进行异常检测处理,得到第一集合;所述第一集合包括至少一个消息类型和信元的组合,所述第三特征序列中的特征向量依次为所述第一特征序列中的第(F-K)个第一特征向量至第(F-1)个第一特征向量;所述F为大于1的整数,所述K为大于1且小于所述F的整数;
    在所述N条信令中的第F条信令的消息类型和信元的组合未包含于所述第一集合的情况下,输出所述第一异常检测结果;所述第一异常检测结果指示所述待测信令流程异常。
  32. 根据权利要求31所述的装置,其特征在于,
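    the processor is further configured to: when the combination of message type and information element of the F-th signaling message is included in the first set and F is less than N, perform the (F+1)-th round of anomaly detection processing;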
    在所述第F条信令的消息类型和信元的组合包含于所述第一集合,且所述F等于所述N的情况下,输出所述第一异常检测结果;所述第一异常检测结果指示所述待测信令流程正常。
  33. 根据权利要求23至30任一项所述的装置,其特征在于,所述第二特征序列中的第F个第二特征向量与所述N条信令中的第F条信令相对应;
    所述处理器,具体用于在第F轮异常检测处理中,将第四特征序列输入至所述第二信令异常检测模型进行异常检测处理,得到第二集合;所述第二集合包括至少一个消息类型,所述第四特征序列中的特征向量依次为所述第二特征序列中的第(F-K)个第二特征向量至第(F-1)个第二特征向量;所述F为大于1的整数,所述K为大于1且小于所述F的整数;
    在所述N条信令中的第F条信令的消息类型未包含于所述第二集合的情况下,得到所述第二异常检测结果;所述第二异常检测结果指示所述待测信令流程异常。
  34. 根据权利要求33所述的装置,其特征在于,
    所述处理器,还用于在所述第F条信令的消息类型包含于所述第二集合,且所述F小于所述N的情况下,执行第(F+1)轮异常检测处理;
    在所述第F条信令的消息类型包含于所述第二集合,且所述F等于所述N的情况下,得到所述第二异常检测结果;所述第二异常检测结果指示所述待测信令流程正常。
  35. 根据权利要求23至34任一项所述的装置,其特征在于,
    所述处理器,还用于在所述第一异常检测结果指示所述待测信令流程异常的情况下,确定所述待测信令流程中发生异常的位置。
  36. 根据权利要求35所述的装置,其特征在于,所述第一特征序列中的第H个第一特征向量与所述N条信令中的第H条信令相对应;
    所述处理器,具体用于在第H轮信令异常定位中,将第五特征序列输入至所述第一信令异常检测模型进行异常检测处理,得到第三集合;所述第三集合包括至少一个消息类型和信元的组合,所述第五特征序列中的特征向量依次为所述第一特征序列中的第(H-L)个第一特征向量至第(H-1)个第一特征向量;所述H为大于1的整数,所述L为大于1且小于所述H的整数;
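    when the combination of message type and information element of the H-th signaling message among the N signaling messages is not included in the third set, determine that the H-th signaling message is abnormal.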
  37. 根据权利要求35所述的装置,其特征在于,所述第一特征序列中的第H个第一特征向量与所述N条信令中的第H条信令相对应;
    所述处理器,具体用于获得所述N条信令对应的异常概率序列,所述异常概率序列中的第G个概率表示所述N个信令中的前(D+G)个信令中包括异常信令的概率,所述G和所述D均为大于0的整数;
    根据所述异常概率序列,确定所述待测信令流程中发生异常的位置。
  38. 根据权利要求37所述的装置,其特征在于,
    所述处理器,具体用于将第六特征序列输入至所述第一信令异常检测模型进行异常检测处理,得到所述异常概率序列中的所述第G个概率;所述第六特征序列包括的特征向量依次为所述第一特征序列中的前(D+G)个第一特征向量。
  39. 根据权利要求37或38所述的装置,其特征在于,
    所述处理器,具体用于在所述异常概率序列中的所述第G个概率与第(G-1)个概率的差值大于概率阈值的情况下,确定第一信令区间的信令发生异常,所述第一信令区间包括所述N条信令中的第(G+D-1)至第N条信令,所述G为大于1的整数;
    在所述异常概率序列中的各概率均不小于其前面的概率的情况下,确定第二信令区间中的信令发生异常;所述第二信令区间包括所述N条信令中的第(P+D)至第N条信令,所述异常概率序列中第P个概率与第(P+1)个概率的差值不小于所述异常概率序列中任意两个相邻概率的差值,所述P为大于0的整数;
    在所述异常概率序列中的概率存在由第一值递增至第二值且保持所述第二值之前由第三值递减至所述第一值的情况下,确定第三信令区间中的信令发生异常;所述第一值小于第一阈值,所述第二值和所述第三值均大于第二阈值,所述第一阈值小于所述第二阈值,所述第三信令区间包括所述N条信令中的第(Q+D)至第N条信令,所述异常概率序列中的第Q个概率为所述异常概率曲线中最后一个上升段的起点的概率,所述Q为大于0的整数;
    在所述异常概率序列中的各概率均不小于所述概率阈值的情况下,确定第四信令区间中的信令发生异常,所述第四信令区间包括所述N个信令中的第D个信令至第N个信令。
  40. 根据权利要求23至39任一项所述的装置,其特征在于,
    所述处理器,还用于采集信令数据;所述信令数据包括所述N条信令;
    解析所述信令数据中的每条信令,得到每条信令对应的接口、时间戳、协议以及流程标识;
    将所述信令数据中对应的接口、协议以及流程标识均相同的信令分到相同组,得到至少一组信令;
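    sort the signaling messages in a target group by the order of the timestamps they include to obtain the signaling flow under test, the target group being any one of the at least one group of signaling messages.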
  41. 一种训练装置,其特征在于,包括处理器和存储器,所述存储器用于存储程序指令,所述处理器用于调用所述程序指令来执行如下操作:分别对训练信令流程中每一条信令包括的消息类型和信元进行特征构造,得到第一向量序列;所述第一向量序列中的特征向量与所述训练信令流程中的信令一一对应;
    将所述第一向量序列中的第(R)个特征向量至第(R+W)个特征向量作为第一训练特征序列输入至第一训练模型进行无监督学习,得到第一信令异常检测模型;所述第一训练模型为W元语言模型,所述W为大于1的整数,所述R和所述S均为大于0的整数。
  42. 一种训练装置,其特征在于,包括处理器和存储器,所述存储器用于存储程序指令,所述处理器用于调用所述程序指令来执行如下操作:分别对训练信令流程中每一条信令包括的消息类型进行特征构造,得到第二向量序列;所述第二向量序列中的特征向量与所述训练信令流程中的信令一一对应;
    将所述第二向量序列中的第(R)个特征向量至第(R+W)个特征向量作为第二训练特征序列输入至第二训练模型进行无监督学习,得到第二信令异常检测模型;所述第二训练模型为W元语言模型,所述W为大于1的整数,所述R和所述S均为大于0的整数。
  43. 一种训练装置,其特征在于,包括处理器和存储器,所述存储器用于存储程序指令,所述处理器用于调用所述程序指令来执行如下操作:分别对训练信令流程中每一条信令包括的消息类型和信元进行特征构造,得到第一训练样本;所述第一训练样本中的特征向量与所述训练信令流程中的信令一一对应;
    将第一训练样本和第一标注信息输入至第三训练模型进行异常检测处理,得到第一异常检测处理结果;所述第一异常检测处理结果指示所述第一训练样本为正常信令流程或者异常信令流程;
    根据所述第一异常检测处理结果和第一标准结果,确定所述第一训练样本对应的损失;所述第一标准结果为所述第一标注信息指示的所述第一训练样本的真实结果;
    利用所述第一训练样本对应的损失,通过优化算法更新所述第三训练模型的参数,得到第一信令异常检测模型。
  44. 一种训练装置,其特征在于,包括处理器和存储器,所述存储器用于存储程序指令,所述处理器用于调用所述程序指令来执行如下操作:分别对训练信令流程中每一条信令包括的消息类型进行特征构造,得到第二训练样本;所述第二训练样本中的特征向量与所述训练信令流程中的信令一一对应;
    将第二训练样本和第二标注信息输入至第四训练模型进行异常检测处理,得到第二异常检测处理结果;所述第二异常检测处理结果指示所述第二训练样本为正常信令流程或者异常信令流程;
    根据所述第二异常检测处理结果和第二标准结果,确定所述第一训练样本对应的损失; 所述第二标准结果为所述第二标注信息指示的所述第二训练样本的真实结果;
    利用所述第二训练样本对应的损失,通过优化算法更新所述第四训练模型的参数,得到第二信令异常检测模型。
  45. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被移动设备的处理器执行时,使所述处理器执行权利要求1至22任意一项所述的方法。
PCT/CN2020/102680 2019-11-25 2020-07-17 信令分析方法和相关装置 WO2021103589A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/752,848 US20220286263A1 (en) 2019-11-25 2022-05-24 Signaling analysis method and related apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911168167.8A CN112838943B (zh) 2019-11-25 2019-11-25 信令分析方法和相关装置
CN201911168167.8 2019-11-25

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/752,848 Continuation US20220286263A1 (en) 2019-11-25 2022-05-24 Signaling analysis method and related apparatus

Publications (1)

Publication Number Publication Date
WO2021103589A1 true WO2021103589A1 (zh) 2021-06-03

Family

ID=75922408

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/102680 WO2021103589A1 (zh) 2019-11-25 2020-07-17 信令分析方法和相关装置

Country Status (3)

Country Link
US (1) US20220286263A1 (zh)
CN (1) CN112838943B (zh)
WO (1) WO2021103589A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114363947A (zh) * 2021-12-31 2022-04-15 紫光展锐(重庆)科技有限公司 日志分析方法及相关装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014180400A1 (zh) * 2013-11-25 2014-11-13 中兴通讯股份有限公司 问题定位处理方法及装置
CN106685674A (zh) * 2015-11-05 2017-05-17 华为技术有限公司 网络事件预测以及建立网络事件预测模型的方法和装置
CN109995566A (zh) * 2017-12-31 2019-07-09 中国移动通信集团辽宁有限公司 网络故障定位方法、装置、设备及介质
CN109993185A (zh) * 2017-12-31 2019-07-09 中国移动通信集团贵州有限公司 无线信令分析方法、装置、计算设备及存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180006900A1 (en) * 2016-06-29 2018-01-04 Microsoft Technology Licensing, Llc Predictive anomaly detection in communication systems
CN109902832B (zh) * 2018-11-28 2023-11-17 华为技术有限公司 机器学习模型的训练方法、异常预测方法及相关装置
CN110276409A (zh) * 2019-06-27 2019-09-24 腾讯科技(深圳)有限公司 一种时间序列异常检测方法、装置、服务器和存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114363947A (zh) * 2021-12-31 2022-04-15 紫光展锐(重庆)科技有限公司 日志分析方法及相关装置
CN114363947B (zh) * 2021-12-31 2023-09-22 紫光展锐(重庆)科技有限公司 日志分析方法及相关装置

Also Published As

Publication number Publication date
CN112838943B (zh) 2022-06-10
US20220286263A1 (en) 2022-09-08
CN112838943A (zh) 2021-05-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20894036

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20894036

Country of ref document: EP

Kind code of ref document: A1