CN112565133B

CN112565133B - Complex format analysis method based on high-dimensional information feature extraction

Info

Publication number: CN112565133B
Application number: CN202110213661.2A
Authority: CN
Inventors: 刘博�; 忻向军; 任建新; 毛雅亚; 朱旭; 王瑞春; 沈磊; 吴泳锋; 孙婷婷; 赵立龙
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2021-02-26
Filing date: 2021-02-26
Publication date: 2021-05-28
Anticipated expiration: 2041-02-26
Also published as: CN112565133A

Abstract

The invention discloses a complex format analysis method based on high-dimensional information feature extraction, belonging to the technical field of optical communication. Optical signals transmitted in a polarization-multiplexed coherent optical system are mapped into Stokes space to obtain high-dimensional signal classification features. The high-dimensional features are then reduced in dimension and classified using linear discriminant analysis, a machine-learning method, and the classification prediction is obtained by maximum likelihood estimation. The method identifies the signal modulation format in a polarization-multiplexed coherent optical communication system with high identification efficiency, high processing speed and high accuracy. In the proposed modulation format identification scheme, the influence of impairments such as chromatic dispersion and polarization-dependent loss on the transmitted signal is reduced as far as possible at the feature extraction stage, which lowers the algorithm complexity and the uncertainty in signal quality. Linear discriminant analysis also allows the modulation formats of multiple signals to be identified simultaneously.

Description

Complex format analysis method based on high-dimensional information feature extraction

Technical Field

The invention belongs to the technical field of optical communication, and particularly relates to a complex format analysis method based on high-dimensional information feature extraction.

Background

With the continuous improvement of the transmission capacity and efficiency of optical fiber communication systems, optical fiber communication networks are becoming increasingly complex and dynamic. To meet the growing demand for information exchange, more and more signal modulation formats have come into use, and with the wide application of polarization multiplexing, the transmission capacity, distance and rate of coherent optical communication systems have all improved greatly. Polarization multiplexing exploits the polarization of optical signals: two independent, orthogonal polarization states are used as two channels that carry optical signals simultaneously, multiplying the transmission capacity and rate of the system.

The future optical fiber communication system is a mixed form of a plurality of transmission rates and a plurality of modulation formats, and parameters such as signal rate, modulation format, wavelength and the like can be adaptively adjusted according to the dynamic channel condition, system resources and user service requirements, so that the maximum utilization of resources is realized. Therefore, monitoring and identification of system parameters is one of the core technologies of future intelligent optical networks.

However, without prior knowledge of the signal, it is difficult for the receiver to demodulate it. Modulation format identification determines the signal's modulation format in advance so that the original transmitted signal can be demodulated accurately. For a coherent receiver, identifying the modulation format in advance allows more appropriate algorithms to be selected subsequently, providing reference and assistance for operations such as polarization demultiplexing, optical signal-to-noise ratio monitoring and polarization-dependent loss compensation.

Current modulation format identification techniques generally use the constellation diagram of the transmitted signal as the classification feature and generally work only at high signal-to-noise ratios. Because the signal suffers various impairments in the optical fiber, the received signal loses key feature information and its constellation diagram is distorted, so that the modulation format is ultimately misidentified.

Moreover, although the constellation diagrams of simple modulation formats (such as BPSK, QPSK and 8PSK) are clearly distinguishable, for high-order quadrature modulation formats (such as 16QAM) the constellation points are heavily perturbed by impairments, the signal classes are not clearly separated, and the features are hard to distinguish. The system therefore misjudges the modulation format, and the identification rate falls.

In a polarization-multiplexed coherent optical communication system, the two independent, mutually orthogonal polarized optical signals are susceptible to effects such as birefringence during fiber transmission. Their polarization states rotate randomly and crosstalk arises between them, so that the signals at the coherent receiver are aliased to varying degrees, and the signal quality is strongly affected by phase shift, polarization-dependent loss and similar effects.

Disclosure of Invention

Purpose of the invention: the invention aims to provide a complex format analysis method based on high-dimensional information feature extraction. It analyzes the complex format of a signal in a polarization multiplexing system using Stokes mapping and linear discriminant analysis, tolerates chromatic dispersion and polarization-dependent loss to a certain extent by working in the Stokes domain, and identifies the modulation format of the transmitted signal with high speed, efficiency and precision, so that a better-suited algorithm can then be used for equalization, demodulation and other processing of the transmitted signal.

The technical scheme is as follows: in order to achieve the purpose, the invention adopts the following technical scheme:

A complex format analysis method based on high-dimensional information feature extraction comprises the following steps:

S1: training feature learning, namely obtaining the probability density functions of the various modulation formats through training and using them as the basis for subsequent discrimination;

S2: signal feature extraction, namely preprocessing the signal (filtering, polarization imbalance compensation and the like) and then extracting the corresponding high-dimensional parameters as classification features, providing a basis for subsequent discrimination;

S3: format analysis, namely using maximum likelihood estimation to select the result with the highest probability from the set of all modulation types, thereby determining the modulation format of the unknown signal.

Further, step S1 specifically includes the following steps:

1.1) first, the polarization signal information of the various modulation format types is taken as sample data;

1.2) using the Stokes formula, the polarization-multiplexed signal, which carries the information reflecting its modulation characteristics, is mapped into a four-dimensional Stokes vector; the mapping involves only the amplitudes and the phase difference of the two signals, so a certain amount of phase and frequency offset and polarization-dependent loss is avoided;

1.3) the mapped four-dimensional Stokes vector is taken as the classification feature of the signal, and the four-dimensional signal features are projected onto an optimal classification plane by linear discriminant analysis; after projection, the mean and variance of the projected data of each modulation format are calculated by maximum likelihood estimation, yielding the probability density function corresponding to each modulation format.

Further, in step 1.3), solving for the modulation format of the unknown signal using the probability density functions specifically includes the following steps:

1.31) when a signal of unknown modulation format enters the modulation format identification module, it is preprocessed by filtering and dispersion compensation;

1.32) a dimension-raising mapping is performed in Stokes space, and the resulting four-dimensional signal features are projected into the discriminant vector space;

1.33) the projected signal features are then substituted into the probability density function of each modulation format class, and the probability that the unknown signal belongs to each class is calculated; the class with the maximum probability is the prediction result.

further, the step S2 specifically includes the following steps:

In a polarization-multiplexed coherent optical communication system, the coherent receiver at the receiving end first uses a polarization beam splitter to separate the polarized optical signal transmitted in the fiber into two paths, and mixes them with the two paths of polarized light generated by the local oscillator in two 90-degree optical hybrids, obtaining four signals I_x, Q_x, I_y and Q_y. After analog-to-digital conversion, the imaginary unit i is used to combine these into the X-path polarization signal e_x = I_x + iQ_x and the Y-path polarization signal e_y = I_y + iQ_y.

The complex signals e_x and e_y, representing the polarization states of the X path and the Y path, are mapped by the following rule into a four-dimensional Stokes vector:

$$
S = \begin{pmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{pmatrix}
= \begin{pmatrix} e_x e_x^* + e_y e_y^* \\ e_x e_x^* - e_y e_y^* \\ e_x^* e_y + e_x e_y^* \\ -i\,e_x^* e_y + i\,e_x e_y^* \end{pmatrix}
= \begin{pmatrix} a_x^2 + a_y^2 \\ a_x^2 - a_y^2 \\ 2 a_x a_y \cos\varphi \\ 2 a_x a_y \sin\varphi \end{pmatrix}
$$

where e_x^* and e_y^* denote the complex conjugates of e_x and e_y, S_0 represents the total power of the two signals, S_1 represents their power difference, S_2 and S_3 represent the two phase-difference components of the two signals, a_x and a_y are the amplitudes of the two polarization signals, and φ is the phase difference between them.
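The Stokes mapping described here can be sketched in a few lines of NumPy. This is a minimal illustration, not the patented implementation: the function name `stokes_map` is hypothetical, and the sketch assumes the standard Stokes definitions with the phase difference taken as the phase of e_y minus the phase of e_x.

```python
import numpy as np

def stokes_map(e_x, e_y):
    """Map complex X/Y polarization samples to 4-D Stokes vectors (rows S0..S3).

    Assumption: standard Stokes definitions; phi = phase(e_y) - phase(e_x).
    """
    e_x = np.asarray(e_x, dtype=complex)
    e_y = np.asarray(e_y, dtype=complex)
    cross = np.conj(e_x) * e_y                  # a_x * a_y * exp(i * phi)
    s0 = np.abs(e_x) ** 2 + np.abs(e_y) ** 2    # total power
    s1 = np.abs(e_x) ** 2 - np.abs(e_y) ** 2    # power difference
    s2 = 2.0 * cross.real                       # 2 a_x a_y cos(phi)
    s3 = 2.0 * cross.imag                       # 2 a_x a_y sin(phi)
    return np.stack([s0, s1, s2, s3])

# Example: equal-power polarizations with a 90-degree phase difference.
S = stokes_map([1.0 + 0.0j], [0.0 + 1.0j])
print(S.ravel())
```

Only the amplitudes and the relative phase of the two polarizations enter the result, which is the property the text relies on.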

Furthermore, the mapping rule operates on the relative power and phase difference of the two orthogonal polarization signals. The mapping preserves the amplitude and relative phase of the signal while phase noise and frequency offset cancel out, so the mapped high-dimensional Stokes vector serves well as the classification feature of the signal to be processed and supplies the signal features for the subsequent linear discriminant analysis.

Further, the step S3 specifically includes the following steps:

Linear discriminant analysis is used to reduce the dimension of, and classify, the high-dimensional classification features of the transmitted signal;

the basic idea of linear discriminant analysis is to project high-dimensional pattern samples onto the optimal discriminant vector space, thereby extracting the classification information of the signal features and compressing the dimension of the feature space; after projection, the pattern samples have the maximum inter-class distance and the minimum intra-class distance in the new subspace;

linear discriminant analysis projects labelled high-dimensional vector data into a classification space of lower dimension, in which the projected data can be distinguished by class. According to the property of the generalized Rayleigh quotient:

$$\lambda_{\min} \le R(A, B, Z) \le \lambda_{\max}$$

where A and B are n × n Hermitian matrices, Z is an n-dimensional vector, and

$$R(A, B, Z) = \frac{Z^H A Z}{Z^H B Z},$$

λ_min and λ_max are respectively the minimum and maximum eigenvalues of the matrix B^{-1}A. This property of the generalized Rayleigh quotient is used to reduce the dimension of, and classify, the high-dimensional data.
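The bound on the generalized Rayleigh quotient can be checked numerically. The sketch below is only an illustration under stated assumptions: A is taken as real symmetric and B as real symmetric positive definite (positive definiteness is an assumption added here so that the quotient is well defined), and the matrices are random test data, not signal statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Hypothetical test matrices: A symmetric, B symmetric positive definite.
M = rng.standard_normal((n, n))
A = M + M.T
P = rng.standard_normal((n, n))
B = P @ P.T + n * np.eye(n)

def rayleigh(A, B, z):
    """Generalized Rayleigh quotient z^H A z / z^H B z."""
    return (z.conj() @ A @ z).real / (z.conj() @ B @ z).real

# Eigenvalues of B^{-1} A (real for symmetric A and positive definite B).
eigvals = np.linalg.eigvals(np.linalg.solve(B, A)).real
lam_min, lam_max = eigvals.min(), eigvals.max()

# Evaluate the quotient for many random vectors and compare with the bounds.
quotients = np.array([rayleigh(A, B, rng.standard_normal(n)) for _ in range(1000)])
print(lam_min, quotients.min(), quotients.max(), lam_max)
```

Every sampled quotient falls between the smallest and largest eigenvalue of B^{-1}A, and the extremes are attained at the corresponding eigenvectors.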

Further, the dimensionality reduction and classification of the high-dimensional data specifically includes the following steps:

Consider a multi-dimensional dataset D = {(z_1, v_1), (z_2, v_2), ..., (z_m, v_m)}, where the first item of each pair is an n-dimensional vector and the second is its class label.

Define N_j (j = 1, 2, ..., k) as the number of samples of the j-th class and Z_j (j = 1, 2, ..., k) as the set of samples of the j-th class. Then

$$\mu_j = \frac{1}{N_j} \sum_{z \in Z_j} z$$

is the mean vector of the j-th class samples, and

$$\Sigma_j = \sum_{z \in Z_j} (z - \mu_j)(z - \mu_j)^T$$

is the covariance matrix of the j-th class samples (written here without the 1/N_j normalization).

Assume the projected low-dimensional space has dimension d with basis vectors (W_1, W_2, ..., W_d), which form an n × d matrix W. The projection of any high-dimensional sample z_i onto the reduced-dimension space is then W^T z_i. To make the distance between data of different classes as large as possible, we maximize

$$\sum_{j=1}^{k} N_j\, W^T (\mu_j - \mu)(\mu_j - \mu)^T W,$$

where μ_j is the mean vector of the j-th class and μ is the mean vector of all samples. At the same time, the projected points of data from the same class should be as close together as possible; that is, the covariance of the projected points of same-class samples should be as small as possible, so we minimize

$$\sum_{j=1}^{k} W^T \Sigma_j W,$$

with Σ_j the covariance matrix of the j-th class. The quantities to be maximized and minimized are collected into two new matrices:

$$S_b = \sum_{j=1}^{k} N_j (\mu_j - \mu)(\mu_j - \mu)^T,$$

$$S_W = \sum_{j=1}^{k} \Sigma_j = \sum_{j=1}^{k} \sum_{z \in Z_j} (z - \mu_j)(z - \mu_j)^T,$$

where S_b is the inter-class divergence (scatter) matrix and S_W is the intra-class divergence (scatter) matrix. The optimization objective is set as

$$J(W) = \frac{\prod_{\mathrm{diag}} W^T S_b W}{\prod_{\mathrm{diag}} W^T S_W W},$$

where ∏_diag A denotes the product of the main diagonal elements of A.

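The optimization of J(W) is then converted into

$$J(W) = \frac{\prod_{\mathrm{diag}} W^T S_b W}{\prod_{\mathrm{diag}} W^T S_W W} = \frac{\prod_{i=1}^{d} W_i^T S_b W_i}{\prod_{i=1}^{d} W_i^T S_W W_i} = \prod_{i=1}^{d} \frac{W_i^T S_b W_i}{W_i^T S_W W_i},$$

where d is the dimension of the low-dimensional space, W^T is the transpose of the matrix W, W_i is the i-th basis vector, and W_i^T is the transpose of the i-th basis vector.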

Setting A to the inter-class divergence matrix S_b and B to the intra-class divergence matrix S_W converts the optimization objective into a form analogous to the generalized Rayleigh quotient. The maximum of this quotient is found from the eigenvalues and eigenvectors of the intra-class and inter-class divergence matrices; the variance and mean of each class of sample data after projection are then obtained, and from them the respective probability density functions are calculated;
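The steps above can be sketched as a small NumPy routine that builds the inter-class and intra-class divergence matrices and keeps the leading eigenvectors. This is an illustrative sketch only: `lda_fit` is a hypothetical helper, the pseudo-inverse is an assumption introduced to cope with a possibly singular S_W, and the two Gaussian blobs are synthetic stand-ins for Stokes-vector features.

```python
import numpy as np

def lda_fit(Z, v, d):
    """Return an n x d projection matrix W from samples Z (m x n) and labels v (m,)."""
    Z = np.asarray(Z, dtype=float)
    v = np.asarray(v)
    n = Z.shape[1]
    mu = Z.mean(axis=0)                      # mean vector of all samples
    S_b = np.zeros((n, n))                   # inter-class divergence matrix
    S_w = np.zeros((n, n))                   # intra-class divergence matrix
    for label in np.unique(v):
        Z_j = Z[v == label]
        mu_j = Z_j.mean(axis=0)
        diff = (mu_j - mu)[:, None]
        S_b += len(Z_j) * (diff @ diff.T)
        centered = Z_j - mu_j
        S_w += centered.T @ centered
    # Assumption: pinv instead of inv, since S_w may be singular in practice.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1][:d]   # d largest eigenvalues
    return eigvecs[:, order].real

# Synthetic example: two 4-D classes separated along the third coordinate.
rng = np.random.default_rng(0)
Z0 = rng.standard_normal((200, 4))
Z1 = rng.standard_normal((200, 4)) + np.array([0.0, 0.0, 10.0, 0.0])
Z = np.vstack([Z0, Z1])
v = np.array([0] * 200 + [1] * 200)
W = lda_fit(Z, v, d=1)
Y = Z @ W                                    # projected features, shape (400, 1)
print(Y[v == 0].mean(), Y[v == 1].mean())
```

With k classes the inter-class matrix has rank at most k - 1, so at most k - 1 useful projection directions exist; the example therefore uses d = 1 for two classes.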

the probability density functions obtained by linear discriminant analysis serve as the basis for complex format analysis: the input derived from the unknown signal is substituted into the probability density function of each modulation type, the maximum among all the results is taken as the output, and the modulation format corresponding to that probability density function is the modulation format of the unknown signal.
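The decision rule just described can be illustrated as follows. The sketch assumes a one-dimensional projected feature and a Gaussian density per class, neither of which the text specifies; the helper names and the numeric feature values are hypothetical. Log densities are compared instead of densities, which selects the same class while avoiding numerical underflow.

```python
import numpy as np

def fit_class_pdfs(y, v):
    """ML estimates (mean, variance) of 1-D projected features for each class."""
    return {str(label): (y[v == label].mean(), y[v == label].var())
            for label in np.unique(v)}

def log_pdf(y, mean, var):
    """Log of a Gaussian density (Gaussian model is an assumption of this sketch)."""
    return -0.5 * np.log(2.0 * np.pi * var) - (y - mean) ** 2 / (2.0 * var)

def classify(y_new, pdfs):
    """Return the class whose density is largest at y_new."""
    return max(pdfs, key=lambda label: log_pdf(y_new, *pdfs[label]))

# Hypothetical projected training features for two modulation formats.
y = np.array([-0.2, 0.1, 0.0, 0.3, 4.8, 5.1, 5.0, 5.3])
v = np.array(["QPSK"] * 4 + ["16QAM"] * 4)
pdfs = fit_class_pdfs(y, v)

print(classify(0.05, pdfs), classify(4.9, pdfs))
```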

Further, in the system model for modulation format identification, the transmitting end of the polarization-multiplexed coherent optical transmission system dynamically transmits polarization signals of various modulation formats according to user requirements and channel conditions, so that the resources of the communication system are allocated sensibly; at the transmitting end, the continuous light wave is divided into two paths by a polarization beam splitter, and an IQ modulator loads information onto the I and Q branches of each of the two polarized light paths; after modulation, the two mutually independent, orthogonal polarization signals are transmitted through a few-mode optical fiber, received by the coherent receiver at the receiving end and converted into four signals, which are digitized by an analog-to-digital converter and passed to a DSP (digital signal processor) for subsequent computation; before the modulation format is identified, a series of simple modulation-format-independent processing steps is carried out, such as filtering, resampling, IQ imbalance compensation, dispersion compensation and polarization demultiplexing; the signal then enters the complex format analysis stage, where its modulation format is identified; the signal is first preprocessed by power normalization and then raised in dimension through the Stokes formula to obtain a high-dimensional Stokes vector closely related to the signal modulation format; the probability of each class is calculated using the maximum likelihood estimation functions obtained by linear discriminant analysis, and the class with the maximum value is output as the modulation format identification result; once the modulation format is known, algorithms suited to that format are selected for the subsequent signal processing, such as adaptive equalization, frequency offset recovery, carrier phase recovery and demodulation.

Beneficial effects: compared with the prior art, the complex format analysis method based on high-dimensional information feature extraction maps the optical signals transmitted in a polarization-multiplexed coherent optical system into Stokes space to obtain high-dimensional signal classification features. The high-dimensional features are then reduced in dimension and classified by linear discriminant analysis, a machine-learning method, and the classification prediction is obtained by maximum likelihood estimation. The method identifies the signal modulation format in a polarization-multiplexed coherent optical communication system with high identification efficiency, high processing speed and high accuracy. In the proposed scheme, Stokes-space analysis eliminates impairments such as phase noise and frequency offset, and the influence of chromatic dispersion, polarization-dependent loss and similar factors on the transmitted signal is reduced as far as possible at the feature extraction stage, lowering the algorithm complexity and the uncertainty in signal quality. Linear discriminant analysis allows the modulation formats of multiple signals to be identified simultaneously. When linear discriminant analysis is used to reduce the dimension of the high-dimensional signal features, the high-dimensional Stokes-space information of the signal serves as the classification feature of the modulation format, and prior knowledge of the different modulation format classes can be exploited, enabling high-precision identification. Compared with traditional modulation format identification and classification approaches, the method classifies well and further improves identification accuracy.

Drawings

FIG. 1 is a flow chart of a complex format parsing method based on high-dimensional information feature extraction;

FIG. 2 is a schematic diagram of a coherent receiver;

FIG. 3 is a clustering distribution diagram and a constellation diagram of a BPSK modulation format in Stokes space;

FIG. 4 is a clustering distribution diagram and a constellation diagram of a QPSK modulation format in Stokes space;

FIG. 5 is a clustering distribution diagram and a constellation diagram of an 8PSK modulation format in Stokes space;

FIG. 6 is a clustering distribution diagram and a constellation diagram of an 8QAM modulation format in Stokes space;

FIG. 7 is a clustering distribution diagram and a constellation diagram of a 12QAM modulation format in Stokes space;

FIG. 8 is a clustering distribution diagram and a constellation diagram of a 16QAM modulation format in Stokes space;

FIG. 9 is a flow chart of linear discriminant analysis computation;

FIG. 10 is a diagram of a complex format parsing system model based on high-dimensional information feature extraction.

Detailed Description

The present invention will be further described with reference to the following embodiments.

The flow of the complex format analysis scheme based on high-dimensional information extraction is shown in FIG. 1 and is divided into three parts: feature learning training, signal feature extraction and complex format analysis. In the signal feature extraction stage, the signal is first preprocessed (filtering, polarization imbalance compensation and the like), and the corresponding high-dimensional parameters are then extracted as classification features, providing the basis for subsequent discrimination. In the format analysis stage, maximum likelihood estimation is used to select the most probable result from the set of all modulation types, thereby determining the modulation format of the unknown signal. Before these two stages, therefore, the probability density functions of the various modulation formats must be obtained through training to serve as the basis for subsequent discrimination.

In the learning and training stage, the polarization signal information of the various modulation format classes is taken as sample data. Using the Stokes formula, the polarization-multiplexed signal, which carries the information reflecting its modulation characteristics, is mapped into a four-dimensional Stokes vector; the mapping involves only the amplitudes and the phase difference of the two signals, so a certain amount of phase and frequency offset and polarization-dependent loss is avoided. The mapped four-dimensional Stokes vector is then taken as the classification feature of the signal, and the four-dimensional signal features are projected onto an optimal classification plane by linear discriminant analysis. After projection, the mean and variance of the projected data of each modulation format are calculated by maximum likelihood estimation, yielding the probability density function corresponding to each modulation format.
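As a worked illustration of the maximum likelihood step, suppose the projected feature y of each modulation format c is modelled as a one-dimensional Gaussian. The text does not state the distribution, so this is an assumption, and the symbols y_i^{(c)}, N_c, \hat{\mu}_c and \hat{\sigma}_c^2 are introduced here for illustration only. The estimates and the resulting density would then read:

```latex
\hat{\mu}_c = \frac{1}{N_c}\sum_{i=1}^{N_c} y_i^{(c)}, \qquad
\hat{\sigma}_c^2 = \frac{1}{N_c}\sum_{i=1}^{N_c}\bigl(y_i^{(c)} - \hat{\mu}_c\bigr)^2, \qquad
p(y \mid c) = \frac{1}{\sqrt{2\pi\hat{\sigma}_c^2}}
  \exp\!\left(-\frac{(y - \hat{\mu}_c)^2}{2\hat{\sigma}_c^2}\right)
```

Here N_c is the number of training samples of format c and y_i^{(c)} is the i-th projected training feature of that format.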

The probability density functions are then used to solve for the modulation format of the unknown signal. When a signal of unknown modulation format enters the modulation format identification module, it first undergoes a series of simple preprocessing steps such as filtering and dispersion compensation. A dimension-raising mapping is then performed in Stokes space, and the resulting four-dimensional signal features are projected into the discriminant vector space. Finally, the projected signal features are substituted into the probability density function of each modulation format class, and the probability that the unknown signal belongs to each class is calculated. The class with the maximum probability is the prediction result.

Signal feature extraction submodule details

As shown in FIG. 2, in a polarization-multiplexed coherent optical communication system, the coherent receiver at the receiving end first uses a polarization beam splitter to separate the polarized optical signal transmitted in the fiber into two paths, and mixes them with the two paths of polarized light generated by the local oscillator in two 90-degree optical hybrids, obtaining four signals I_x, Q_x, I_y and Q_y. After analog-to-digital conversion, the imaginary unit i is used to combine these into the X-path polarization signal e_x = I_x + iQ_x and the Y-path polarization signal e_y = I_y + iQ_y.

The complex signals e_x and e_y are then mapped into a four-dimensional Stokes vector (S_0, S_1, S_2, S_3), in which S_0 represents the total power of the two signals, S_1 represents their power difference, S_2 and S_3 represent the two phase-difference components of the two signals, a_x and a_y are the amplitudes of the two polarization signals, and φ is the phase difference between them. The mapping rule essentially operates on the relative power and phase difference of the two orthogonal polarization signals. The transformed signal is therefore independent of polarization mixing, carrier frequency offset and phase offset, and reflects the signal characteristics well. After power normalization, the Stokes vector is visualized on the Poincaré sphere, which verifies the feasibility of using this high-dimensional information as the classification feature of the signal. FIGS. 3 to 8 show the cluster distributions on the Poincaré sphere of the signal constellation points of several classical modulation formats.
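The power normalization mentioned above can be sketched as follows. The sketch assumes per-sample normalization by S_0, which places every point on the unit Poincaré sphere; the text does not say whether normalization is per sample or by average power, and normalizing by average power instead would leave points at different radii. The function name and the random test samples are hypothetical.

```python
import numpy as np

def poincare_points(e_x, e_y):
    """Return 3 x N unit-sphere coordinates (S1, S2, S3) / S0."""
    e_x = np.asarray(e_x, dtype=complex)
    e_y = np.asarray(e_y, dtype=complex)
    cross = np.conj(e_x) * e_y
    s0 = np.abs(e_x) ** 2 + np.abs(e_y) ** 2
    s123 = np.stack([np.abs(e_x) ** 2 - np.abs(e_y) ** 2,
                     2.0 * cross.real,
                     2.0 * cross.imag])
    return s123 / s0          # assumes every sample has non-zero power

# Random complex samples stand in for received polarization signals.
rng = np.random.default_rng(0)
e_x = rng.standard_normal(500) + 1j * rng.standard_normal(500)
e_y = rng.standard_normal(500) + 1j * rng.standard_normal(500)
points = poincare_points(e_x, e_y)
radii = np.linalg.norm(points, axis=0)
print(radii.min(), radii.max())
```

The radii all equal one because S_1^2 + S_2^2 + S_3^2 = S_0^2 holds for every individual sample.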

As shown in FIG. 3 (a) - (b), the BPSK modulation format has 2 constellation points on the constellation diagram and, after mapping, 2 clusters in Stokes space.

As shown in FIG. 4 (a) - (b), the QPSK modulation format has 4 constellation points on the constellation diagram and, after mapping, 4 clusters in Stokes space.

As shown in FIG. 5 (a) - (b), the 8PSK modulation format has 8 constellation points on the constellation diagram and, after mapping, 8 clusters in Stokes space.

As shown in FIG. 6 (a) - (b), the 8QAM modulation format has 8 constellation points on the constellation diagram and, after mapping, 16 clusters in Stokes space.

As shown in FIG. 7 (a) - (b), the 12QAM modulation format has 12 constellation points on the constellation diagram and, after mapping, 32 clusters in Stokes space.

As shown in FIG. 8 (a) - (b), the 16QAM modulation format has 16 constellation points on the constellation diagram and, after mapping, 60 clusters in Stokes space.
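The cluster counts for the constant-amplitude formats can be reproduced with a short noiseless experiment. The sketch below enumerates every pair of ideal constellation points on the two polarizations and counts the distinct Stokes vectors. It assumes ideal, unit-amplitude PSK constellations and the standard Stokes definitions; the helper names are hypothetical, and the QAM formats are not covered because their exact constellation geometries are not given in the text.

```python
import numpy as np
from itertools import product

def stokes_clusters(constellation, decimals=6):
    """Distinct noiseless Stokes vectors when both polarizations use `constellation`."""
    clusters = set()
    for e_x, e_y in product(constellation, repeat=2):
        cross = np.conj(e_x) * e_y
        s = [abs(e_x) ** 2 + abs(e_y) ** 2,
             abs(e_x) ** 2 - abs(e_y) ** 2,
             2.0 * cross.real,
             2.0 * cross.imag]
        # Round to merge floating-point duplicates; "+ 0.0" folds -0.0 into 0.0.
        clusters.add(tuple((np.round(s, decimals) + 0.0).tolist()))
    return clusters

def psk(order):
    """Ideal unit-amplitude PSK constellation with `order` points."""
    return np.exp(2j * np.pi * np.arange(order) / order)

for name, order in [("BPSK", 2), ("QPSK", 4), ("8PSK", 8)]:
    print(name, len(stokes_clusters(psk(order))))
```

For PSK formats only the phase difference between the two polarizations varies, so the number of clusters equals the number of constellation points. Formats with several amplitude levels also vary in S_0 and S_1, which is why their cluster count can exceed the number of constellation points.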

The mapping preserves the amplitude and relative phase of the signal while phase noise and frequency offset cancel out, so the visualized three-dimensional constellation of the signal in Stokes space is independent of phase noise and frequency offset. The mapped high-dimensional Stokes vector therefore serves well as the classification feature of the signal and provides good signal features for the subsequent linear discriminant analysis.
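The insensitivity to phase noise and frequency offset can be checked directly: a phase rotation common to both polarizations cancels in every Stokes component. The sketch below is an illustration under assumptions, with a hypothetical `stokes_map` helper, synthetic QPSK symbols, and arbitrarily chosen impairment values (a linear phase ramp for the frequency offset plus a random-walk phase noise).

```python
import numpy as np

def stokes_map(e_x, e_y):
    """4 x N Stokes vectors, assuming the standard Stokes definitions."""
    cross = np.conj(e_x) * e_y
    return np.stack([np.abs(e_x) ** 2 + np.abs(e_y) ** 2,
                     np.abs(e_x) ** 2 - np.abs(e_y) ** 2,
                     2.0 * cross.real,
                     2.0 * cross.imag])

rng = np.random.default_rng(1)
n = 1000
# Synthetic QPSK symbols on both polarizations.
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, size=(2, n))))
e_x, e_y = qpsk

# Hypothetical impairment common to both polarizations:
# frequency offset (linear phase ramp) plus random-walk phase noise.
phase = 2 * np.pi * 0.01 * np.arange(n) + np.cumsum(0.05 * rng.standard_normal(n))
rot = np.exp(1j * phase)

clean = stokes_map(e_x, e_y)
impaired = stokes_map(e_x * rot, e_y * rot)
print(np.max(np.abs(clean - impaired)))   # difference at floating-point level
```

The cancellation holds only for phase terms shared by both polarizations, which is the case for laser phase noise and carrier frequency offset in this model.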

Complex format parsing submodule details

In the training stage, after the polarization signal has been mapped into a high-dimensional Stokes vector, the invention uses a pattern recognition method from machine learning, linear discriminant analysis (LDA), to reduce the dimension of, and classify, the classification features of the transmitted signal. Linear discriminant analysis is a supervised learning algorithm and a classic pattern recognition algorithm in machine learning. Its basic idea is to project high-dimensional pattern samples onto the optimal discriminant vector space, thereby extracting the classification information of the signal features and compressing the dimension of the feature space; after projection, the pattern samples have the maximum inter-class distance and the minimum intra-class distance in the new subspace, that is, the patterns have the best separability in that space. It is therefore an effective feature extraction method: it maximizes the inter-class divergence of the projected pattern samples while minimizing their intra-class divergence.

The specific flow of linear discriminant analysis is shown in FIG. 9: labelled high-dimensional vector data are projected into a classification space of lower dimension. After projection, the data separate into distinct clusters by class, and points of the same class lie closer together in the projected space, achieving good dimension reduction and classification.

The specific steps of linear discriminant analysis are given below. According to the property of the generalized Rayleigh quotient:

$$\lambda_{\min} \le R(A, B, Z) \le \lambda_{\max}$$

where A and B are n × n Hermitian matrices, Z is an n-dimensional vector, and

$$R(A, B, Z) = \frac{Z^H A Z}{Z^H B Z},$$

λ_min and λ_max are respectively the minimum and maximum eigenvalues of the matrix B^{-1}A.

The special property of the generalized Rayleigh quotient is utilized to effectively reduce and classify the high-dimensional data of various categories.

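Consider a multi-dimensional dataset D = {(z_1, v_1), (z_2, v_2), ..., (z_m, v_m)}, where the first item of each pair is an n-dimensional vector and the second is its class label. Define N_j (j = 1, 2, ..., k) as the number of samples of the j-th class and Z_j (j = 1, 2, ..., k) as the set of samples of the j-th class. Then

$$\mu_j = \frac{1}{N_j} \sum_{z \in Z_j} z$$

is the mean vector of the j-th class samples, and

$$\Sigma_j = \sum_{z \in Z_j} (z - \mu_j)(z - \mu_j)^T$$

is the covariance matrix of the j-th class samples (written here without the 1/N_j normalization).

Assume the projected low-dimensional space has dimension d with basis vectors (W_1, W_2, ..., W_d), which form an n × d matrix W. The projection of any high-dimensional sample z_i onto the reduced-dimension space is then W^T z_i. To make the distance between data of different classes as large as possible, we maximize

$$\sum_{j=1}^{k} N_j\, W^T (\mu_j - \mu)(\mu_j - \mu)^T W,$$

where μ is the mean vector of all samples. At the same time, the projected points of data from the same class should be as close together as possible, so we minimize

$$\sum_{j=1}^{k} W^T \Sigma_j W.$$

The quantities to be maximized and minimized are collected into two new matrices:

$$S_b = \sum_{j=1}^{k} N_j (\mu_j - \mu)(\mu_j - \mu)^T,$$

$$S_W = \sum_{j=1}^{k} \Sigma_j = \sum_{j=1}^{k} \sum_{z \in Z_j} (z - \mu_j)(z - \mu_j)^T,$$

where S_b is the inter-class divergence (scatter) matrix and S_W is the intra-class divergence (scatter) matrix. The optimization objective is set as

$$J(W) = \frac{\prod_{\mathrm{diag}} W^T S_b W}{\prod_{\mathrm{diag}} W^T S_W W},$$

where ∏_diag A denotes the product of the main diagonal elements of A.

The optimization of J(W) is then converted into

$$J(W) = \frac{\prod_{i=1}^{d} W_i^T S_b W_i}{\prod_{i=1}^{d} W_i^T S_W W_i} = \prod_{i=1}^{d} \frac{W_i^T S_b W_i}{W_i^T S_W W_i},$$

where d is the dimension of the low-dimensional space, W^T is the transpose of the matrix W, W_i is the i-th basis vector, and W_i^T is the transpose of the i-th basis vector.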

By analogy with the generalized Rayleigh quotient, A is set to the inter-class divergence matrix S_b and B to the intra-class divergence matrix S_W, which converts the optimization objective into a form analogous to the generalized Rayleigh quotient. The maximum of this quotient is found from the eigenvalues and eigenvectors of the intra-class and inter-class divergence matrices; the variance and mean of each class of sample data after projection are then obtained, and from them the respective probability density functions are calculated.
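For a single basis vector w, one common way to see why eigenvectors solve this maximization is the Lagrangian sketch below. It assumes that S_W is invertible, which the text does not state; w and λ are illustrative symbols.

```latex
\max_{w}\; \frac{w^T S_b w}{w^T S_W w}
\;\Longleftrightarrow\;
\max_{w}\; w^T S_b w \quad \text{s.t.} \quad w^T S_W w = 1
\\[4pt]
L(w, \lambda) = w^T S_b w - \lambda \left( w^T S_W w - 1 \right), \qquad
\frac{\partial L}{\partial w} = 2 S_b w - 2 \lambda S_W w = 0
\\[4pt]
\Longrightarrow\; S_W^{-1} S_b\, w = \lambda\, w, \qquad
\frac{w^T S_b w}{w^T S_W w} = \lambda
```

Under this sketch, the columns of W would be the eigenvectors of S_W^{-1} S_b belonging to its d largest eigenvalues, and the maximum of J(W) would be the product of those eigenvalues.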

The probability density functions obtained by linear discriminant analysis therefore serve as the basis for complex format analysis. The input derived from the unknown signal is substituted into the probability density function of each modulation type, the maximum among all the results is taken as the output, and the modulation format corresponding to that probability density function is the modulation format of the unknown signal.

Complex format analytic system model based on high-dimensional information feature extraction

The modulation format recognition system model of the present invention is shown in fig. 10, and dynamically transmits polarization signals of various modulation formats at the transmitting end of the polarization multiplexing coherent optical transmission system according to the requirements of users and the conditions of channels, and reasonably configures the resources of the communication system. The continuous light wave is divided into two paths by a polarization beam splitter at the transmitting end, and information is loaded on an I path and a Q path of the two paths of polarized light respectively by an IQ modulator. After modulation, two paths of mutually independent and orthogonal polarization signals are transmitted through few-mode optical fibers, are received and converted into four paths of signals by a coherent receiver at a receiving end, and are converted into digital signals through a digital-to-analog converter to enter a DSP (digital signal processor) for subsequent calculation of the signals. Before identifying the modulation format of the signal, a series of simple processing independent of the modulation format is performed, such as filtering, resampling, IQ imbalance compensation, dispersion compensation, polarization demultiplexing, and the like.

The signal then enters the complex format analysis stage, in which its modulation format is identified. First, the signal is preprocessed by power normalization and then mapped to a higher dimension by the Stokes formula, yielding a high-dimensional Stokes vector closely related to the signal modulation format. The probability of each category is then calculated using the maximum likelihood estimation function obtained by linear discriminant analysis, and the category with the maximum value is output as the modulation format identification result. Once the modulation format of the signal is known, algorithms appropriate to that format are selected to perform the subsequent signal processing, such as adaptive equalization, frequency offset recovery, carrier phase recovery, and demodulation.
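The preprocessing and dimension-raising steps can be sketched as below. The sketch assumes the conventional Stokes definition, in which $S_0$ is the total power, $S_1$ the power difference, and $S_2$, $S_3$ are twice the real and imaginary parts of $e_x^* e_y$. It also assumes that power normalization scales the mean total power of both polarizations to 1. The function names `power_normalize` and `stokes_map` are hypothetical.

```python
import numpy as np

def power_normalize(e_x, e_y):
    """Scale both polarizations so that the mean total power is 1 (assumed)."""
    p = np.mean(np.abs(e_x) ** 2 + np.abs(e_y) ** 2)
    return e_x / np.sqrt(p), e_y / np.sqrt(p)

def stokes_map(e_x, e_y):
    """Map complex samples e_x, e_y to four-dimensional Stokes vectors."""
    s0 = np.abs(e_x) ** 2 + np.abs(e_y) ** 2   # total power
    s1 = np.abs(e_x) ** 2 - np.abs(e_y) ** 2   # power difference
    cross = np.conj(e_x) * e_y                 # a_x * a_y * exp(i * phi)
    s2 = 2 * cross.real                        # 2 a_x a_y cos(phi)
    s3 = 2 * cross.imag                        # 2 a_x a_y sin(phi)
    return np.stack([s0, s1, s2, s3], axis=-1)
```

Because only the amplitudes and the phase difference between the two polarizations enter the mapping, any phase rotation common to both polarizations, including the rotation caused by a frequency offset, cancels in `cross` and leaves the Stokes vector unchanged.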

The above is only a preferred embodiment of the present invention. It will be apparent to those skilled in the art that various modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the scope of the present invention.

Claims

1. A complex format analysis method based on high-dimensional information feature extraction, characterized in that the method comprises the following steps:

S2: signal feature extraction, namely filtering and polarization imbalance compensation preprocessing are carried out on the signals, and the corresponding high-dimensional information features are then extracted as classification features to provide a basis for subsequent discrimination;

S3: format analysis, namely selecting the result with the maximum probability from the set of all modulation formats by maximum likelihood function estimation, and thereby determining the modulation format of the unknown signal;

in step S1, the method specifically includes the following steps:

1.2) through the Stokes formula, the information in the polarization-multiplexed signal that reflects the modulation characteristics is mapped into a four-dimensional Stokes vector; the mapping involves only the amplitudes and the phase difference of the two polarization signals, and therefore avoids, to a certain extent, phase and frequency offset and polarization-dependent loss;

1.3) taking the mapped four-dimensional Stokes vector as the classification characteristic of the signal, and projecting the four-dimensional signal characteristic to an optimal classification plane by using a linear discriminant analysis method; after high-dimensional information characteristic projection, calculating the mean value and variance of data projected by various modulation formats by using a maximum likelihood function estimation mode, and further obtaining a probability density function corresponding to each modulation format;

in step S2, the method specifically includes the following steps:

in a polarization multiplexing coherent optical communication system, a coherent receiver at the receiving end first uses a polarization beam splitter to separate the polarized optical signal transmitted in the optical fiber into two paths, and mixes these two paths with two paths of polarized light generated by the local oscillator in two 90-degree optical hybrids to obtain four signals $I_x$, $Q_x$, $I_y$, $Q_y$; after analog-to-digital conversion, the quadrature components are combined with the imaginary unit $i$ to obtain the X-path polarization signal $e_x = I_x + iQ_x$ and the Y-path polarization signal $e_y = I_y + iQ_y$;

the obtained complex signals $e_x$ and $e_y$, representing the polarization states of the X path and the Y path, are converted into a four-dimensional Stokes vector through the following mapping rule:

$$S = \begin{pmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{pmatrix} = \begin{pmatrix} e_x e_x^* + e_y e_y^* \\ e_x e_x^* - e_y e_y^* \\ e_x^* e_y + e_x e_y^* \\ -i e_x^* e_y + i e_x e_y^* \end{pmatrix} = \begin{pmatrix} a_x^2 + a_y^2 \\ a_x^2 - a_y^2 \\ 2 a_x a_y \cos\phi \\ 2 a_x a_y \sin\phi \end{pmatrix}$$

wherein $e_x^*$ and $e_y^*$ respectively represent the conjugates of $e_x$ and $e_y$; $S_0$ represents the total power of the two polarization signals, $S_1$ represents the energy difference of the two polarization signals, $S_2$ and $S_3$ respectively represent the two phase-difference components of the two polarization signals, $a_x$ and $a_y$ respectively represent the amplitudes of the two polarization signals, and $\phi$ represents the phase difference of the two polarization signals;

in step S3, the method specifically includes the following steps:

performing dimensionality reduction and classification processing on classification features of the transmission signals by using a linear discriminant analysis method;

projecting the labeled high-dimensional vector data into a lower-dimensional classification space by the linear discriminant analysis method, so that after projection the original data can be distinguished by category; according to the property of the generalized Rayleigh quotient, the following is obtained:

$$\lambda_{\min} \le R(A, B, Z) \le \lambda_{\max}$$

wherein A and B are n × n Hermitian matrices, Z is an n-dimensional vector, and

$$R(A, B, Z) = \frac{Z^H A Z}{Z^H B Z},$$

$\lambda_{\min}$ and $\lambda_{\max}$ are respectively the minimum and maximum eigenvalues of the matrix $B^{-1}A$; the high-dimensional data are reduced in dimension and classified by using this property of the generalized Rayleigh quotient;

the method for reducing and classifying the high-dimensional data specifically comprises the following steps:

for a multi-dimensional dataset $D = \{(z_1, v_1), (z_2, v_2), \ldots, (z_m, v_m)\}$, the former item of each pair is an n-dimensional vector and the latter item is the data category;

$\mu_j$ is the mean vector of the j-th class samples, and $\Sigma_j$ is the covariance matrix of the j-th class samples;

assuming that the dimension of the projected low-dimensional space is d and the corresponding basis vectors are $(W_1, W_2, \ldots, W_d)$, the basis vectors form an n × d matrix W; then, for any high-dimensional sample $Z_i$, its projection in the low-dimensional space is $W^T Z_i$; maximize

$$\sum_{j=1}^{k} N_j W^T (\mu_j - \mu)(\mu_j - \mu)^T W,$$

where $\mu$ is the mean vector of all samples, $N_j$ is the number of samples in the j-th class and k is the number of classes, and minimize

$$\sum_{j=1}^{k} W^T \Sigma_j W;$$

the quantities to be maximized and minimized are defined as two new matrices:

$$S_b = \sum_{j=1}^{k} N_j (\mu_j - \mu)(\mu_j - \mu)^T,$$

$$S_W = \sum_{j=1}^{k} \Sigma_j;$$

wherein $S_b$ is the inter-class divergence matrix and $S_W$ is the intra-class divergence matrix; the optimization target is set as:

$$J(W) = \frac{\prod_{\mathrm{diag}} W^T S_b W}{\prod_{\mathrm{diag}} W^T S_W W}$$

wherein $\prod_{\mathrm{diag}} X$ is the product of the main diagonal elements of X;

the optimization of J(W) is converted into:

$$J(W) = \frac{\prod_{i=1}^{d} W_i^T S_b W_i}{\prod_{i=1}^{d} W_i^T S_W W_i} = \prod_{i=1}^{d} \frac{W_i^T S_b W_i}{W_i^T S_W W_i}$$

where d is the dimension of the low-dimensional space, $W^T$ is the transpose of the matrix W, $W_i$ is the i-th basis vector, and $W_i^T$ is the transpose of the i-th basis vector;

setting A as the inter-class divergence matrix $S_b$ of the data and B as the intra-class divergence matrix $S_W$, finding the maximum value of the generalized Rayleigh quotient by using the eigenvalues and eigenvectors of the inter-class divergence matrix $S_b$ and the intra-class divergence matrix $S_W$, and then obtaining the variance and the mean of the projected sample data of the different classes in order to calculate the respective probability density functions;

the probability density functions obtained by the linear discriminant analysis method are used as the basis for complex format analysis; the input of the unknown signal is substituted into the probability density function of each modulation format, and the maximum of all the calculated results is taken as the output, wherein the modulation format corresponding to that probability density function is the modulation format of the unknown signal.

2. The method for parsing a complex format according to claim 1, wherein the method comprises: in step S3, the method for solving the modulation format of the unknown signal by using the probability density function specifically includes the following steps:

1.33) then substituting the projected signal features into the probability density function of each modulation format class, and calculating the probability that the unknown signal belongs to that class; wherein the class corresponding to the maximum probability is the prediction result.

3. The method for parsing a complex format according to claim 2, wherein the method comprises: the mapping rule operates on the relative power and the phase difference of the cross-polarized signals; in the mapping process, the amplitude and the relative phase of the signal remain unchanged while phase noise and frequency offset disappear; and the mapped high-dimensional Stokes vector is taken as the classification feature of the signal, thereby providing signal features for the subsequent linear discriminant analysis method.

4. The method for parsing a complex format according to claim 3, wherein the method comprises: the system model of the modulation format is used for dynamically transmitting polarization signals of various modulation formats at the transmitting end of the polarization multiplexing coherent optical transmission system according to the requirements of users and the conditions of the channel, and for configuring the resources of the communication system; the continuous light wave is divided into two paths by a polarization beam splitter at the transmitting end, and information is loaded onto the I path and the Q path of each of the two paths of polarized light by an IQ modulator; after modulation, the two mutually independent and orthogonal polarization signals are transmitted through a few-mode optical fiber, received and converted into four signals by a coherent receiver at the receiving end, and converted into digital signals by an analog-to-digital converter before entering a DSP (digital signal processor) for subsequent signal processing; the signal is first preprocessed by power normalization and then mapped to a higher dimension by the Stokes formula to obtain a high-dimensional Stokes vector closely related to the signal modulation format; and the probability of each category is calculated by using the maximum likelihood function obtained by the linear discriminant analysis method, and the category with the maximum value is output as the modulation format identification result.