CN117675061A - Few-mode multi-core fiber channel modeling method based on light knowledge distillation - Google Patents


Info

Publication number
CN117675061A
Authority
CN
China
Prior art keywords: convolution; few-mode multi-core; model; knowledge distillation
Prior art date
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number
CN202311672337.2A
Other languages
Chinese (zh)
Inventor
孙伟
刘振华
王建江
黄秋实
张功会
王林
董智慧
贺作为
Current Assignee (the listed assignees may be inaccurate)
Hengtong Optic Electric Co Ltd
Jiangsu Alpha Optic Electric Technology Co Ltd
Original Assignee
Hengtong Optic Electric Co Ltd
Jiangsu Alpha Optic Electric Technology Co Ltd
Application filed by Hengtong Optic Electric Co Ltd, Jiangsu Alpha Optic Electric Technology Co Ltd filed Critical Hengtong Optic Electric Co Ltd
Priority to CN202311672337.2A priority Critical patent/CN117675061A/en
Publication of CN117675061A publication Critical patent/CN117675061A/en

Abstract

The invention discloses a few-mode multi-core fiber channel modeling method based on lightweight knowledge distillation, which comprises the following steps: designing a lightweight convolutional neural network MobileNet model whose convolution type is depthwise separable convolution; and compressing the convolutional neural network MobileNet model through knowledge distillation to obtain the required few-mode multi-core fiber channel model. In this method the MobileNet model serves as the teacher model, and compressing it through knowledge distillation transfers its knowledge to a compact student model. The compact student model achieves signal-transmission capability matching the teacher model, is easier to deploy into each channel, and reduces computational complexity; compared with traditional few-mode channel modeling methods, the method is more accurate and more efficient.

Description

Few-mode multi-core fiber channel modeling method based on lightweight knowledge distillation
Technical Field
The invention belongs to the technical field of optical fibers, and particularly relates to a few-mode multi-core optical fiber channel modeling method based on light-weight knowledge distillation.
Background
Modern society is in an age of rapid information technology development, and the demands of the Internet, big data, cloud services, IPTV and next-generation mobile communication technologies for broadband data transmission capacity keep increasing, which places higher demands on the transmission rate of the optical fiber transmission network. Since the 1980s, wavelength division multiplexing and dense wavelength division multiplexing technologies, erbium-doped fiber amplifiers, various high-performance optical fibers and coherent optical communication technologies have been deployed in optical fiber transmission lines, greatly expanding the capacity and transmission distance of optical transmission systems, while new transmission technologies such as orthogonal frequency division multiplexing and higher-order modulation formats have gradually been put into practical use. However, the inherent nonlinear characteristics of standard single-mode fiber (SSMF) limit the capacity increase, and its Shannon limit is gradually being approached. To break through the transmission bottleneck of single-mode optical fibers, optical transmission technologies based on spatial-dimension multiplexing, including mode multiplexing and multi-core fiber technologies, are becoming a research hotspot.
The mode multiplexing technology based on few-mode fiber transmits over multiple spatially orthogonal modes and can multiply the spectral efficiency of a single fiber. The few-mode multi-core fiber has several cores, each supporting several transmission modes. In theory the modes are mutually orthogonal and the cores do not interfere with one another, greatly expanding the transmission capacity of the fiber. Optical networks based on few-mode fibers are considered a strong candidate for next-generation elastic optical networks. At present, however, only the standard single-mode fiber modeling method based on the nonlinear Schrödinger equation is available, and few-mode multiplexing fiber modeling lacks mature physical-principle support. Therefore, modeling the physical theory of this novel fiber channel, together with rational planning, design and feasibility demonstration, is of profound significance.
Disclosure of Invention
Aiming at the problems that few-mode multi-core fiber modeling lacks mature theoretical support and the complexity of the existing modeling method is high, the invention aims to provide a few-mode multi-core fiber channel modeling method based on light-weight knowledge distillation.
In order to achieve the above purpose and achieve the above technical effects, the invention adopts the following technical scheme:
a few-mode multi-core fiber channel modeling method based on light knowledge distillation comprises the following steps:
designing a lightweight convolutional neural network MobileNet model, wherein the convolutional type is depth separable convolutional;
compressing a convolutional neural network MobileNet model through knowledge distillation to obtain a required few-mode multi-core fiber channel model;
according to the invention, the convolutional neural network MobileNet model is used as the teacher model and is compressed by knowledge distillation, transferring the knowledge in the complex teacher model to a compact student model. The compact student model achieves signal-transmission capability matching the teacher model, is easier to deploy into each channel, and, compared with traditional few-mode channel modeling methods, is more accurate and more efficient.
Further, the depthwise separable convolution comprises one depthwise convolution layer and one pointwise convolution layer; its parameter count comprises the parameter counts of the depthwise and pointwise convolutions, and its computation count comprises the computation counts of the depthwise and pointwise convolutions.
Further, the parameter count of the depthwise separable convolution is D_K×D_K×M + M×N;
the parameter count of the depthwise convolution is D_K×D_K×M, where each depthwise convolution kernel has size D_K×D_K×1 and there are M of them;
the parameter count of the pointwise convolution is M×N, where each pointwise convolution kernel has size 1×1×M and there are N of them.
Further, the computation count of the depthwise separable convolution is D_K×D_K×M×D_F×D_F + M×N×D_F×D_F;
the computation count of the depthwise convolution is D_K×D_K×M×D_F×D_F, where each of the M depthwise kernels of size D_K×D_K×1 performs D_F×D_F multiply-add operations;
the computation count of the pointwise convolution is M×N×D_F×D_F, where each of the N pointwise kernels of size 1×1×M performs D_F×D_F multiply-add operations.
Further, the ratio of the parameter count of the depthwise separable convolution to that of the standard convolution is:
(D_K×D_K×M + M×N) / (D_K×D_K×M×N) = 1/N + 1/D_K²
Further, the ratio of the computation count of the depthwise separable convolution to that of the standard convolution is:
(D_K×D_K×M×D_F×D_F + M×N×D_F×D_F) / (D_K×D_K×M×N×D_F×D_F) = 1/N + 1/D_K²
further, the step of compressing the convolutional neural network MobileNet model by knowledge distillation includes:
taking the convolutional neural network MobileNet model as the teacher model, performing knowledge distillation to obtain a student model, and using the teacher model to supervise the training of the student model; the distillation loss function computes the difference between the output predictions of the teacher and student models, and, combined with the student model's own loss function, forms the overall training loss used for gradient updates, finally yielding a high-performance, high-precision student model as the required few-mode multi-core fiber channel model.
Further, the overall training loss function loss is expressed as:
L = (1-α)·CE(y, p) + α·CE(q, p)·T²
wherein CE is the cross entropy, y the true label, p the intermodal dispersion of the few-mode multi-core fiber, q the intermodal coupling of the few-mode multi-core fiber, α the weight of the distillation loss, and T the temperature of knowledge distillation.
The invention also discloses an application of the few-mode multi-core fiber channel model, obtained by the above few-mode multi-core fiber channel modeling method based on lightweight knowledge distillation, in the field of few-mode multi-core fibers.
The invention also discloses an application method of the few-mode multi-core fiber channel model in the field of few-mode multi-core fibers, which comprises the following steps:
at the transmitting end of the optical fiber link terminal, an arbitrary waveform generator generates an electric signal, and the electric signal is transmitted to a modulator after being amplified by an amplifier; simultaneously, the optical signal generated by the light source is also injected into the modulator;
the modulation signal output by the modulator is amplified by the erbium-doped fiber amplifier and then sent to the optical coupler to be input into the few-mode fiber for transmission; collecting data from a physical layer device of a transmitting end, inputting a few-mode multi-core fiber channel model, and finally outputting an optimal signal to a receiving end;
at the receiving end, an optical coupler first decouples the multiplexed signals; the signals are then amplified, a variable optical attenuator adjusts the optical power, a photodetector recovers the electrical signal information, a mixed-signal oscilloscope samples the signals, and finally the signals are output after data processing.
Compared with the prior art, the invention has the beneficial effects that:
the invention discloses a few-mode multi-core fiber channel modeling method based on light knowledge distillation, which realizes knowledge distillation by using a teacher-student model mode, simplifies a complex neural network for transmitting optical signals in multiple channels and strengthens the optical transmission performance of a student model; by distilling knowledge from a complex teacher model to a simple student model, the deployment difficulty of a neural network is reduced, the calculation complexity is reduced, the signal transmission of the few-mode multi-core optical fiber is realized by a lightweight student model, and the response time is improved.
Drawings
FIG. 1 is a schematic diagram of a conventional convolutional neural network;
FIG. 2 is a schematic representation of the structure of the depth separable convolution of the present invention;
FIG. 3 is a flow chart of knowledge distillation in accordance with the present invention;
FIG. 4 is a modeling flow chart of the present invention;
fig. 5 is a flow chart of the application of the few-mode multi-core fiber channel model of the present invention in the field of few-mode multi-core fibers.
Detailed Description
The present invention is described in detail below so that advantages and features of the present invention can be more easily understood by those skilled in the art, thereby making clear and unambiguous the scope of the present invention.
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
The neural network has the following advantages: (1) strong robustness and fault tolerance, because information is stored distributed across the neurons of the network; (2) parallel processing, which makes computation fast; (3) self-learning, self-organization and self-adaptation, so that the neural network can handle uncertain or unknown systems; (4) the ability to approximate any complex nonlinear relationship; (5) strong information-integration capability: it can process quantitative and qualitative information simultaneously and coordinate many kinds of input information, making it well suited to multi-information fusion and multimedia technology. Therefore, the invention introduces a neural network to model the few-mode multi-core optical fiber.
Because many neural network models are huge, with many parameters and a large amount of computation, they are difficult to apply in some practical scenarios, and a complex network structure causes several efficiency problems. On the one hand, a neural network with hundreds of layers has a large number of weight parameters, placing high memory demands on the device that stores them. On the other hand, in practical applications the processing speed of the system is often at the millisecond level; to reach practical standards, either the performance of the processor must be improved or the computation of the network model must be reduced. To solve these efficiency problems, model compression is commonly adopted: a trained model is compressed so that the network carries fewer parameters, addressing both the memory problem and the speed problem. Lightweight network model design, however, is a different approach from processing already-trained models: a lightweight convolutional neural network changes the internal structure and computation method of the network, supports more efficient distributed training, and effectively reduces overhead when a new model is derived.
Therefore, the invention designs a lightweight convolutional neural network MobileNet model to greatly reduce the parameters and computation during operation, and uses knowledge distillation to compress the MobileNet model, so that network parameters are reduced without losing network performance, constructing a more accurate and faster few-mode multi-core fiber channel model, as shown in figures 1-5.
Compared with the traditional convolutional neural network shown in fig. 1, the lightweight convolutional neural network MobileNet model greatly reduces model parameters and computation at the cost of only a small drop in accuracy. The convolution type of the MobileNet model is the depthwise separable convolution (DSC), a factorized convolution structure. The invention uses the MobileNet model to learn the inter-mode coupling crosstalk of the few-mode multi-core fiber under random perturbation conditions and, combined with the inter-mode differential delay, establishes a channel model with mode cross-interaction.
The depthwise separable convolution combines one depthwise convolution (DC) layer with one pointwise convolution (PC) layer, each convolution layer followed by batch normalization (BN) and a ReLU activation function. The difference from standard convolution is that, with essentially unchanged accuracy, the parameter count and computation are significantly reduced.
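As an illustrative sketch (not the patent's implementation), the depthwise-then-pointwise factorization described above can be written out in NumPy. BN and ReLU are omitted, the kernel values are random, and "valid" padding is assumed; all shapes are hypothetical:

```python
import numpy as np

def depthwise_conv(x, dw_kernels):
    """Depthwise conv: one D_K x D_K kernel per input channel (valid padding)."""
    M, H, W = x.shape                      # M input channels
    K = dw_kernels.shape[1]                # kernel size D_K
    out = np.zeros((M, H - K + 1, W - K + 1))
    for m in range(M):                     # each channel convolved independently
        for i in range(H - K + 1):
            for j in range(W - K + 1):
                out[m, i, j] = np.sum(x[m, i:i+K, j:j+K] * dw_kernels[m])
    return out

def pointwise_conv(x, pw_kernels):
    """Pointwise conv: N kernels of size 1x1xM mix the M channels."""
    # pw_kernels has shape (N, M); output channel n = sum_m w[n, m] * x[m]
    return np.einsum("nm,mhw->nhw", pw_kernels, x)

# Depthwise separable convolution = depthwise followed by pointwise
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))        # M=3 channels, 8x8 feature map
dw = rng.standard_normal((3, 3, 3))       # one 3x3 kernel per input channel
pw = rng.standard_normal((16, 3))         # N=16 output channels
y = pointwise_conv(depthwise_conv(x, dw), pw)
print(y.shape)                             # (16, 6, 6)
```

The depthwise stage preserves the channel count (3), and the pointwise stage expands it to 16, matching the channel-raising role described above.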
The parameter count and computation count are the decisive factors in judging whether a network model is small and lightweight, so both are briefly described here. The parameter count is the number of parameters needed in the network; for a convolution it is the number of values in all convolution kernels, and it is usually related to memory usage. The computation count is the number of multiply-add operations performed in the network; for a convolution, the output feature map is obtained through a series of multiply-add operations determined by the kernel dimensions D_K×D_K×M, the number of kernels N, and the output feature map size D_F×D_F, and it is usually related to time consumption. Using a conventional standard convolution, we can calculate its parameter count and computation count:
Parameter count: each kernel has size D_K×D_K×M and there are N of them, so the standard convolution has D_K×D_K×M×N parameters;
Computation count: each of the N kernels of size D_K×D_K×M is applied D_F×D_F times (assuming the output feature map also has size D_F×D_F), so the standard convolution performs D_K×D_K×M×N×D_F×D_F multiply-add operations.
Because the depthwise separable convolution combines one DC layer and one PC layer, note first that the depthwise convolution differs from standard convolution in that its kernels are single-channel: each input channel is convolved separately, yielding an output feature map with the same number of channels as the input (number of input channels = number of kernels = number of output feature maps). Since the number of output feature maps is thus fixed and may be too small, possibly limiting the effectiveness of the information, a pointwise convolution is then required; its essence is channel expansion using 1×1 convolution kernels. As shown in fig. 2, the parameter count and computation count of the depthwise separable convolution can be calculated:
Parameter count of the depthwise convolution: each depthwise kernel has size D_K×D_K×1 and there are M of them, so the parameter count is D_K×D_K×M;
Parameter count of the pointwise convolution: each pointwise kernel has size 1×1×M and there are N of them, so the parameter count is M×N;
Thus, the parameter count of the depthwise separable convolution is D_K×D_K×M + M×N.
Computation count of the depthwise convolution: each of the M depthwise kernels of size D_K×D_K×1 performs D_F×D_F multiply-add operations, so the computation count is D_K×D_K×M×D_F×D_F;
Computation count of the pointwise convolution: each of the N pointwise kernels of size 1×1×M performs D_F×D_F multiply-add operations, so the computation count is M×N×D_F×D_F;
Thus, the computation count of the depthwise separable convolution is D_K×D_K×M×D_F×D_F + M×N×D_F×D_F.
Having obtained the parameter and computation counts of the standard convolution and the depthwise separable convolution, we compare their ratios:
Parameter ratio: (D_K×D_K×M + M×N) / (D_K×D_K×M×N) = 1/N + 1/D_K²
Computation ratio: (D_K×D_K×M×D_F×D_F + M×N×D_F×D_F) / (D_K×D_K×M×N×D_F×D_F) = 1/N + 1/D_K²
In general, the number N is relatively large, so the 1/N term is negligible; D_K denotes the size of the convolution kernel, and if D_K = 3 then 1/N + 1/D_K² ≈ 1/9. That is, if a common 3×3 convolution kernel is used, the parameter count and computation count of the depthwise separable convolution are reduced to about one-ninth of the original, which greatly reduces the amount of computation.
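The counts and the 1/N + 1/D_K² ratio above can be checked numerically; the sizes below (M = 64, N = 128, D_F = 56) are arbitrary illustrative values, not from the patent:

```python
def standard_conv_cost(dk, m, n, df):
    """Parameter count and multiply-add count of a standard convolution."""
    params = dk * dk * m * n
    macs = dk * dk * m * n * df * df
    return params, macs

def dsc_cost(dk, m, n, df):
    """Parameter count and multiply-add count of a depthwise separable convolution."""
    params = dk * dk * m + m * n                       # depthwise + pointwise
    macs = dk * dk * m * df * df + m * n * df * df
    return params, macs

# Ratio check for D_K = 3: 1/N + 1/D_K^2, dominated by 1/9 when N is large
dk, m, n, df = 3, 64, 128, 56
p_std, c_std = standard_conv_cost(dk, m, n, df)
p_dsc, c_dsc = dsc_cost(dk, m, n, df)
print(p_dsc / p_std)    # ≈ 0.1189 == 1/n + 1/dk**2
print(c_dsc / c_std)    # same ratio, since D_F x D_F cancels
```

Both ratios equal 1/N + 1/D_K² exactly, confirming that the D_F×D_F factor cancels and the saving is close to 1/9 for 3×3 kernels.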
To cope with complicated channel modeling tasks, deep learning network models are often designed to be deep and complex, and the parameters of the model grow heavier as the model becomes more complicated. Although using the MobileNet model reduces the parameters and computation, making the model easier to embed into a mobile terminal and allowing a simulation model of the few-mode multi-core fiber to be built quickly and accurately, how to make fuller use of existing model resources and use the information they contain to guide a new training stage is still a major problem.
Knowledge distillation is a common model compression method: within a teacher-student framework, the features learned by a complex network with strong learning ability are distilled out and transferred to a small network with fewer parameters and weaker learning ability, yielding a network that is both fast and capable. Unlike other model compression techniques, knowledge distillation neither discards key information during compression nor loses the learning accuracy of the model, and it is not limited to compression: it also realizes transfer learning. Its greatest advantage is that existing model resources can be reused and the information they contain used to guide a new training stage; in cross-domain applications this removes the old dilemma of having to rebuild the data set and retrain the model whenever the task or scene changes, greatly reducing the cost of deep neural network training and deployment. From another perspective, distillation lets the student model learn the softened knowledge of the teacher model, which contains inter-category information unavailable from conventional one-hot labels. Because of the label-softening in distillation, it can also be regarded as a regularization strategy.
Thus, the present invention uses knowledge distillation to compress the MobileNet model; the steps are shown in fig. 3.
In the knowledge distillation stage, the teacher model is first trained in advance on a known data set; when the student model is then trained, the obtained teacher model supervises the training to achieve distillation. The training accuracy of the teacher should be higher than that of the student model, and the larger the gap, the more obvious the distillation effect. Generally, the teacher's model parameters are kept fixed during distillation training, so that only the student model is trained. Based on this, the convolutional neural network MobileNet model is used as the teacher model, knowledge distillation is performed to obtain a student model, and the teacher model supervises the training of the student model. The distillation loss function computes the difference between the output predictions of the teacher and student models; added to the student model's own loss function, it forms the overall training loss used for gradient updates, finally yielding a high-performance, high-precision student model as the required few-mode multi-core fiber channel model. The overall training loss function can be written as:
L = (1-α)·CE(y, p) + α·CE(q, p)·T²
wherein CE is the cross entropy, y the true label, p the intermodal dispersion of the few-mode multi-core fiber, q the intermodal coupling of the few-mode multi-core fiber, α the weight of the distillation loss, and T the temperature of knowledge distillation. The intermodal coupling and intermodal dispersion of the student model can be calculated from the loss function formula.
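The loss above has the form of the standard temperature-scaled distillation loss, so a generic NumPy sketch is shown below with temperature-softened softmax outputs standing in for p and q (which the patent ties to intermodal dispersion and coupling); all logits, shapes and hyperparameter values are illustrative assumptions, not the patent's:

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    e = np.exp(z / T - np.max(z / T))
    return e / e.sum()

def cross_entropy(target, pred):
    """CE(target, pred) = -sum target * log(pred)."""
    return -np.sum(target * np.log(pred + 1e-12))

def distillation_loss(student_logits, teacher_logits, y_onehot, alpha, T):
    """L = (1-alpha)*CE(y, p) + alpha*CE(q, p)*T^2, with q and p softened by T."""
    p_hard = softmax(student_logits)          # student prediction at T=1
    p_soft = softmax(student_logits, T)       # softened student prediction
    q_soft = softmax(teacher_logits, T)       # teacher "soft labels"
    return (1 - alpha) * cross_entropy(y_onehot, p_hard) \
           + alpha * cross_entropy(q_soft, p_soft) * T**2

y = np.array([0.0, 1.0, 0.0])                 # one-hot true label
s = np.array([1.0, 2.0, 0.5])                 # student logits (illustrative)
t = np.array([0.8, 2.5, 0.3])                 # teacher logits (illustrative)
loss = distillation_loss(s, t, y, alpha=0.5, T=4.0)
print(loss > 0)                               # True
```

The T² factor compensates for the 1/T² scaling of the soft-target gradients, so the two loss terms stay on comparable scales; with α = 0 the loss reduces to the plain student cross entropy.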
As shown in the modeling flow chart of fig. 4, the modeling process includes the steps of:
firstly, basic parameters are initialized, including the transmission distance, dispersion coefficient, normalized frequency, inter-core crosstalk, core radius and so on; random original digital information is generated as the input data of the channel model, and the original input information is modulated and then up-sampled, where the up-sampling re-samples the symbol sequence at a higher rate to obtain a new transmission sequence and avoids requiring an overly steep filter during filtering; the up-sampled data are input into the trained few-mode multi-core fiber channel model obtained by the lightweight knowledge-distillation-based modeling method for simulated few-mode multi-core fiber channel transmission, after which simulation noise is added; after digital down-conversion at the receiving end, the sampling rate of the signal is still high and the data volume large, so down-sampling is performed to reduce the sampling frequency and the amount of computation; dispersion compensation is then carried out to avoid inter-symbol overlap caused by pulse broadening and to obtain more accurate transmission information; in the MIMO equalization process, a blind equalization method is adopted, using the received signal sequence to adaptively equalize the channel so that the received signal more closely resembles the original signal; finally, the received signal is demodulated to recover the original transmitted data.
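A minimal end-to-end sketch of the simulation flow above, assuming QPSK modulation and replacing the trained channel model, dispersion compensation and MIMO equalization with a plain AWGN stand-in; all parameter values (bit count, oversampling factor, noise level) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# 1) Random source bits, QPSK modulation (assumed format for illustration)
bits = rng.integers(0, 2, 2000)
symbols = (1 - 2 * bits[0::2]) + 1j * (1 - 2 * bits[1::2])   # Gray-mapped QPSK

# 2) Up-sample by inserting zeros between symbols (sps samples per symbol)
sps = 4
tx = np.zeros(len(symbols) * sps, dtype=complex)
tx[::sps] = symbols

# 3) Channel stand-in: the trained channel model would go here; AWGN only
rx = tx + 0.05 * (rng.standard_normal(tx.shape) + 1j * rng.standard_normal(tx.shape))

# 4) Down-sample back to one sample per symbol
rx_sym = rx[::sps]

# 5) Hard-decision demodulation and bit recovery
bits_hat = np.empty_like(bits)
bits_hat[0::2] = (rx_sym.real < 0).astype(int)
bits_hat[1::2] = (rx_sym.imag < 0).astype(int)
ber = np.mean(bits != bits_hat)
print(ber)   # ~0 at this noise level
```

The skeleton mirrors the steps in the flow: modulate, up-sample, pass through the channel, add noise, down-sample, demodulate; the learned channel model and equalization stages would slot in between steps 3 and 5.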
The invention also discloses an application of the few-mode multi-core fiber channel model, obtained by the few-mode multi-core fiber channel modeling method based on lightweight knowledge distillation, in the field of few-mode multi-core fibers.
The invention also discloses an application method of the few-mode multi-core fiber channel model in the field of few-mode multi-core fibers, as shown in fig. 5, comprising the following steps:
at the transmitting end of the optical fiber link, an arbitrary waveform generator generates an electrical signal, which is amplified by an amplifier and transmitted to a modulator; the light source is a continuous-wave laser operating at 1550 nm with an optical power of 14.5 dBm, and the optical signal it generates is also injected into the modulator; the modulated signal output by the modulator is amplified by an erbium-doped fiber amplifier and sent to the optical coupler to be input into the few-mode fiber for transmission; data are collected from the physical-layer devices at the transmitting end and input into the few-mode multi-core channel model obtained by performing knowledge distillation on the MobileNet model, and finally the optimal signal is output to the receiving end; the modulator is preferably a Mach-Zehnder modulator;
at the receiving end, an optical coupler first decouples the multiplexed signals; the signals are then amplified, a variable optical attenuator adjusts the optical power, a photodetector recovers the electrical signal information, a mixed-signal oscilloscope samples the signals, and finally the signals are output after data processing.
Parts or structures of the present invention, which are not specifically described, may be existing technologies or existing products, and are not described herein.
The foregoing description is only illustrative of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes or direct or indirect application in other related arts are included in the scope of the present invention.

Claims (10)

1. A few-mode multi-core fiber channel modeling method based on lightweight knowledge distillation, characterized by comprising the following steps:
designing a lightweight convolutional neural network MobileNet model, wherein the convolution type is depthwise separable convolution;
and compressing the convolutional neural network MobileNet model through knowledge distillation to obtain the required few-mode multi-core fiber channel model.
2. The method for modeling a few-mode multi-core fiber channel based on lightweight knowledge distillation according to claim 1, wherein the depthwise separable convolution comprises one depthwise convolution layer and one pointwise convolution layer, the parameter count of the depthwise separable convolution comprises the parameter counts of the depthwise and pointwise convolutions, and the computation count of the depthwise separable convolution comprises the computation counts of the depthwise and pointwise convolutions.
3. The method for modeling a few-mode multi-core fiber channel based on lightweight knowledge distillation according to claim 2, wherein the parameter count of the depthwise separable convolution is D_K×D_K×M + M×N;
the parameter count of the depthwise convolution is D_K×D_K×M, where each depthwise convolution kernel has size D_K×D_K×1 and there are M of them;
the parameter count of the pointwise convolution is M×N, where each pointwise convolution kernel has size 1×1×M and there are N of them.
4. The method for modeling a few-mode multi-core fiber channel based on lightweight knowledge distillation according to claim 2, wherein the computation count of the depthwise separable convolution is D_K×D_K×M×D_F×D_F + M×N×D_F×D_F;
the computation count of the depthwise convolution is D_K×D_K×M×D_F×D_F, where each of the M depthwise kernels of size D_K×D_K×1 performs D_F×D_F multiply-add operations;
the computation count of the pointwise convolution is M×N×D_F×D_F, where each of the N pointwise kernels of size 1×1×M performs D_F×D_F multiply-add operations.
5. The method for modeling a few-mode multi-core fiber channel based on lightweight knowledge distillation according to claim 3, wherein the ratio of the parameter count of the depthwise separable convolution to that of the standard convolution is:
(D_K×D_K×M + M×N) / (D_K×D_K×M×N) = 1/N + 1/D_K²
6. The method for modeling a few-mode multi-core fiber channel based on lightweight knowledge distillation according to claim 4, wherein the ratio of the computation count of the depthwise separable convolution to that of the standard convolution is:
(D_K×D_K×M×D_F×D_F + M×N×D_F×D_F) / (D_K×D_K×M×N×D_F×D_F) = 1/N + 1/D_K²
7. The method for modeling a few-mode multi-core fiber channel based on lightweight knowledge distillation according to claim 1, wherein the step of compressing the convolutional neural network MobileNet model by knowledge distillation comprises:
taking the convolutional neural network MobileNet model as the teacher model, obtaining a student model by knowledge distillation, and training the student model under the supervision of the teacher model; the difference between the output predictions of the teacher model and the student model is measured by the distillation loss function, which is combined with the student model's own loss function to form the overall training loss; gradient updates are performed on this overall loss, finally yielding a high-performance, high-accuracy student model that serves as the required few-mode multi-core fiber channel model.
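A minimal numpy sketch of the training procedure in claim 7, under toy assumptions that are not the patent's (a fixed linear "teacher" standing in for MobileNet, a single-linear-layer student, illustrative hyperparameters α = 0.7, T = 4): the student is updated by full-batch gradient descent on the combined hard-label and distillation losses.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))            # toy input features
W_teacher = rng.normal(size=(8, 3))      # stands in for the large teacher model
y = np.argmax(X @ W_teacher, axis=1)     # hard (true) labels
teacher_logits = X @ W_teacher           # teacher's output predictions

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

alpha, T, lr = 0.7, 4.0, 0.5
W_student = np.zeros((8, 3))             # small student model
onehot = np.eye(3)[y]
for _ in range(300):
    logits = X @ W_student
    p = softmax(logits)                  # student prediction at T = 1
    p_T = softmax(logits, T)             # softened student prediction
    q = softmax(teacher_logits, T)       # softened teacher targets
    # Gradient w.r.t. logits: (1-a)*CE(y,p) gives (p - onehot);
    # a*T^2*CE(q, p_T) gives a*T*(p_T - q) after the 1/T chain-rule factor.
    grad = (1 - alpha) * (p - onehot) + alpha * T * (p_T - q)
    W_student -= lr * (X.T @ grad) / len(X)

acc = float((np.argmax(X @ W_student, axis=1) == y).mean())
print(acc)
```

The student ends up reproducing the teacher's decisions on the training data despite having far fewer effective degrees of freedom, which is the compression effect the claim relies on.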
8. The method for modeling a few-mode multi-core fiber channel based on lightweight knowledge distillation according to claim 7, wherein the overall training loss function is expressed as:
L = (1-α)CE(y,p) + αCE(q,p)T²
where CE is the cross entropy, y is the true label, p is the intermodal dispersion of the few-mode multi-core fiber, q is the intermodal coupling of the few-mode multi-core fiber, α is the weight of the distillation loss, and T is the knowledge distillation temperature.
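The loss in claim 8 can be written out directly. A hedged sketch: the function names, the one-hot encoding of y, and the convention that the distillation term compares temperature-softened distributions are illustrative assumptions, not details fixed by the claim.

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(target, pred, eps=1e-12):
    """Mean cross entropy CE(target, pred) over a batch of distributions."""
    return float(-np.mean(np.sum(target * np.log(pred + eps), axis=-1)))

def distillation_loss(y_onehot, student_logits, teacher_logits, alpha, T):
    """L = (1-alpha)*CE(y, p) + alpha*CE(q, p)*T^2, per claim 8."""
    p_hard = softmax(student_logits)        # student prediction at T = 1
    p_soft = softmax(student_logits, T)     # softened student prediction
    q = softmax(teacher_logits, T)          # softened teacher targets
    return (1 - alpha) * cross_entropy(y_onehot, p_hard) \
        + alpha * cross_entropy(q, p_soft) * T**2
```

The T² factor compensates for the 1/T² scaling that temperature softening introduces into the gradients, keeping the two terms on comparable scales as α is varied.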
9. Use of the few-mode multi-core fiber channel model obtained by the few-mode multi-core fiber channel modeling method based on lightweight knowledge distillation according to any one of claims 1-8 in the field of few-mode multi-core fibers.
10. The use according to claim 9, characterized in that it comprises the following steps:
at the transmitting end of the optical fiber link, an arbitrary waveform generator generates an electrical signal, which is amplified by an amplifier and sent to a modulator; at the same time, the optical signal generated by the light source is also injected into the modulator;
the modulated signal output by the modulator is amplified by an erbium-doped fiber amplifier and then sent to an optical coupler, which couples it into the few-mode fiber for transmission; data are collected from the physical-layer devices at the transmitting end and fed into the few-mode multi-core fiber channel model, which finally outputs the optimal signal to the receiving end;
at the receiving end, an optical coupler first decouples the multiplexed signals; the signals are then amplified, their optical power is adjusted by a variable optical attenuator, and they are fed into a photodetector to recover the electrical signal information; the signals are then sampled by a mixed-signal oscilloscope and finally output after data processing.
CN202311672337.2A 2023-12-07 2023-12-07 Few-mode multi-core fiber channel modeling method based on light knowledge distillation Pending CN117675061A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311672337.2A CN117675061A (en) 2023-12-07 2023-12-07 Few-mode multi-core fiber channel modeling method based on light knowledge distillation

Publications (1)

Publication Number Publication Date
CN117675061A 2024-03-08

Family

ID=90067698

Similar Documents

Publication Publication Date Title
CN109902183B (en) Knowledge graph embedding method based on diverse graph attention machine mechanism
Fan et al. Joint optical performance monitoring and modulation format/bit-rate identification by CNN-based multi-task learning
CN110932809B (en) Fiber channel model simulation method, device, electronic equipment and storage medium
CN114819132B (en) Photon two-dimensional convolution acceleration method and system based on time-wavelength interleaving
US20240078421A1 (en) Two-dimensional photonic convolutional acceleration system and device for convolutional neural network
Niu et al. End-to-end deep learning for long-haul fiber transmission using differentiable surrogate channel
CN103236833B (en) The device of a kind of electro-optical feedback chaotic laser light deposit pond parallel computation
CN114499723B (en) Optical fiber channel rapid modeling method based on Fourier neural operator
Zang et al. Multi-span long-haul fiber transmission model based on cascaded neural networks with multi-head attention mechanism
CN114139324A (en) Distributed optical fiber channel rapid and accurate modeling method and system based on feature decoupling
US11934943B1 (en) Two-dimensional photonic neural network convolutional acceleration chip based on series connection structure
CN117675061A (en) Few-mode multi-core fiber channel modeling method based on light knowledge distillation
Shahkarami et al. Efficient deep learning of nonlinear fiber-optic communications using a convolutional recurrent neural network
CN114598581B (en) Training method, recognition method and device for double-stage detection model of probability shaping signal
Luo et al. Nonlinear impairment compensation using transfer learning-assisted convolutional bidirectional long short-term memory neural network for coherent optical communication systems
CN114595816A (en) Neural network model training method based on edge calculation
Yang et al. Optical labels enabled optical performance monitoring in WDM systems
Cui et al. Optical Fiber Channel Modeling Method Using Multi-BiLSTM for PM-QPSK Systems
CN114124223A (en) Method and system for generating convolutional neural network optical fiber equalizer
Amirabadi et al. Deep learning regression vs. Classification for QoT estimation in SMF and FMF links
Ma et al. Modeling of Multi-Core Fiber Channel Based on M-CGAN for High Capacity Fiber Optical Communication
Yang et al. Nonlinear compensation scheme for adaptive variable step digital backward propagation algorithm
CN114285715B (en) Nonlinear equalization method based on bidirectional GRU-conditional random field
Zhang et al. MIMO detection using an ALR-CNN in SDM transmission systems
Ren et al. Transfer Learning Aided Optical Nonlinear Equalization Based Feature Engineering Neural Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination