CN114596571A

CN114596571A - Intelligent lens-free character recognition system

Info

Publication number: CN114596571A
Application number: CN202210246740.8A
Authority: CN
Inventors: 张颖而; 皇甫江涛
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2022-03-14
Filing date: 2022-03-14
Publication date: 2022-06-07
Anticipated expiration: 2042-03-14
Also published as: CN114596571B

Abstract

The invention discloses an intelligent lensless character recognition system. The system comprises an optical module and a calculation imaging and intelligent character positioning recognition module, wherein the optical module consists of a mask plate with adjustable amplitude and a sensor, and the transmission light amplitude distribution of the mask plate is modeled into a two-dimensional convolution layer and can be optimized as a parameter; the calculation imaging and character positioning identification module comprises a calculation imaging model, a character positioning model and a character identification model, input data is original data obtained on a sensor after passing through the optical module, the original data is output to be a text form of predicted characters, and meanwhile, the light transmission amplitude distribution of a mask plate in the optical module and the parameters of a calculation imaging network are optimized through result feedback. The invention realizes the software and hardware integrated lens-free imaging and character recognition deep learning model optimization, improves the accuracy of character positioning and character recognition under the condition of no lens, and each module of the system has universality and strong practical applicability.

Description

An Intelligent Lensless Character Recognition System

技术领域technical field

本发明属于无透镜成像领域，具体涉及一种智能无透镜文字识别系统。The invention belongs to the field of lensless imaging, in particular to an intelligent lensless character recognition system.

背景技术Background technique

随着视觉任务的快速发展和应用，相机被集成在各种硬件设备上。某些应用场景对相机尺寸有严格的要求，无透镜相机是一种使用薄掩膜版替代镜头的成像系统，因此可以大大减小相机尺寸。With the rapid development and application of vision tasks, cameras are integrated on various hardware devices. Some application scenarios have strict requirements on the size of the camera, and a lensless camera is an imaging system that uses a thin reticle to replace the lens, so the size of the camera can be greatly reduced.

和带镜头的相机相比，无透镜相机需要对传感器上收集的数据进行计算成像才能恢复图像，但是基于无透镜重建的图像存在模糊、分辨率的缺点，导致无法胜任很多视觉任务，目前尚未有对基于无透镜的非单个字符文字检测和识别的研究。Compared with a camera with a lens, a lensless camera needs to perform computational imaging on the data collected on the sensor to restore the image, but the image reconstructed based on the lensless has the shortcomings of blur and resolution, which makes it unable to perform many visual tasks. Research on non-single-character text detection and recognition based on lensless.

因此，需要一套无透镜文字识别系统。Therefore, a lensless character recognition system is required.

发明内容SUMMARY OF THE INVENTION

针对目前无透镜成像技术由于较差的成像质量而未应用于非单个字母的文字定位和识别的情况，本发明提供了一种基于无透镜的文字定位和识别系统。识别准确率高且该系统方法具有通用性。Aiming at the situation that the current lensless imaging technology is not applied to character location and recognition of non-single letters due to poor imaging quality, the present invention provides a lensless-based character location and recognition system. The recognition accuracy is high and the system method has generality.

本发明采用的技术方案如下：The technical scheme adopted in the present invention is as follows:

本发明的智能无透镜文字识别系统包括光学模块和计算成像及文字定位识别模块，光学模块主要由平行放置的可调制幅度掩膜板和光学传感器组成，待识别目标放置于光学模块前方，待识别目标发出的光线经可调制幅度掩膜板散射后，在光学传感器的平面上投射形成投影图像(原始数据)，光学传感器将投影图像传输至计算成像及文字定位识别模块；The intelligent lensless character recognition system of the present invention includes an optical module and a computational imaging and character positioning recognition module. The optical module is mainly composed of an adjustable amplitude mask and an optical sensor placed in parallel. The target to be recognized is placed in front of the optical module, and the target to be recognized After the light emitted by the target is scattered by the adjustable amplitude mask, it is projected on the plane of the optical sensor to form a projected image (raw data), and the optical sensor transmits the projected image to the computational imaging and text positioning and recognition module;

计算成像及文字识别模块包括计算成像模型、文字定位模型和文字识别模型，三个模型串行连接；计算成像及文字识别模块的输入为经光学模块后在传感器上得到的投影图像，输出为投影图像上文字的文本形式。The computational imaging and text recognition module includes a computational imaging model, a text positioning model and a text recognition model, and the three models are connected in series; the input of the computational imaging and text recognition module is the projection image obtained on the sensor after the optical module, and the output is the projection image The textual form of the text on the image.

所述的可调制幅度掩膜板为由k*k个单元格组成的二值化掩膜版，每个单元格的值为1或0，1表示光线能通过，0表示光线不能通过。The modulated amplitude mask is a binarized mask composed of k*k cells, and the value of each cell is 1 or 0, 1 means that light can pass, and 0 means that light cannot pass.

投影图像经计算成像模型输出预测的重建图像；文字定位模型对输入的重建图像进行处理，输出图像中文字的位置；将文字定位模型的输出结果输入文字识别模型后，输出图像的文字识别结果；The projected image outputs the predicted reconstructed image through the computational imaging model; the text localization model processes the input reconstructed image to output the position of the text in the image; after the output result of the text localization model is input into the text recognition model, the text recognition result of the image is output;

计算成像及文字识别模块训练过程中，仅计算成像模型参与训练，需更新参数，文字定位模型和文字识别模型不参与训练。During the training process of the computational imaging and text recognition module, only the computational imaging model participates in the training, and the parameters need to be updated. The text positioning model and text recognition model do not participate in the training.

计算成像模型为编码器-解码器体系的神经网络，具体采用U-NET；文字定位模型采用任意文字定位模型结构，具体采用CTPN；文字识别模型采用任意文字识别模型结构，具体采用CRNN。The computational imaging model is a neural network of the encoder-decoder system, specifically U-NET; the text localization model adopts an arbitrary text localization model structure, specifically CTPN; the text recognition model adopts an arbitrary text recognition model structure, specifically CRNN.

可调制幅度掩膜板上的图案通过液晶显示器显示，掩模版上的图案随机生成或通过训练优化后确定；通过训练优化后确定掩模版图案的方法包括以下步骤：The pattern on the modulated amplitude mask is displayed on a liquid crystal display, and the pattern on the mask is randomly generated or determined after optimization by training; the method for determining the pattern on the mask after optimization by training includes the following steps:

1)将待识别目标与光学模块的成像过程建模为二维卷积层，具体为：1) Model the imaging process of the target to be recognized and the optical module as a two-dimensional convolution layer, specifically:

m＝w*om=w*o

其中，w表示掩模版上的幅度分布，即掩模版上单元格的值分布；以掩模版中心点为原点构建坐标系，(i，j)为掩膜板上单元格中心点的坐标，w_i，j表示掩膜板上坐标为(i，j)的单元格的值；Among them, w represents the amplitude distribution on the reticle, that is, the value distribution of the cells on the reticle; the coordinate system is constructed with the center point of the reticle as the origin, (i, j) is the coordinate of the center point of the cell on the reticle, w _{i, j} represents the value of the cell whose coordinates are (i, j) on the mask;

o表示待识别目标不经过掩模版时在传感器平面上缩放后的图像(即o表示待识别目标经过孔径时在传感器平面上缩放后的图像)；以传感器平面中心点为原点构建坐标系，(x，y)表示投影图像的像素点在传感器平面上的坐标值，o_x，y表示待识别目标不经过掩模版时在传感器平面的(x，y)处的像素值；o_x+i，y+j表示在传感器平面上(x+i,y+j)处的像素值；o represents the image scaled on the sensor plane when the target to be recognized does not pass through the reticle (that is, o represents the image scaled on the sensor plane when the target to be recognized passes through the aperture); the coordinate system is constructed with the center point of the sensor plane as the origin, ( x, y) represents the coordinate value of the pixel of the projected image on the sensor plane, o _{x, y} represents the pixel value at (x, y) of the sensor plane when the target to be recognized does not pass through the reticle; o _{x+i, y+j} represents the pixel value at (x+i, y+j) on the sensor plane;

m表示待识别目标经过掩模版后投影在传感器平面上的图像；m_x，y表示待识别目标经过掩模版后在传感器平面的(x，y)处的像素值；m represents the image projected on the sensor plane after the target to be recognized passes through the reticle; m _{x, y} represents the pixel value at (x, y) of the sensor plane after the target to be recognized passes through the reticle;

k表示掩模版上单元格的行数或列数，i∈[1，k]；k represents the number of rows or columns of cells on the reticle, i∈[1,k];

2)将二维卷积层进行二值化得到二值神经网络二维卷积层，结果如下：2) Binarize the two-dimensional convolutional layer to obtain a two-dimensional convolutional layer of a binary neural network. The results are as follows:

其中，in,

其中，w^b表示对w进行二值化处理后的结果；Among them, w ^b represents the result of binarizing w;

由于掩模版只有0和1值，我们使用二值神经网络来训练，二值神经网络使用sign函数将二连续值映射到-1或+1，随后加1并除以2；Since the mask has only 0 and 1 values, we use a binary neural network for training, which uses the sign function to map two consecutive values to -1 or +1, then add 1 and divide by 2;

3)将二值神经网络二维卷积层的参数w^b作为模型参数与计算成像及文字定位识别模块一同训练优化；3) The parameter w ^b of the two-dimensional convolution layer of the binary neural network is used as a model parameter to be trained and optimized together with the computational imaging and text location recognition modules;

3.1)训练过程中，通过电路调整对掩膜板的图案做随机初始化，并将随机初始化的结果作为二值神经网络卷积层的初始参数；3.1) During the training process, the pattern of the mask is randomly initialized through circuit adjustment, and the result of random initialization is used as the initial parameter of the convolutional layer of the binary neural network;

3.2)系统前向传播过程的训练：固定待识别目标，在真实物理场景中测量待识别目标经掩膜版后在光学传感器的平面上得到的投影图像，并将其作为计算成像及文字定位识别模块的输入；3.2) Training of the forward propagation process of the system: fix the target to be recognized, measure the projected image of the target to be recognized on the plane of the optical sensor after passing through the mask in the real physical scene, and use it as computational imaging and text positioning recognition module input;

反向传播过程的训练：计算成像及文字定位识别模块输出的预测图像与真实图像标签的损失函数Loss，将损失函数Loss反向传播至二值神经网络卷积层，更新二值神经网络卷积层参数w^b，并根据更新的参数w^b调制可调掩膜版，调制结果作为下一轮训练时模型前向传播过程中的掩膜版图案；Training of the backpropagation process: Calculate the loss function Loss of the predicted image output by the imaging and text location recognition module and the real image label, backpropagate the loss function Loss to the convolution layer of the binary neural network, and update the convolution of the binary neural network. layer parameter w ^b , and modulates the adjustable mask according to the updated parameter w ^b , and the modulation result is used as the mask pattern in the forward propagation process of the model in the next round of training;

3.3)训练完成后得到的掩膜版图案为优化后的结果。3.3) The mask pattern obtained after training is the optimized result.

所述的可调制掩模版的单元格尺寸大小和传感器平面上的像素点尺寸大小相同；待识别目标与可调制幅度掩膜板之间的距离d1远大于可调制幅度掩膜板和光学传感器之间的距离d2，d1>100d2；因此将掩模版上的幅度分布近似等于掩模版上的幅度分布在传感器平面上的投影。The cell size of the modulated reticle is the same as the pixel size on the sensor plane; the distance d1 between the target to be identified and the modulated amplitude reticle is much greater than the distance between the modulated amplitude reticle and the optical sensor. The distance d2, d1>100d2; therefore, the amplitude distribution on the reticle is approximately equal to the projection of the amplitude distribution on the reticle on the sensor plane.

所述的计算成像及文字定位识别模块在训练过程中的损失函数Loss为：The loss function Loss of the computational imaging and text location recognition module in the training process is:

Loss＝a×Loss1+b×Loss2；Loss=a×Loss1+b×Loss2;

其中，Loss1为计算成像模型输出的预测图像与真实图像标签(待识别目标图像)之间的误差；Loss2为计算成像及文字定位识别模块最终输出的预测文本与待识别目标的真实文字标签(待识别目标图像上的文本信息)之间的误差；a和b为权重。Among them, Loss1 is the error between the predicted image output by the computational imaging model and the real image label (target image to be recognized); Loss2 is the predicted text finally output by the computational imaging and text location recognition module and the real text label of the target to be recognized (to be recognized). Recognize the error between the text information on the target image); a and b are weights.

本发明的有效效益：Effective benefits of the present invention:

本发明的无透镜文字识别系统能够减少镜头带来的尺寸限制，使得相机被集成在其他设备上更加方便。The lensless character recognition system of the present invention can reduce the size limitation brought by the lens, so that it is more convenient for the camera to be integrated on other devices.

本发明实现了软硬件一体化的无透镜成像和文字识别的深度学习模型优化，提高了在无透镜下的文字定位和文字识别的准确率，且该系统的每个模块具有通用性和普适性，具有很强的实际应用性。The invention realizes the optimization of the deep learning model of lensless imaging and character recognition integrated with software and hardware, improves the accuracy of character positioning and character recognition without lens, and each module of the system has universality and universality , has strong practical application.

附图说明Description of drawings

图1是本发明的整体数据流。Figure 1 is the overall data flow of the present invention.

图2是本发明中光学模块的原理图。FIG. 2 is a schematic diagram of an optical module in the present invention.

图3是本发明中计算成像及文字定位识别模块的原理图。FIG. 3 is a schematic diagram of a computational imaging and character location recognition module in the present invention.

具体实施方式Detailed ways

下面结合附图及具体实施例对本发明作进一步详细说明。The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.

如图1所示，本发明的无透镜文字识别系统包括光学模块、计算成像及文字定位识别模块。待识别目标通过光学模块获得原始数据，原始数据通过计算成像及文字定位识别模块获得文字的文本形式。As shown in FIG. 1 , the lensless character recognition system of the present invention includes an optical module, a computational imaging and character location recognition module. The target to be recognized obtains the original data through the optical module, and the original data obtains the text form of the text through the computational imaging and text positioning and recognition module.

光学模块由可调制幅度掩膜板和传感器组成，其中可调制幅度掩膜板是一种二值化的掩模版，其分为k*k个单元格，单元格的值为1或0，1代表光线能通过，0代表光线不能通过，放置在光学传感器前，且两者平行放置，物体放置在掩膜版前，物体发出的光线经过可调制幅度掩膜板散射后，将特定的投影图像(原始数据)投射在传感器所处平面上，由传感器记录后传到计算成像及文字定位识别模块。该可调制幅度掩膜板的图案可通过电路实时控制，具体可通过液晶显示器显示，掩膜板上的图案既可以随机生成，也可以通过训练优化更新掩膜版；通过训练优化获得的掩膜版能使计算成像及文字定位识别模块的识别效果更好。当光学模块中的掩膜版图案固定时，在不优化掩膜版的情况下，使用固定掩膜版图案，仍然可以通过传感器上得到的原始数据，通过计算成像及文字定位识别模块，获得文字的定位和识别结果。The optical module consists of a modulated amplitude mask and a sensor, wherein the modulated amplitude mask is a binarized mask, which is divided into k*k cells, and the value of the cell is 1 or 0, 1 It means that the light can pass through, and 0 means that the light cannot pass through. It is placed in front of the optical sensor, and the two are placed in parallel. The object is placed in front of the mask. After the light emitted by the object is scattered by the adjustable amplitude mask, a specific projection image is displayed. The (raw data) is projected on the plane where the sensor is located, recorded by the sensor and then transmitted to the computational imaging and text positioning and recognition module. The pattern of the modulated amplitude mask can be controlled in real time through the circuit, and can be displayed on a liquid crystal display. The pattern on the mask can be randomly generated, or the mask can be updated through training optimization; the mask obtained through training optimization The version can make the recognition effect of the computational imaging and text positioning recognition module better. When the reticle pattern in the optical module is fixed, without optimizing the reticle, the fixed reticle pattern can still be used to obtain the text through the original data obtained from the sensor, through the computational imaging and text positioning recognition module. location and identification results.

如图2所示，训练优化掩模版图案的方法如下：As shown in Figure 2, the training method for optimizing the reticle pattern is as follows:

1)将成像目标与掩模版交互的过程建模为二维卷积层1) Model the process of the imaging target interacting with the reticle as a 2D convolutional layer

m＝w*om=w*o

o表示可调制幅度掩膜板全透明情况下，待识别目标不经过掩模版时在传感器平面上缩放后的图像；以传感器平面中心点为原点构建坐标系，(x，y)表示传感器平面上像素点的坐标值，o_x，y表示待识别目标不经过掩模版时在传感器平面的(x，y)处的像素值；o represents the image scaled on the sensor plane when the modulated amplitude mask is fully transparent and the target to be recognized does not pass through the mask; the coordinate system is constructed with the center point of the sensor plane as the origin, and (x, y) represents the image on the sensor plane The coordinate value of the pixel point, o _{x, y} represents the pixel value at (x, y) of the sensor plane when the target to be recognized does not pass through the reticle;

k表示掩模版上单元格的行数或列数，i∈[1，k]。k represents the number of rows or columns of cells on the reticle, i∈[1,k].

由于掩模版单元格像素大小和传感器像素大小相同，且d1>>d2(d1大于100倍的d2)，所以w可近似等于掩模版上幅度分布在传感器平面上的投影。Since the reticle cell pixel size is the same as the sensor pixel size, and d1>>d2 (d1 is greater than 100 times d2), w can be approximately equal to the projection of the amplitude distribution on the reticle on the sensor plane.

2)将二维卷积层的参数w作为模型参数与后续计算成像及文字定位识别模块一同训练优化。2) The parameter w of the two-dimensional convolution layer is used as a model parameter to train and optimize together with the subsequent computational imaging and text location recognition modules.

2.1)将二维卷积层进行二值化得到二值神经网络：2.1) Binarize the two-dimensional convolutional layer to obtain a binary neural network:

其中，w^b表示对w进行二值化处理后的结果。Among them, w ^b represents the result of binarizing w.

由于掩模版只有0和1值，我们使用二值神经网络来训练，二值神经网络使用sign函数将w映射到-1或+1，随后加1并除以2。Since the reticle has only 0 and 1 values, we train it using a binary neural network that maps w to -1 or +1 using the sign function, then adds 1 and divides by 2.

2.2)训练过程中，通过电路调整对掩膜板图案做随机初始化，并将该值作为二值神经网络卷积层的初始参数。2.2) During the training process, the mask pattern is randomly initialized by circuit adjustment, and this value is used as the initial parameter of the convolutional layer of the binary neural network.

在系统的前向传播的过程中：固定待测目标物，在真实物理场景中测量经过掩膜版以后传感器上得到的原始数据，并将其作为计算成像及文字定位识别模块的输入，In the process of forward propagation of the system: fix the target to be measured, measure the raw data obtained on the sensor after passing through the mask in the real physical scene, and use it as the input of the computational imaging and text positioning and recognition module,

在经过计算成像及文字定位识别模块后得到与真实标签的误差，After calculating the imaging and text positioning recognition module, the error with the real label is obtained,

在梯度反向传播过程中：计算成像及文字定位识别模块的输出与真实标签的误差，计算梯度并反向传播，将误差反向传播至二值神经网络卷积层，更新二值神经网络卷积层，并根据更新的权重结果去调制可调掩膜版，作为下一轮训练时模型前向传播过程中的掩膜版图案。In the process of gradient backpropagation: calculate the error between the output of the imaging and text location recognition module and the real label, calculate the gradient and backpropagate, backpropagate the error to the convolutional layer of the binary neural network, and update the volume of the binary neural network Layer, and modulate the adjustable mask according to the updated weight results, as the mask pattern in the forward propagation process of the model in the next round of training.

训练完成后，固定掩膜版图案。After training, fix the reticle pattern.

如图3所示，计算成像及文字识别模块包含计算成像模型、文字定位模型和文字识别模型。使用过程中，该模块由三个模型串行，输入数据为经过光学模块后在传感器上得到的原始数据，输出为预测的文字的文本形式。训练过程中，仅计算成像模型是需要更新参数的，文字定位模型和文字识别模型不参与训练。As shown in FIG. 3 , the computational imaging and character recognition module includes a computational imaging model, a character positioning model and a character recognition model. During use, the module is serialized by three models, the input data is the original data obtained on the sensor after passing through the optical module, and the output is the text form of the predicted text. During the training process, only the calculation of the imaging model needs to update the parameters, and the text localization model and the text recognition model do not participate in the training.

计算成像模型是一种基于深度学习的成像方法，能够由传感器上获得的原始数据(模型输入)计算得到预测的重建图像(模型输出)。计算成像模型是一种结构为编码器-解码器的神经网络，具体的可采用U-NET结构的网络。训练过程中，以包含字母和数字的图像作为待识别目标，其经过光学模块后在传感器上获得的原始数据作为模型输入，训练损失函数由两部分组成：1)待识别目标图像(真实图像标签)与模型输出的预测图像之间的误差Loss1；2)将预测的重建图像输入后接的文字定位模型和文字识别模型，得到预测的文本，计算预测文本与待识别目标的文字标签之间的误差Loss2。损失函数Loss＝a×Loss1+b×Loss2。A computational imaging model is a deep learning-based imaging method that computes a predicted reconstructed image (model output) from raw data obtained on the sensor (model input). The computational imaging model is a neural network with an encoder-decoder structure, and specifically a network with a U-NET structure can be used. In the training process, the image containing letters and numbers is used as the target to be recognized, and the raw data obtained on the sensor after passing through the optical module is used as the model input. The training loss function consists of two parts: 1) The target image to be recognized (real image label) ) and the error Loss1 between the predicted image output by the model; 2) input the predicted reconstructed image into the text localization model and the text recognition model followed by, obtain the predicted text, calculate the predicted text and the text label of the target to be recognized. Error Loss2. Loss function Loss=a×Loss1+b×Loss2.

梯度反向传播：该损失函数计算梯度回传更新计算成像模型及前文所述的二维卷积层，最终得到对应训练好的计算成像模型。Gradient back-propagation: The loss function calculates the gradient back to update the computational imaging model and the two-dimensional convolutional layer described above, and finally obtains the corresponding trained computational imaging model.

文字定位模型是一个神经网络模型。输入数据是图像，输出数据是文字所在的位置。该模型可采用任意文字定位模型结构，具体的结构可采用如CTPN：图像经过VGG16网络，在该网络最后的卷积层的每一行都计算3*3滑动窗口，并将结果通过BLSTM结构连接，最后输入一个全连接层，输出为预测的坐标和置信度分数。且该模型是训练好的，在该系统中不参与损失函数的后向传播更新参数。The text localization model is a neural network model. The input data is an image, and the output data is where the text is located. The model can use any text positioning model structure. The specific structure can be used such as CTPN: the image passes through the VGG16 network, and a 3*3 sliding window is calculated in each line of the last convolutional layer of the network, and the results are connected through the BLSTM structure. Finally, a fully connected layer is input, and the output is the predicted coordinates and confidence score. And the model is trained and does not participate in the back propagation of the loss function to update the parameters in the system.

文字识别模型是一个神经网络模型。输入数据是带字母数字的图像，输出是文字识别的结果。该模型可采用任意文字识别模型结构，如CRNN：图像首先经过若干卷积层，提取特征序列，进入BLSTM,最终输出预测字母的分数。且该模型是训练好的，在该系统中不参与损失函数的后向传播更新参数。The text recognition model is a neural network model. The input data is an alphanumeric image and the output is the result of word recognition. The model can use any text recognition model structure, such as CRNN: the image first goes through several convolutional layers, extracts the feature sequence, enters the BLSTM, and finally outputs the score of the predicted letter. And the model is trained, and does not participate in the back propagation of the loss function to update the parameters in the system.

Claims

1. An intelligent lens-free character recognition system, characterized in that it comprises an optical module and a computational imaging and character positioning recognition module, the optical module is mainly composed of a modulated amplitude mask plate and an optical sensor placed in parallel, and the target to be identified is placed in the In front of the optical module, after the light emitted by the target to be identified is scattered by the adjustable amplitude mask, a projected image is projected on the plane of the optical sensor, and the optical sensor transmits the projected image to the computational imaging and text positioning and recognition module;

The pattern on the adjustable amplitude mask is displayed on the liquid crystal display, and the pattern on the mask is randomly generated or determined after training and optimization;

The computational imaging and text recognition module includes a computational imaging model, a text positioning model and a text recognition model, and the three models are connected in series; the input of the computational imaging and text recognition module is the projection image obtained on the sensor after the optical module, and the output is the projection image The textual form of the text on the image.

2. An intelligent lensless character recognition system according to claim 1, wherein the modulated amplitude mask is a binarized mask consisting of k*k cells, each The value of the cell is 1 or 0, 1 means light can pass through, 0 means light can't pass through.

3. An intelligent lensless character recognition system according to claim 1, wherein the projected image outputs a predicted reconstructed image through a computational imaging model; the character localization model processes the input reconstructed image, and outputs a position; after inputting the output result of the text positioning model into the text recognition model, the text recognition result of the image is output;

During the training process of the computational imaging and text recognition module, only the computational imaging model participates in the training, and the text localization model and text recognition model do not participate in the training.

4. a kind of intelligent lensless character recognition system according to claim 3, is characterized in that, the computational imaging model is the neural network of the encoder-decoder system, specifically adopts U-NET; Character location model adopts arbitrary character location model Structure, specifically using CTPN; text recognition model using arbitrary text recognition model structure, specifically using CRNN.

5. A kind of intelligent lensless character recognition system according to claim 1, is characterized in that, the method for determining reticle pattern after optimization by training comprises the following steps:

1) Model the imaging process of the target to be recognized and the optical module as a two-dimensional convolution layer, specifically:

m=w*o

Among them, w represents the amplitude distribution on the reticle, that is, the value distribution of the cells on the reticle; the coordinate system is constructed with the center point of the reticle as the origin, (i, j) is the coordinate of the center point of the cell on the reticle, w _{i, j} represents the value of the cell whose coordinates are (i, j) on the mask;

o represents the image scaled on the sensor plane when the target to be recognized does not pass through the reticle; the coordinate system is constructed with the center point of the sensor plane as the origin, (x, y) represents the coordinate value of the pixel of the projected image on the sensor plane, o _{x, y} represents the pixel value at (x, y) on the sensor plane when the target to be recognized does not pass through the reticle; o _{x+i, y+j} represents the pixel value at (x+i, y+j) on the sensor plane Pixel values;

m represents the image projected on the sensor plane after the target to be recognized passes through the reticle; m _{x, y} represents the pixel value at (x, y) of the sensor plane after the target to be recognized passes through the reticle;

k represents the number of rows or columns of cells on the reticle, i∈[1,k];

2) Binarize the two-dimensional convolutional layer to obtain a two-dimensional convolutional layer of a binary neural network. The results are as follows:

in,

Among them, w ^b represents the result of binarizing w;

3) The parameter w ^b of the two-dimensional convolution layer of the binary neural network is used as a model parameter to be trained and optimized together with the computational imaging and text location recognition modules;

3.1) During the training process, the pattern of the mask is randomly initialized through circuit adjustment, and the result of random initialization is used as the initial parameter of the convolutional layer of the binary neural network;

3.2) Training of the forward propagation process of the system: fix the target to be recognized, measure the projected image of the target to be recognized on the plane of the optical sensor after passing through the mask in the real physical scene, and use it as computational imaging and text positioning recognition module input;

Training of the backpropagation process: Calculate the loss function Loss of the predicted image output by the imaging and text location recognition module and the real image label, backpropagate the loss function Loss to the convolution layer of the binary neural network, and update the convolution of the binary neural network. layer parameter w ^b , and modulates the adjustable mask according to the updated parameter w ^b , and the modulation result is used as the mask pattern in the forward propagation process of the model in the next round of training;

3.3) The mask pattern obtained after training is the optimized result.

6. An intelligent lensless character recognition system according to claim 5, wherein the cell size of the modulated reticle is the same as the pixel size on the sensor plane; The distance d1 between the modulated amplitude masks is much larger than the distance d2 between the modulated amplitude masks and the optical sensor, d1>100d2; therefore, the amplitude distribution on the reticle is approximately equal to the amplitude distribution on the reticle on the sensor plane. projection on.

7. a kind of intelligent lensless character recognition system according to any one of claim 3 and 5, is characterized in that, the loss function Loss of described computational imaging and character location recognition module in training process is:

Loss=a×Loss1+b×Loss2;

Among them, Loss1 is the error between the predicted image output by the computational imaging model and the real image label; Loss2 is the error between the predicted text finally output by the computational imaging and text positioning recognition module and the real text label of the target to be recognized; a and b for weight.