Disclosure of Invention
Embodiments of the invention provide an offline visual assistance method and device for the blind, which can reduce the time and energy consumed by image processing. The technical scheme is as follows:
in one aspect, an offline visual assistance method for blind people is provided, which includes:
acquiring an image, wherein the acquired image is an image shot by the blind user in daily life;
extracting feature points from the image, and stitching images with incomplete information by using the extracted feature points;
and inputting the stitched image into an image description model processed by a model compression and acceleration algorithm to obtain image description information.
Further, the stitching of images with incomplete information by using the extracted feature points comprises:
A1, preprocessing a plurality of images with incomplete information and extracting feature points, wherein feature point extraction refers to extracting features from the images for comparison;
A2, screening out the same or similar features among the extracted feature points and matching the feature points;
A3, optimizing and refining the matched feature points;
A4, obtaining a transformation matrix according to the matching relation between the matched feature points, and carrying out a corresponding transformation on the images by using the transformation matrix;
and A5, stitching the plurality of transformed incomplete images together, and eliminating seams and illumination differences to obtain an image with complete information.
Further, the model compression and acceleration algorithm comprises the following steps:
B1, evaluating the importance of neurons multiple times by using the Oracle pruning algorithm, so that the cost loss of the pruned image description model is minimized;
B2, removing the least important neurons after the multiple operations of evaluating neuron importance;
B3, fine-tuning the pruned image description model;
B4, returning to step B1 and continuing execution until pruning is completed.
Further, after the operations of evaluating neuron importance, the cost function of the pruned image description model is expressed as:
where i is the index of the evaluation, N represents the total number of evaluations of neuron importance, M is the number of feature maps selected after each evaluation, W represents the parameter set of the image description model, W' represents the parameter set of the pruned image description model, C(D|W') represents the loss function of the pruned image description model, C(D|W) represents the loss function of the image description model before pruning, B represents the number of non-zero parameters, and D represents the training set.
Further, the removing of the least important neurons after the multiple operations of evaluating neuron importance includes:
selecting feature maps whose evaluation results overlap across the multiple operations of evaluating neuron importance;
calculating importance values of the selected feature maps and sorting them from small to large;
and pruning the feature maps at the top of the ranking (i.e., those with the lowest importance).
Further, the fine-tuning of the pruned image description model includes:
retraining the pruned image description model.
Further, the returning to step B1 and continuing execution until pruning is completed includes:
judging, according to the degree of accuracy change of the pruned image description model, whether it is the final image description model to be retained after processing by the model compression and acceleration algorithm;
if yes, stopping pruning;
otherwise, returning to step B1 and continuing execution until pruning is completed.
Further, the image description model is used for describing the input stitched image in text form.
Further, after the stitched image is input into the image description model processed by the model compression and acceleration algorithm to obtain image description information, the method comprises:
broadcasting the image description information in the form of speech.
In one aspect, an offline visual aid for the blind comprises:
a shooting module, configured to acquire images, wherein the acquired images are images shot by the blind user in daily life;
a portable computing module, configured to extract feature points from the image, stitch images with incomplete information by using the extracted feature points, and input the stitched image into an image description model processed by a model compression and acceleration algorithm to obtain image description information;
and a broadcasting module, configured to broadcast the image description information in the form of speech.
The technical scheme provided by the embodiments of the invention has at least the following beneficial effects:
1) the problem in the prior art that a deep learning network model must rely either on equipment with strong computing power or on a network capable of transmitting large amounts of data is solved;
2) the time and energy consumed by image processing can be reduced, and the problems of error accumulation and excessively long neuron fine-tuning time, caused by clipping one feature map at a time when compressing a network model with a method based on low-rank decomposition, are solved;
3) the problems that blind assistance equipment based on an image description model requires large-scale computing equipment and cannot operate offline are solved;
4) the simplified model can run on a low-cost portable mobile processor, enabling a visually impaired person to perceive the surrounding environment stably and in a timely manner over long periods in a non-visual way, improving their quality of life to a certain extent.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention provides an offline visual assistance method for blind people, including:
s101, acquiring an image, wherein the acquired image is an image shot by the blind in daily life;
the images acquired during the experiment may be existing data sets or images in real scenes. In the practical application process, the acquired image is a real-time image shot by the blind person by using a shooting module in the off-line blind person vision assisting device.
S102, extracting feature points of the image, and stitching the image with incomplete information by using the extracted feature points, which may specifically include the following steps:
A1, feature point extraction: preprocessing a plurality of images with incomplete information and extracting feature points, wherein feature point extraction refers to extracting features from the images for comparison;
In this embodiment, image preprocessing mainly refers to geometric distortion correction, noise suppression, and the like, so that the images to be stitched have no obvious geometric distortion. Without preprocessing, stitching images of non-ideal quality easily produces mismatches. Preprocessing mainly prepares for the subsequent image registration, ensuring that the image quality meets the registration requirements.
A2, feature point matching: screening out the same or similar features among the extracted feature points and matching them; this process should match as many identical feature points as possible, as shown in fig. 2;
A3, optimizing and refining the matched feature points;
In this embodiment, optimizing and refining the matched point pairs mainly means removing mismatched point pairs so as to obtain a better homography matrix. In the feature matching process, image ghosting or matching failure caused by mismatches often occurs, so optimization of this step has a very important influence on image stitching.
A4, transformation of the images to be matched: obtaining a transformation matrix according to the matching relation between the matched feature points, and carrying out a corresponding transformation on the images by using the transformation matrix;
A5, image stitching: stitching the plurality of transformed incomplete images together, and eliminating seams and illumination differences to obtain an image with complete information.
In this embodiment, image information is considered incomplete when an object is missing from an image or when a single photo cannot describe the whole scene. For example, when many people cross a zebra crossing on a green light, one photo may show only the people walking while the traffic light does not appear in the frame. If the surrounding scene is shot continuously from different angles as a supplement to the image information, the current scene can be described more completely. The images shot from different angles generally overlap, and within the overlapping regions they can be stitched using a feature point matching algorithm between the images. After stitching, the result is presented as a single picture, the difference being that the stitched image contains more scene information than before.
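By way of a non-limiting sketch, steps A1-A5 can be illustrated with standard OpenCV primitives as follows. The ORB detector, brute-force matcher, RANSAC homography, and simple overlay used here are illustrative assumptions; the embodiment does not prescribe a particular detector or matcher, and a complete implementation would additionally blend seams and equalize illumination in the overlapping region.

```python
import cv2
import numpy as np

def stitch_pair(img_a, img_b):
    """Illustrative A1-A5 pipeline for two partially overlapping BGR images."""
    # A1: preprocessing (noise suppression) and feature point extraction.
    a = cv2.fastNlMeansDenoisingColored(img_a, None, 10, 10, 7, 21)
    b = cv2.fastNlMeansDenoisingColored(img_b, None, 10, 10, 7, 21)
    orb = cv2.ORB_create(2000)
    kp_a, des_a = orb.detectAndCompute(cv2.cvtColor(a, cv2.COLOR_BGR2GRAY), None)
    kp_b, des_b = orb.detectAndCompute(cv2.cvtColor(b, cv2.COLOR_BGR2GRAY), None)

    # A2: match the same or similar features between the two images.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)

    # A3: refine the matched point pairs; RANSAC inside findHomography
    # rejects mismatched pairs while estimating the homography matrix.
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(pts_b, pts_a, cv2.RANSAC, 5.0)

    # A4: transform the second image into the first image's coordinate frame.
    h, w = a.shape[:2]
    canvas = cv2.warpPerspective(b, H, (w * 2, h))

    # A5: overlay the first image; seam and illumination blending is omitted here.
    canvas[0:h, 0:w] = a
    return canvas
```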
S103, inputting the stitched image into an image description model processed by a model compression and acceleration algorithm to obtain image description information.
In this embodiment, the model compression and acceleration algorithm includes the following steps:
B1, evaluating the importance of neurons: the importance of neurons is evaluated multiple times by using the Oracle pruning algorithm, so that the cost loss of the pruned image description model is minimized;
In this embodiment, neuron importance is evaluated by comparing the image description model before and after pruning; the neuron whose removal changes the model least is the least important.
In this embodiment, pruning refers to removing the least important neurons, and can be understood as compression of the model.
In this embodiment, the cost function of the pruned image description model is calculated as follows:
min_{W'} |C(D|W') - C(D|W)|,  subject to ||W'||_0 ≤ B
where W represents the parameter set of the image description model, w_l^i represents the parameters of the l-th layer of the image description model, the total number of layers is L, and each layer has C_l parameters; W' represents the parameter set of the pruned image description model, and W' ⊂ W; C(D|W') represents the loss function of the pruned image description model; C(D|W) represents the loss function of the image description model before pruning; B represents the number of non-zero parameters; D represents the training set (comprising the images used to train the image description model, in which the objects and scenes are labeled); and ΔC represents the difference between the image description model before and after pruning.
Because the complexity of the Oracle pruning algorithm is extremely high, the change in the loss function is approximated by a Taylor series expansion, and the objective function for deciding whether to prune a certain feature map becomes:
|ΔC(h_i)| = |C(D|W') - C(D|W)| = |C(D, h_i = 0) - C(D, h_i)|
where h_i is the feature map to be pruned; pruning a feature map means setting it to 0.
According to Taylor's formula, C(D, h_i = 0) is expanded about h_i:
C(D, h_i = 0) = C(D, h_i) - (∂C/∂h_i)·h_i + R_1(h_i = 0)
where C(D, h_i) is the loss when the feature map h_i is retained, and C(D, h_i = 0) is the loss when the feature map is removed, i.e., set to 0.
Because the Lagrange remainder R_1(h_i = 0) is small, it is neglected, and the objective function for deciding whether to prune a certain feature map becomes:
|ΔC(h_i)| ≈ |(∂C/∂h_i)·h_i|
After multiple operations of evaluating neuron importance, the cost function of the pruned image description model becomes:
where i is the index of the evaluation, N represents the total number of evaluations of neuron importance, and M is the number of feature maps selected after each evaluation.
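Purely as an illustrative sketch rather than the exact implementation of the embodiment, the first-order Taylor criterion |(∂C/∂h_i)·h_i| can be estimated per feature map from one forward/backward pass, for example in PyTorch; the tensor layout and the averaging over the batch shown below are assumptions made for illustration.

```python
import torch

def taylor_importance(activation, gradient):
    """First-order Taylor estimate |dC/dh_i * h_i| for each feature map.

    activation, gradient: tensors of shape (batch, channels, height, width)
    captured from one convolutional layer during the forward and backward pass.
    Returns one importance value per channel (feature map).
    """
    # Element-wise contribution of each activation to the loss change,
    # averaged over the batch and spatial positions; the absolute value
    # is taken as in the Oracle-abs criterion.
    contribution = (activation * gradient).mean(dim=(0, 2, 3))
    return contribution.abs()
```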
B2, removing the least important neurons: after multiple operations of evaluating neuron importance (Oracle-abs), the least important neurons are removed; this specifically comprises the following steps:
B21, selecting feature maps whose evaluation results overlap across the multiple operations of evaluating neuron importance;
In this embodiment, the evaluation result refers to the distribution of importance obtained in each evaluation of neuron importance, i.e., the importance at each position in the several layers, as shown in fig. 3; overlap means that a certain position in a certain layer reaches a preset degree of importance in multiple evaluation passes.
B22, calculating importance values of the selected feature maps and sorting them from small to large;
B23, pruning the feature maps at the top of the ranking.
In this embodiment, for example, the feature maps in the top 2% of the ranking, i.e., the 2% with the lowest importance, are pruned.
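A hedged sketch of steps B21-B23 follows; the quantile used to decide which evaluation results "overlap", the 2% pruning ratio, and the score layout are illustrative assumptions rather than values fixed by the embodiment.

```python
import torch

def select_channels_to_prune(scores_per_eval, low_quantile=0.1, prune_ratio=0.02):
    """scores_per_eval: tensor of shape (N evaluations, N channels) holding the
    importance score of every feature map in every evaluation pass."""
    n_evals, n_channels = scores_per_eval.shape
    # B21: keep channels whose importance falls in the lowest quantile in
    # every evaluation, i.e. whose evaluation results overlap.
    thresholds = torch.quantile(scores_per_eval, low_quantile, dim=1, keepdim=True)
    overlap = (scores_per_eval <= thresholds).all(dim=0)
    candidates = torch.nonzero(overlap, as_tuple=False).flatten()
    # B22: sort the candidates by mean importance, from small to large.
    order = candidates[torch.argsort(scores_per_eval[:, candidates].mean(dim=0))]
    # B23: prune the top of the ranking (the least important feature maps).
    n_prune = max(1, int(prune_ratio * n_channels))
    return order[:n_prune]
```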
B3, fine-tuning the image description model: since the accuracy of the pruned image description model decreases, the pruned image description model is fine-tuned;
In this embodiment, fine-tuning the pruned image description model is an operation of retraining it, which prevents the accuracy from decreasing too quickly.
B4, continuing or stopping pruning: returning to step B1 and continuing execution until pruning is completed, which may specifically include the following steps:
B41, judging, according to the degree of accuracy change of the pruned image description model, whether it is the final image description model to be retained after processing by the model compression and acceleration algorithm;
B42, if yes, stopping pruning;
B43, otherwise, returning to step B1 and continuing execution until pruning is completed.
In this embodiment, after the image description model has been evaluated, pruned, and fine-tuned multiple times, its accuracy decreases slowly at first and then drops rapidly after a certain pruning step; step B4 therefore determines, according to the degree of accuracy change of the pruned model, whether it is the final image description model to be retained after processing by the model compression and acceleration algorithm.
In this embodiment, for a given image description model, pruning is completed through steps B1-B4 to obtain a compressed image description model, and the image description is obtained by inputting the stitched image into the compressed image description model.
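The overall B1-B4 loop with the accuracy-based stopping rule of step B4 can be sketched as follows; all callables (evaluate_importance, prune_channels, finetune, accuracy) and the allowed accuracy drop are hypothetical placeholders supplied by the surrounding training framework, not functions defined by this embodiment.

```python
import copy

def compress(model, train_loader, val_loader,
             evaluate_importance, prune_channels, finetune, accuracy,
             max_accuracy_drop=0.02):
    """Iterative pruning (steps B1-B4) that stops when accuracy falls rapidly."""
    retained = copy.deepcopy(model)              # last model worth keeping
    baseline = accuracy(model, val_loader)
    while True:
        scores = evaluate_importance(model, train_loader)   # B1: importance scores
        prune_channels(model, scores)                        # B2: remove least important
        finetune(model, train_loader)                        # B3: brief retraining
        current = accuracy(model, val_loader)                # B4: check accuracy change
        if baseline - current > max_accuracy_drop:
            break                                            # rapid drop: stop pruning
        retained = copy.deepcopy(model)                      # still acceptable: keep it
    return retained
```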
It should be noted that:
in this embodiment, the overall framework of the image description model is an encoding-decoding (Encoder-Decoder) model, in which an Encoder end uses a convolutional layer of VGG16 to extract image features, and a Decoder end uses a long-time memory network (LSTM). It is worth noting that, independent of the image description model, the model compression and acceleration algorithm provided by the embodiment can be applied to other image description models, so as to achieve the effect of reducing the operation time and the operation amount; that is to say, different neural network models can obtain the neural network model after pruning through a model compression and acceleration algorithm, and the lightweight neural network model with small precision change and greatly reduced calculation amount is obtained.
In this embodiment, the image description model is used for describing the input stitched image in text form, as shown in fig. 4.
In this embodiment, after the stitched image is input into the image description model processed by the model compression and acceleration algorithm to obtain the image description information, the method includes:
broadcasting the image description information in the form of speech.
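As one possible realization of this step (pyttsx3 is named here only as an example of a text-to-speech engine that can run without a network connection; the embodiment does not prescribe a particular engine or speech rate):

```python
import pyttsx3

def broadcast(description_text, rate=150):
    """Speak the image description aloud, fully offline."""
    engine = pyttsx3.init()
    engine.setProperty("rate", rate)   # speaking rate in words per minute
    engine.say(description_text)
    engine.runAndWait()
```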
The offline blind visual assistance device provided by the invention corresponds to the specific embodiment of the offline blind visual assistance method described above; the device achieves the object of the invention by executing the process steps of the method embodiment, so the explanations given for the method embodiment also apply to the device embodiment and will not be repeated below.
As shown in fig. 5, an embodiment of the present invention further provides an offline blind person visual aid, including:
a shooting module 11, configured to acquire images, wherein the acquired images are images shot by the blind user in daily life;
a portable computing module 12, configured to extract feature points from the image, stitch images with incomplete information by using the extracted feature points, and input the stitched image into an image description model processed by a model compression and acceleration algorithm to obtain image description information;
and a broadcasting module 13, configured to broadcast the image description information in the form of speech.
In the embodiment of the present invention, the time consumption and energy consumption of the original image understanding scheme and of the image understanding method processed by the model compression and acceleration method, measured on the device shown in fig. 5, are compared in Table 1:
TABLE 1 Time consumption and energy consumption
As shown in Table 1, the image understanding method processed by the model compression and acceleration method consumes less time and less energy than the original image understanding method, which makes it convenient to deploy the blind visual assistance system on a low-cost portable small mobile device and extends the time for which a visually impaired user can use the system.
In summary, the technical solution provided by the embodiment of the present invention has at least the following beneficial effects:
1) the problem in the prior art that a deep learning network model must rely either on equipment with strong computing power or on a network capable of transmitting large amounts of data is solved;
2) the time and energy consumed by image processing can be reduced, and the problems of error accumulation and excessively long neuron fine-tuning time, caused by clipping one feature map at a time when compressing a network model with a method based on low-rank decomposition, are solved;
3) the problems that blind assistance equipment based on an image description model requires large-scale computing equipment and cannot operate offline are solved;
4) the simplified model can run on a low-cost portable mobile processor, enabling a visually impaired person to perceive the surrounding environment stably and in a timely manner over long periods in a non-visual way, improving their quality of life to a certain extent.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.