Disclosure of Invention
Embodiments of the invention provide an offline visual assistance method and device for the blind, which can reduce the time and energy consumed by image processing. The technical scheme is as follows:
in one aspect, an offline visual assistance method for blind people is provided, which includes:
acquiring an image, wherein the acquired image is an image shot by the blind user in daily life;
extracting feature points from the image, and stitching images with incomplete information by using the extracted feature points;
and inputting the stitched image into an image description model processed by a model compression and acceleration algorithm to obtain image description information.
Further, the stitching of images with incomplete information by using the extracted feature points comprises:
A1, preprocessing a plurality of images with incomplete information and extracting feature points, wherein feature point extraction refers to extracting features from the images for comparison;
A2, screening out the same or similar features among the extracted feature points and matching the feature points;
A3, optimizing and refining the matched feature points;
A4, obtaining a transformation matrix according to the matching relation between the matched feature points, and carrying out a corresponding transformation on the images by using the transformation matrix;
and A5, stitching the plurality of transformed incomplete images together, and eliminating seams and illumination differences to obtain an image with complete information.
Further, the model compression and acceleration algorithm comprises the following steps:
B1, evaluating the importance of neurons multiple times by using the Oracle pruning algorithm, so that the cost loss of the pruned image description model is minimized;
B2, removing the least important neurons after the multiple operations of evaluating neuron importance;
B3, fine-tuning the pruned image description model;
B4, returning to step B1 and continuing execution until pruning is completed.
Further, after the operations of evaluating neuron importance, the cost function of the pruned image description model is expressed as:
where i is the index of the evaluation, N represents the total number of evaluations of neuron importance, M is the number of feature maps selected after each evaluation, W represents the parameter set of the image description model, W' represents the parameter set of the pruned image description model, C(D|W') represents the loss function of the pruned image description model, C(D|W) represents the loss function of the image description model before pruning, B represents the number of non-zero parameters, and D represents the training set.
Further, the removing of the least important neurons after the multiple operations of evaluating neuron importance includes:
selecting feature maps whose evaluation results overlap across the multiple operations of evaluating neuron importance;
calculating importance values of the selected feature maps and sorting them from small to large;
and pruning the feature maps at the top of the ranking (i.e., those with the lowest importance).
Further, the fine-tuning of the pruned image description model includes:
retraining the pruned image description model.
Further, the returning to step B1 and continuing execution until pruning is completed includes:
judging, according to the degree of accuracy change of the pruned image description model, whether it is the final image description model to be retained after processing by the model compression and acceleration algorithm;
if yes, stopping pruning;
otherwise, returning to step B1 and continuing execution until pruning is completed.
Further, the image description model is used for describing the input stitched image in text form.
Further, after the stitched image is input into the image description model processed by the model compression and acceleration algorithm to obtain image description information, the method comprises:
broadcasting the image description information in the form of speech.
In one aspect, an offline visual aid for the blind comprises:
a shooting module, configured to acquire images, wherein the acquired images are images shot by the blind user in daily life;
a portable computing module, configured to extract feature points from the image, stitch images with incomplete information by using the extracted feature points, and input the stitched image into an image description model processed by a model compression and acceleration algorithm to obtain image description information;
and a broadcasting module, configured to broadcast the image description information in the form of speech.
The technical scheme provided by the embodiments of the invention has at least the following beneficial effects:
1) the problem in the prior art that a deep learning network model must rely either on equipment with strong computing power or on a network capable of transmitting large amounts of data is solved;
2) the time and energy consumed by image processing can be reduced, and the problems of error accumulation and excessively long neuron fine-tuning time, caused by clipping one feature map at a time when compressing a network model with a method based on low-rank decomposition, are solved;
3) the problems that blind assistance equipment based on an image description model requires large-scale computing equipment and cannot operate offline are solved;
4) the simplified model can run on a low-cost portable mobile processor, enabling a visually impaired person to perceive the surrounding environment stably and in a timely manner over long periods in a non-visual way, improving their quality of life to a certain extent.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention provides an offline visual assistance method for blind people, including:
s101, acquiring an image, wherein the acquired image is an image shot by the blind in daily life;
the images acquired during the experiment may be existing data sets or images in real scenes. In the practical application process, the acquired image is a real-time image shot by the blind person by using a shooting module in the off-line blind person vision assisting device.
S102, extracting feature points of the image, and stitching the image with incomplete information by using the extracted feature points, which may specifically include the following steps:
A1, feature point extraction: preprocessing a plurality of images with incomplete information and extracting feature points, wherein feature point extraction refers to extracting features from the images for comparison;
In this embodiment, image preprocessing mainly refers to geometric distortion correction, noise suppression, and the like, so that the images to be stitched have no obvious geometric distortion. Without preprocessing, stitching images of non-ideal quality easily produces mismatches. Preprocessing mainly prepares for the subsequent image registration, ensuring that the image quality meets the registration requirements.
A2, feature point matching: screening out the same or similar features among the extracted feature points and matching them; this process should match as many identical feature points as possible, as shown in fig. 2;
A3, optimizing and refining the matched feature points;
In this embodiment, optimizing and refining the matched point pairs mainly means removing mismatched point pairs so as to obtain a better homography matrix. In the feature matching process, image ghosting or matching failure caused by mismatches often occurs, so optimization of this step has a very important influence on image stitching.
A4, transformation of the images to be matched: obtaining a transformation matrix according to the matching relation between the matched feature points, and carrying out a corresponding transformation on the images by using the transformation matrix;
A5, image stitching: stitching the plurality of transformed incomplete images together, and eliminating seams and illumination differences to obtain an image with complete information.
In this embodiment, image information is considered incomplete when an object is missing from an image or when a single photo cannot describe the whole scene. For example, when many people cross a zebra crossing on a green light, one photo may show only the people walking while the traffic light does not appear in the frame. If the surrounding scene is shot continuously from different angles as a supplement to the image information, the current scene can be described more completely. The images shot from different angles generally overlap, and within the overlapping regions they can be stitched using a feature point matching algorithm between the images. After stitching, the result is presented as a single picture, the difference being that the stitched image contains more scene information than before.
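By way of a non-limiting sketch, steps A1-A5 can be illustrated with standard OpenCV primitives as follows. The ORB detector, brute-force matcher, RANSAC homography, and simple overlay used here are illustrative assumptions; the embodiment does not prescribe a particular detector or matcher, and a complete implementation would additionally blend seams and equalize illumination in the overlapping region.

```python
import cv2
import numpy as np

def stitch_pair(img_a, img_b):
    """Illustrative A1-A5 pipeline for two partially overlapping BGR images."""
    # A1: preprocessing (noise suppression) and feature point extraction.
    a = cv2.fastNlMeansDenoisingColored(img_a, None, 10, 10, 7, 21)
    b = cv2.fastNlMeansDenoisingColored(img_b, None, 10, 10, 7, 21)
    orb = cv2.ORB_create(2000)
    kp_a, des_a = orb.detectAndCompute(cv2.cvtColor(a, cv2.COLOR_BGR2GRAY), None)
    kp_b, des_b = orb.detectAndCompute(cv2.cvtColor(b, cv2.COLOR_BGR2GRAY), None)

    # A2: match the same or similar features between the two images.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)

    # A3: refine the matched point pairs; RANSAC inside findHomography
    # rejects mismatched pairs while estimating the homography matrix.
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(pts_b, pts_a, cv2.RANSAC, 5.0)

    # A4: transform the second image into the first image's coordinate frame.
    h, w = a.shape[:2]
    canvas = cv2.warpPerspective(b, H, (w * 2, h))

    # A5: overlay the first image; seam and illumination blending is omitted here.
    canvas[0:h, 0:w] = a
    return canvas
```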
S103, inputting the stitched image into an image description model processed by a model compression and acceleration algorithm to obtain image description information.
In this embodiment, the model compression and acceleration algorithm includes the following steps:
B1, evaluating the importance of neurons: the importance of neurons is evaluated multiple times by using the Oracle pruning algorithm, so that the cost loss of the pruned image description model is minimized;
In this embodiment, neuron importance is evaluated by comparing the image description model before and after pruning; the neuron whose removal changes the model least is the least important.
In this embodiment, pruning refers to removing the least important neurons, and can be understood as compression of the model.
In this embodiment, the cost function of the pruned image description model is calculated as follows:
min_{W'} |C(D|W') - C(D|W)|,  subject to ||W'||_0 ≤ B
where W represents the parameter set of the image description model, w_l^i represents the parameters of the l-th layer of the image description model, the total number of layers is L, and each layer has C_l parameters; W' represents the parameter set of the pruned image description model, and W' ⊂ W; C(D|W') represents the loss function of the pruned image description model; C(D|W) represents the loss function of the image description model before pruning; B represents the number of non-zero parameters; D represents the training set (comprising the images used to train the image description model, in which the objects and scenes are labeled); and ΔC represents the difference between the image description model before and after pruning.
Because the complexity of the Oracle pruning algorithm is extremely high, the change in the loss function is approximated by a Taylor series expansion, and the objective function for deciding whether to prune a certain feature map becomes:
|ΔC(h_i)| = |C(D|W') - C(D|W)| = |C(D, h_i = 0) - C(D, h_i)|
where h_i is the feature map to be pruned; pruning a feature map means setting it to 0.
According to Taylor's formula, C(D, h_i = 0) is expanded about h_i:
C(D, h_i = 0) = C(D, h_i) - (∂C/∂h_i)·h_i + R_1(h_i = 0)
where C(D, h_i) is the loss when the feature map h_i is retained, and C(D, h_i = 0) is the loss when the feature map is removed, i.e., set to 0.
Because the Lagrange remainder R_1(h_i = 0) is small, it is neglected, and the objective function for deciding whether to prune a certain feature map becomes:
|ΔC(h_i)| ≈ |(∂C/∂h_i)·h_i|
After multiple operations of evaluating neuron importance, the cost function of the pruned image description model becomes:
where i is the index of the evaluation, N represents the total number of evaluations of neuron importance, and M is the number of feature maps selected after each evaluation.
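Purely as an illustrative sketch rather than the exact implementation of the embodiment, the first-order Taylor criterion |(∂C/∂h_i)·h_i| can be estimated per feature map from one forward/backward pass, for example in PyTorch; the tensor layout and the averaging over the batch shown below are assumptions made for illustration.

```python
import torch

def taylor_importance(activation, gradient):
    """First-order Taylor estimate |dC/dh_i * h_i| for each feature map.

    activation, gradient: tensors of shape (batch, channels, height, width)
    captured from one convolutional layer during the forward and backward pass.
    Returns one importance value per channel (feature map).
    """
    # Element-wise contribution of each activation to the loss change,
    # averaged over the batch and spatial positions; the absolute value
    # is taken as in the Oracle-abs criterion.
    contribution = (activation * gradient).mean(dim=(0, 2, 3))
    return contribution.abs()
```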
B2, removing the least important neurons: after multiple operations of evaluating neuron importance (Oracle-abs), the least important neurons are removed; this specifically comprises the following steps:
B21, selecting feature maps whose evaluation results overlap across the multiple operations of evaluating neuron importance;
In this embodiment, the evaluation result refers to the distribution of importance obtained in each evaluation of neuron importance, i.e., the importance at each position in the several layers, as shown in fig. 3; overlap means that a certain position in a certain layer reaches a preset degree of importance in multiple evaluation passes.
B22, calculating importance values of the selected feature maps and sorting them from small to large;
B23, pruning the feature maps at the top of the ranking.
In this embodiment, for example, the feature maps in the top 2% of the ranking, i.e., the 2% with the lowest importance, are pruned.
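A hedged sketch of steps B21-B23 follows; the quantile used to decide which evaluation results "overlap", the 2% pruning ratio, and the score layout are illustrative assumptions rather than values fixed by the embodiment.

```python
import torch

def select_channels_to_prune(scores_per_eval, low_quantile=0.1, prune_ratio=0.02):
    """scores_per_eval: tensor of shape (N evaluations, N channels) holding the
    importance score of every feature map in every evaluation pass."""
    n_evals, n_channels = scores_per_eval.shape
    # B21: keep channels whose importance falls in the lowest quantile in
    # every evaluation, i.e. whose evaluation results overlap.
    thresholds = torch.quantile(scores_per_eval, low_quantile, dim=1, keepdim=True)
    overlap = (scores_per_eval <= thresholds).all(dim=0)
    candidates = torch.nonzero(overlap, as_tuple=False).flatten()
    # B22: sort the candidates by mean importance, from small to large.
    order = candidates[torch.argsort(scores_per_eval[:, candidates].mean(dim=0))]
    # B23: prune the top of the ranking (the least important feature maps).
    n_prune = max(1, int(prune_ratio * n_channels))
    return order[:n_prune]
```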
B3, fine-tuning the image description model: since the accuracy of the pruned image description model decreases, the pruned image description model is fine-tuned;
In this embodiment, fine-tuning the pruned image description model is an operation of retraining it, which prevents the accuracy from decreasing too quickly.
B4, continuing or stopping pruning: returning to step B1 and continuing execution until pruning is completed, which may specifically include the following steps:
B41, judging, according to the degree of accuracy change of the pruned image description model, whether it is the final image description model to be retained after processing by the model compression and acceleration algorithm;
B42, if yes, stopping pruning;
B43, otherwise, returning to step B1 and continuing execution until pruning is completed.
In this embodiment, after the image description model has been evaluated, pruned, and fine-tuned multiple times, its accuracy decreases slowly at first and then drops rapidly after a certain pruning step; step B4 therefore determines, according to the degree of accuracy change of the pruned model, whether it is the final image description model to be retained after processing by the model compression and acceleration algorithm.
In this embodiment, for a given image description model, pruning is completed through steps B1-B4 to obtain a compressed image description model, and the image description is obtained by inputting the stitched image into the compressed image description model.
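The overall B1-B4 loop with the accuracy-based stopping rule of step B4 can be sketched as follows; all callables (evaluate_importance, prune_channels, finetune, accuracy) and the allowed accuracy drop are hypothetical placeholders supplied by the surrounding training framework, not functions defined by this embodiment.

```python
import copy

def compress(model, train_loader, val_loader,
             evaluate_importance, prune_channels, finetune, accuracy,
             max_accuracy_drop=0.02):
    """Iterative pruning (steps B1-B4) that stops when accuracy falls rapidly."""
    retained = copy.deepcopy(model)              # last model worth keeping
    baseline = accuracy(model, val_loader)
    while True:
        scores = evaluate_importance(model, train_loader)   # B1: importance scores
        prune_channels(model, scores)                        # B2: remove least important
        finetune(model, train_loader)                        # B3: brief retraining
        current = accuracy(model, val_loader)                # B4: check accuracy change
        if baseline - current > max_accuracy_drop:
            break                                            # rapid drop: stop pruning
        retained = copy.deepcopy(model)                      # still acceptable: keep it
    return retained
```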
It should be noted that:
in this embodiment, the overall framework of the image description model is an encoding-decoding (Encoder-Decoder) model, in which an Encoder end uses a convolutional layer of VGG16 to extract image features, and a Decoder end uses a long-time memory network (LSTM). It is worth noting that, independent of the image description model, the model compression and acceleration algorithm provided by the embodiment can be applied to other image description models, so as to achieve the effect of reducing the operation time and the operation amount; that is to say, different neural network models can obtain the neural network model after pruning through a model compression and acceleration algorithm, and the lightweight neural network model with small precision change and greatly reduced calculation amount is obtained.
In this embodiment, the image description model is used for describing the input stitched image in text form, as shown in fig. 4.
In this embodiment, after the stitched image is input into the image description model processed by the model compression and acceleration algorithm to obtain the image description information, the method includes:
broadcasting the image description information in the form of speech.
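As one possible realization of this step (pyttsx3 is named here only as an example of a text-to-speech engine that can run without a network connection; the embodiment does not prescribe a particular engine or speech rate):

```python
import pyttsx3

def broadcast(description_text, rate=150):
    """Speak the image description aloud, fully offline."""
    engine = pyttsx3.init()
    engine.setProperty("rate", rate)   # speaking rate in words per minute
    engine.say(description_text)
    engine.runAndWait()
```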
The offline blind visual assistance device provided by the invention corresponds to the specific embodiment of the offline blind visual assistance method described above; the device achieves the object of the invention by executing the process steps of the method embodiment, so the explanations given for the method embodiment also apply to the device embodiment and will not be repeated below.
As shown in fig. 5, an embodiment of the present invention further provides an offline blind person visual aid, including:
a shooting module 11, configured to acquire images, wherein the acquired images are images shot by the blind user in daily life;
a portable computing module 12, configured to extract feature points from the image, stitch images with incomplete information by using the extracted feature points, and input the stitched image into an image description model processed by a model compression and acceleration algorithm to obtain image description information;
and a broadcasting module 13, configured to broadcast the image description information in the form of speech.
In the embodiment of the present invention, the time consumption and energy consumption of the original image understanding scheme and of the image understanding method processed by the model compression and acceleration method, measured on the device shown in fig. 5, are compared in Table 1:
TABLE 1 Time consumption and energy consumption
As shown in Table 1, the image understanding method processed by the model compression and acceleration method consumes less time and less energy than the original image understanding method, which makes it convenient to deploy the blind visual assistance system on a low-cost portable small mobile device and extends the time for which a visually impaired user can use the system.
In summary, the technical solution provided by the embodiment of the present invention has at least the following beneficial effects:
1) the problem in the prior art that a deep learning network model must rely either on equipment with strong computing power or on a network capable of transmitting large amounts of data is solved;
2) the time and energy consumed by image processing can be reduced, and the problems of error accumulation and excessively long neuron fine-tuning time, caused by clipping one feature map at a time when compressing a network model with a method based on low-rank decomposition, are solved;
3) the problems that blind assistance equipment based on an image description model requires large-scale computing equipment and cannot operate offline are solved;
4) the simplified model can run on a low-cost portable mobile processor, enabling a visually impaired person to perceive the surrounding environment stably and in a timely manner over long periods in a non-visual way, improving their quality of life to a certain extent.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.