CN114049553A - Offline blind person vision assisting method and device


Info

Publication number
CN114049553A
CN114049553A (application number CN202111290759.4A)
Authority
CN
China
Prior art keywords
image
image description
model
blind
offline
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111290759.4A
Other languages
Chinese (zh)
Inventor
郭宇
陈悦
谢圆琰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB filed Critical University of Science and Technology Beijing USTB
Priority to CN202111290759.4A
Publication of CN114049553A
Legal status: Pending


Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61F: FILTERS IMPLANTABLE INTO BLOOD VESSELS; PROSTHESES; DEVICES PROVIDING PATENCY TO, OR PREVENTING COLLAPSING OF, TUBULAR STRUCTURES OF THE BODY, e.g. STENTS; ORTHOPAEDIC, NURSING OR CONTRACEPTIVE DEVICES; FOMENTATION; TREATMENT OR PROTECTION OF EYES OR EARS; BANDAGES, DRESSINGS OR ABSORBENT PADS; FIRST-AID KITS
    • A61F9/00: Methods or devices for treatment of the eyes; Devices for putting in contact-lenses; Devices to correct squinting; Apparatus to guide the blind; Protective devices for the eyes, carried on the body or in the hand
    • A61F9/08: Devices or methods enabling eye-patients to replace direct visual perception by another kind of perception
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16: Sound input; Sound output
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections


Abstract

The invention provides an offline vision assistance method and device for blind people, belonging to the field of computer vision. The method includes: acquiring an image, wherein the acquired image is an image taken by a blind person in daily life; extracting feature points of the image, and stitching images with incomplete information by using the extracted feature points; and inputting the stitched image into an image description model processed by a model compression and acceleration algorithm to obtain image description information. The invention reduces the time and energy consumed by image processing.

Description

Offline blind person vision assisting method and device
Technical Field
The invention relates to the field of computer vision, and in particular to an offline vision assistance method and device for blind people.
Background
Visually impaired people are a large and easily overlooked group among the disabled. Because of defects of the eyes, they cannot perceive external information through the visual system, which brings great inconvenience to daily life and travel. With the development of deep learning, research on image and language processing has advanced greatly, bringing new ideas to the problem of providing visual help for the blind. Using computer vision technology to design a vision assistance system that can run on low-cost portable equipment has become an urgent need for helping the blind.
At the same time, the continuous growth of neural network models brings problems such as a huge amount of computation, making the related technology difficult to apply in practice. Deepening a deep neural network greatly improves accuracy, but the amount of computation increases sharply and is accompanied by a large amount of redundancy. Running a deep learning network model therefore requires either equipment with strong computing power or a network that can transmit large amounts of data, which poses a huge challenge for small mobile devices that can be carried around to provide daily services to visually impaired people. For these reasons, current vision assistance devices suffer from high price, poor interactivity, lack of offline capability, and other shortcomings.
Therefore, reducing the complexity of a model as much as possible while preserving its accuracy has become a popular research topic. The model compression methods proposed in recent years mainly include model pruning, low-rank decomposition, parameter quantization, knowledge distillation, and retraining of compact neural networks.
Emily Denton et al. proposed a method for adjusting network weights based on low-rank decomposition [Denton E, Zaremba W, Bruna J, et al. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation. NIPS 2014]. However, low-rank decomposition works well only on fully connected layers; when the compression algorithm is applied to convolutional layers, errors accumulate, the final loss of accuracy is large, and the network has to be fine-tuned layer by layer, which is time-consuming and labor-intensive.
Model compression offers a feasible technical route for running complex deep network models on portable mobile equipment. Building on model compression, a more efficient compressed image description model is developed here to overcome the large computation and accumulated errors of the low-rank decomposition method, and to address the shortcomings of facilities and wearable equipment that provide visual assistance to the blind with an uncompressed image description model: high cost, limited assistance functions, poor interactivity, inability to work offline, and difficulty of wide use in daily life scenes.
Disclosure of Invention
The embodiments of the invention provide an offline vision assistance method and device for blind people that reduce the time and energy consumed by image processing. The technical scheme is as follows:
In one aspect, an offline vision assistance method for blind people is provided, which includes:
acquiring an image, wherein the acquired image is an image taken by a blind person in daily life;
extracting feature points of the image, and stitching images with incomplete information by using the extracted feature points;
and inputting the stitched image into an image description model processed by a model compression and acceleration algorithm to obtain image description information.
Further, stitching images with incomplete information by using the extracted feature points includes:
A1, preprocessing a plurality of images with incomplete information and extracting feature points, wherein feature point extraction means extracting comparable features from the images;
A2, screening out identical or similar features among the extracted feature points and matching them;
A3, optimizing and purifying the matched feature points;
A4, obtaining a transformation matrix from the matching relation between the matched feature points and transforming the images accordingly;
and A5, stitching the transformed incomplete images together and eliminating seams and lighting differences to obtain an image with complete image information.
Further, the model compression and acceleration algorithm includes the following steps:
B1, using the Oracle pruning algorithm to evaluate the importance of neurons multiple times, so that the cost loss of the pruned image description model is minimized;
B2, removing the least important neurons after the multiple neuron-importance evaluations;
B3, fine-tuning the pruned image description model;
B4, returning to step B1 until pruning is complete.
Further, after the multiple neuron-importance evaluations, the cost function of the pruned image description model is expressed as:

$$\min_{W'} \frac{1}{N}\sum_{i=1}^{N}\sum_{m=1}^{M}\left|C(D\mid W')-C(D\mid W)\right|,\qquad \text{s.t. } \|W'\|_0\le B$$

wherein i is the index of an executed evaluation, N is the total number of neuron-importance evaluations, M is the number of feature maps selected after each evaluation, W is the parameter set of the image description model, W' is the parameter set after pruning, C(D|W') is the loss function of the pruned image description model, C(D|W) is the loss function before pruning, B is the number of nonzero parameters, and D is the training set.
Further, removing the least important neurons after the multiple neuron-importance evaluations includes:
selecting the feature maps whose evaluation results coincide across the multiple evaluations;
calculating the importance values of the selected feature maps and sorting them in ascending order;
and pruning away the feature maps at the top of this ascending ranking, i.e., the least important ones.
Further, fine-tuning the pruned image description model includes:
retraining the pruned image description model.
Further, returning to step B1 until pruning is complete includes:
judging, from the degree of accuracy change of the pruned image description model, whether it is the final image description model to be retained after the model compression and acceleration processing;
if so, stopping pruning;
otherwise, returning to step B1 until pruning is complete.
Further, the image description model is used to describe the input stitched image in text.
Further, after the stitched image is input into the image description model processed by the model compression and acceleration algorithm to obtain the image description information, the method includes:
broadcasting the image description information in the form of speech.
In one aspect, an offline vision assistance device for blind people includes:
a shooting module, configured to acquire images, wherein the acquired images are images taken by a blind person in daily life;
a portable computing module, configured to extract feature points of the images, stitch images with incomplete information using the extracted feature points, and input the stitched image into an image description model processed by the model compression and acceleration algorithm to obtain image description information;
and a broadcast module, configured to broadcast the image description information in the form of speech.
The technical scheme provided by the embodiments of the invention has at least the following beneficial effects:
1) it removes the prior-art dependence on either equipment with strong computing power or a network capable of transmitting large amounts of data to run a deep learning network model;
2) it reduces the time and energy consumed by image processing, and avoids the error accumulation and the overly long neuron fine-tuning time caused by clipping one feature map at a time when compressing a network model with a low-rank-decomposition-based method;
3) it solves the problems that blind-assistance equipment based on an image description model must be run by large-scale computing equipment and cannot operate offline;
4) the simplified model can run on a low-cost portable mobile processor, enabling visually impaired people to perceive the surrounding environment stably and promptly in a non-visual way over long periods, improving their quality of life to a certain extent.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of the offline vision assistance method for blind people according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of image feature point matching according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of neuron importance according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of an image description according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of the offline vision assistance device for blind people according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention provides an offline visual assistance method for blind people, including:
s101, acquiring an image, wherein the acquired image is an image shot by the blind in daily life;
the images acquired during the experiment may be existing data sets or images in real scenes. In the practical application process, the acquired image is a real-time image shot by the blind person by using a shooting module in the off-line blind person vision assisting device.
S102, extracting feature points of the image and stitching images with incomplete information using the extracted feature points, which may specifically include the following steps:
A1, feature point extraction: preprocessing a plurality of images with incomplete information and extracting feature points, wherein feature point extraction means extracting comparable features from the images;
In this embodiment, image preprocessing mainly refers to correcting geometric distortion, suppressing noise points, and the like, so that the images to be stitched have no obvious geometric distortion. Without preprocessing, stitching images of unsatisfactory quality easily produces mismatches. Preprocessing thus prepares for the subsequent image registration and ensures that image quality meets its requirements.
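As an illustration of this preprocessing step, the following is a minimal OpenCV sketch that suppresses noise and corrects mild geometric distortion; the filter parameters, camera matrix, and distortion coefficients are illustrative assumptions rather than values specified in this document (real intrinsics would come from calibrating the device's camera):

```python
import cv2
import numpy as np

def preprocess(image: np.ndarray) -> np.ndarray:
    """Suppress noise and correct mild geometric distortion before registration."""
    # Edge-preserving noise suppression (parameter values are illustrative).
    denoised = cv2.bilateralFilter(image, d=9, sigmaColor=75, sigmaSpace=75)

    # Hypothetical intrinsics and distortion coefficients standing in for a
    # prior calibration step (e.g. cv2.calibrateCamera on the device camera).
    h, w = denoised.shape[:2]
    camera_matrix = np.array([[w, 0, w / 2],
                              [0, w, h / 2],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.array([-0.05, 0.01, 0.0, 0.0, 0.0], dtype=np.float64)
    return cv2.undistort(denoised, camera_matrix, dist_coeffs)
```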
A2, feature point matching: screening out identical or similar features among the extracted feature points and matching them, a process that should match as many identical feature points as possible, as shown in Fig. 2;
A3, optimizing and purifying the matched feature points;
in this embodiment, the optimization and purification of the matched point pairs mainly includes removing mismatching point pairs to obtain a better Homography Matrix (Homography Matrix) Matrix. In the feature matching process, image ghosting or matching failure caused by mismatching often occurs, and optimization aiming at the process has a very important influence on image splicing.
A4, transforming the images to be matched: obtaining a transformation matrix from the matching relation between the matched feature points and transforming the images accordingly;
A5, image stitching: stitching the transformed incomplete images together and eliminating seams and lighting differences to obtain an image with complete information.
In this embodiment, image information is considered incomplete when an object is missing from an image or a single photo cannot describe the whole scene. For example, many people cross a zebra crossing while the light is green, but one photo shows only the people walking, and part of the traffic light is absent from it. Continuously shooting the surrounding scene from different angles within the same scene supplements the image information and describes the current scene more completely. The images may partly overlap one another, and in the overlapping parts, images from different angles can be stitched with a feature point matching algorithm. After stitching, the images are presented as a single picture, the difference being that the stitched image contains more scene information than before.
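A minimal sketch of steps A1-A5 with OpenCV, assuming two overlapping views with roughly horizontal overlap; the detector choice (ORB), the ratio-test threshold, and the naive overwrite blend are assumptions, and the seam and lighting equalization described above is omitted for brevity:

```python
import cv2
import numpy as np

def stitch_pair(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Stitch two overlapping views: extract, match, purify, transform, blend."""
    orb = cv2.ORB_create(nfeatures=2000)                 # A1: feature extraction
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)            # A2: feature matching
    pairs = [p for p in matcher.knnMatch(des_a, des_b, k=2) if len(p) == 2]
    good = [m for m, n in pairs if m.distance < 0.75 * n.distance]  # A3: ratio test

    src = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # A3/A4: RANSAC purification

    h, w = img_b.shape[:2]
    canvas = cv2.warpPerspective(img_a, H, (2 * w, h))   # A4: warp into reference frame
    canvas[0:h, 0:w] = img_b                             # A5: naive blend, no seam smoothing
    return canvas
```

In practice cv2.Stitcher_create() wraps this whole pipeline, including the seam and exposure compensation that this sketch leaves out.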
S103, inputting the stitched image into an image description model processed by the model compression and acceleration algorithm to obtain image description information.
In this embodiment, the model compression and acceleration algorithm includes the following steps:
B1, evaluating the importance of neurons: the importance of neurons is evaluated multiple times with the Oracle pruning algorithm, so that the cost loss of the pruned image description model is minimized;
In this embodiment, neuron importance is evaluated by comparing the image description model before and after pruning: a neuron is unimportant when removing it changes the model the least.
In this embodiment, pruning refers to removing the least important neurons, which can be understood as compressing the model.
In this embodiment, the cost function of the pruned image description model is calculated as:

$$\min_{W'}\left|\Delta C\right|=\min_{W'}\left|C(D\mid W')-C(D\mid W)\right|,\qquad \text{s.t. } \|W'\|_0\le B$$

wherein W = {W^(1), W^(2), ..., W^(L)} is the parameter set of the image description model, W^(l) denotes the parameters of the l-th layer, L is the total number of layers, and layer l has C_l parameters; W' ⊂ W is the parameter set after pruning; C(D|W') is the loss function of the pruned image description model; C(D|W) is the loss function before pruning; B is the number of nonzero parameters; D is the training set (the images used to train the image description model, in which objects and scenes are labeled); and ΔC is the difference between the model before and after pruning.
Because the complexity of the Oracle pruning algorithm is extremely high, the change of the loss function is approximated by a Taylor series expansion, and the objective function for deciding whether to prune a certain feature map becomes:

$$\left|\Delta C(h_i)\right|=\left|C(D\mid W')-C(D\mid W)\right|=\left|C(D,h_i=0)-C(D,h_i)\right|$$

wherein h_i is the feature map to be clipped; clipping a feature map means setting it to 0.
According to Taylor's formula, C(D, h_i = 0) is expanded at h_i:

$$C(D,h_i=0)=C(D,h_i)-\frac{\partial C}{\partial h_i}h_i+R_1(h_i=0)$$

wherein C(D, h_i) is the loss with feature map h_i present, and R_1(h_i = 0) is the first-order Lagrange remainder;
Because the Lagrange remainder R_1(h_i = 0) is small, it is neglected, and the objective function for deciding whether to prune a certain feature map becomes:

$$\Theta(h_i)=\left|\Delta C(h_i)\right|=\left|C(D,h_i)-\frac{\partial C}{\partial h_i}h_i-C(D,h_i)\right|=\left|\frac{\partial C}{\partial h_i}h_i\right|$$
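A sketch of this first-order Taylor criterion in PyTorch, in the spirit of Molchanov et al. (cited below); the activation and gradient tensors are assumed to be captured with forward and backward hooks on the layer, and the layer-wise L2 rescaling follows that paper:

```python
import torch

def taylor_importance(activation: torch.Tensor, gradient: torch.Tensor) -> torch.Tensor:
    """First-order Taylor importance |dC/dh * h| for each feature map (channel).

    activation, gradient: tensors of shape (batch, channels, H, W) captured
    during one forward/backward pass. Returns one score per channel.
    """
    scores = (activation * gradient).mean(dim=(0, 2, 3)).abs()
    # Layer-wise L2 rescaling makes scores comparable across layers.
    return scores / (scores.norm(p=2) + 1e-8)
```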
after a plurality of operations for evaluating the importance degree of the neuron, the cost function of the pruned image description model becomes:
Figure BDA0003334628720000066
wherein i is the number of evaluation times, N represents the total number of times of evaluation of the neuron importance degree, and M is the number of feature maps selected after each evaluation.
B2, removing the least important neurons: after the multiple neuron-importance evaluations (Oracle-abs), the least important neurons are removed, which specifically includes the following steps:
B21, selecting the feature maps whose evaluation results coincide across the multiple neuron-importance evaluations;
In this embodiment, an evaluation result means the distribution of importance obtained in each evaluation pass, i.e., particular positions within the layers, as shown in Fig. 3; coincidence means that a certain position in a certain layer reaches a preset importance level in several evaluation passes.
B22, calculating the importance values of the selected feature maps and sorting them in ascending order;
B23, clipping the feature maps at the top of this ascending ranking, i.e., the least important ones.
In this embodiment, for example, the lowest-ranked 2% of the feature maps are clipped.
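A sketch of steps B22-B23 under the assumption that per-channel importance scores (e.g. from the Taylor criterion above) have been collected per layer; the 2% fraction mirrors the example above, and the (layer name, channel index) representation is an assumption of this sketch:

```python
from typing import Dict, List, Tuple

import torch

def select_maps_to_prune(scores: Dict[str, torch.Tensor],
                         fraction: float = 0.02) -> List[Tuple[str, int]]:
    """Rank all feature maps by importance and return the least important ones."""
    ranked = sorted(
        ((layer, ch, float(s)) for layer, t in scores.items() for ch, s in enumerate(t)),
        key=lambda item: item[2],                  # B22: sort small -> large
    )
    n_prune = max(1, int(len(ranked) * fraction))  # B23: clip the lowest-ranked 2%
    return [(layer, ch) for layer, ch, _ in ranked[:n_prune]]
```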
B3, fine-tuning the image description model: pruning reduces the accuracy of the image description model, so the pruned model is fine-tuned;
In this embodiment, fine-tuning the pruned image description model means retraining it, which prevents the accuracy from dropping too fast.
B4, continuing or stopping pruning: returning to step B1 until pruning is complete, which may specifically include the following steps:
B41, judging, from the degree of accuracy change of the pruned image description model, whether it is the final image description model to be retained after the model compression and acceleration processing;
B42, if so, stopping pruning;
B43, otherwise, returning to step B1 until pruning is complete.
In this embodiment, after the image description model has been evaluated, pruned, and fine-tuned many times, its accuracy decreases slowly at first and then drops rapidly after a certain pruning step; step B4 uses this degree of accuracy change to judge whether the current model is the final image description model to be retained after the model compression and acceleration processing.
In this embodiment, for a given image description model, pruning through steps B1-B4 yields a compressed image description model, and the image description is obtained by inputting the stitched image into this compressed model.
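Putting B1-B4 together, a hedged sketch of the evaluate-prune-fine-tune loop; the injected callables stand in for training and evaluation code that the document does not spell out, and the accuracy-drop threshold is an assumed concrete form of the stopping rule:

```python
def compress(model, evaluate_accuracy, estimate_scores, prune_channels, fine_tune,
             max_accuracy_drop: float = 0.02):
    """Iterate B1-B4 until accuracy starts to fall sharply, then stop."""
    baseline = evaluate_accuracy(model)
    while True:
        scores = estimate_scores(model)                       # B1: importance evaluation
        prune_channels(model, select_maps_to_prune(scores))   # B2: remove least important
        fine_tune(model)                                      # B3: brief retraining
        if baseline - evaluate_accuracy(model) > max_accuracy_drop:
            return model                                      # B4: precision fell, so stop
```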
It should be noted that:
In this embodiment, the overall framework of the image description model is an encoder-decoder model, in which the encoder uses the convolutional layers of VGG16 to extract image features and the decoder uses a long short-term memory network (LSTM). Notably, the model compression and acceleration algorithm provided by this embodiment is independent of the particular image description model and can be applied to other image description models to reduce running time and computation; that is, different neural network models can be pruned by the model compression and acceleration algorithm into lightweight models whose accuracy changes little while the amount of computation is greatly reduced.
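A minimal sketch of such an encoder-decoder image description model (VGG16 convolutional encoder, LSTM decoder); the embedding and hidden sizes, the single-layer LSTM, and feeding the projected image feature as the first decoder step are assumptions of this sketch, and `weights=None` assumes a recent torchvision:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CaptionModel(nn.Module):
    """Encoder-decoder image description model: VGG16 features into an LSTM decoder."""

    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.encoder = models.vgg16(weights=None).features  # convolutional layers only
        self.project = nn.Linear(512 * 7 * 7, embed_dim)    # 224x224 input gives (512, 7, 7)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(images).flatten(1)
        ctx = self.project(feats).unsqueeze(1)      # image feature as the first step
        out, _ = self.decoder(torch.cat([ctx, self.embed(captions)], dim=1))
        return self.head(out)                       # per-step vocabulary logits
```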
In this embodiment, the image description model describes the input stitched image in text, as shown in Fig. 4.
In this embodiment, after the stitched image is input into the image description model processed by the model compression and acceleration algorithm to obtain the image description information, the method includes:
broadcasting the image description information in the form of speech.
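The document does not name a speech engine; as one option that works without a network connection, here is a sketch using the pyttsx3 offline text-to-speech library, with an assumed speaking rate:

```python
import pyttsx3  # offline text-to-speech; no network connection required

def broadcast(description: str) -> None:
    """Read the generated image description aloud."""
    engine = pyttsx3.init()
    engine.setProperty("rate", 150)  # moderate speaking rate (assumed value)
    engine.say(description)
    engine.runAndWait()
```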
The offline vision assistance device provided by the invention corresponds to the embodiment of the offline vision assistance method described above. Since the device achieves the purpose of the invention by executing the steps of the method embodiment, the explanations given for the method embodiment also apply to the device embodiment and are not repeated below.
As shown in Fig. 5, an embodiment of the present invention further provides an offline vision assistance device for blind people, including:
a shooting module 11, configured to acquire images, wherein the acquired images are images taken by a blind person in daily life;
a portable computing module 12, configured to extract feature points of the images, stitch images with incomplete information using the extracted feature points, and input the stitched image into the image description model processed by the model compression and acceleration algorithm to obtain image description information;
and a broadcast module 13, configured to broadcast the image description information in the form of speech.
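A sketch of how the three modules might be wired together, reusing the earlier sketches; the number of captured frames is arbitrary, and `generate_caption` is a hypothetical stand-in for inference with the compressed image description model:

```python
import cv2

def run_assist_cycle(generate_caption, camera_index: int = 0) -> None:
    """One capture, stitch, describe, speak cycle across the three modules."""
    cap = cv2.VideoCapture(camera_index)       # shooting module 11
    frames = []
    for _ in range(3):                         # several angles of the same scene
        ok, frame = cap.read()
        if ok:
            frames.append(preprocess(frame))
    cap.release()
    if not frames:
        return

    panorama = frames[0]                       # portable computing module 12
    for nxt in frames[1:]:
        panorama = stitch_pair(panorama, nxt)
    broadcast(generate_caption(panorama))      # broadcast module 13
```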
In the embodiment of the present invention, Table 1 compares the time consumed and the energy consumption of the original image understanding scheme with those of the image understanding method processed by the model compression and acceleration method on the device shown in Fig. 5:
Table 1. Time consumed and energy consumption
(Table 1 is presented as an image in the original publication.)
As Table 1 shows, the image understanding method processed by the model compression and acceleration method consumes less time and less energy than the original one. The lower consumption of the pruned method makes it much more practical to load the blind vision assistance system onto a low-cost, portable small mobile device, and it extends the time for which visually impaired users can use the system.
In summary, the technical solution provided by the embodiments of the present invention has at least the following beneficial effects:
1) it removes the prior-art dependence on either equipment with strong computing power or a network capable of transmitting large amounts of data to run a deep learning network model;
2) it reduces the time and energy consumed by image processing, and avoids the error accumulation and the overly long neuron fine-tuning time caused by clipping one feature map at a time when compressing a network model with a low-rank-decomposition-based method;
3) it solves the problems that blind-assistance equipment based on an image description model must be run by large-scale computing equipment and cannot operate offline;
4) the simplified model can run on a low-cost portable mobile processor, enabling visually impaired people to perceive the surrounding environment stably and promptly in a non-visual way over long periods, improving their quality of life to a certain extent.
The above description covers only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its protection scope.

Claims (10)

1. An offline vision assistance method for blind people, comprising:
acquiring an image, wherein the acquired image is an image taken by a blind person in daily life;
extracting feature points of the image, and stitching images with incomplete information by using the extracted feature points;
and inputting the stitched image into an image description model processed by a model compression and acceleration algorithm to obtain image description information.

2. The offline vision assistance method for blind people according to claim 1, wherein extracting feature points of the image and stitching images with incomplete information by using the extracted feature points comprises:
A1, preprocessing a plurality of images with incomplete information and extracting feature points, wherein feature point extraction means extracting comparable features from the images;
A2, screening out identical or similar features among the extracted feature points and matching them;
A3, optimizing and purifying the matched feature points;
A4, obtaining a transformation matrix from the matching relation between the matched feature points and transforming the images accordingly;
A5, stitching the transformed incomplete images together and eliminating seams and lighting differences to obtain an image with complete image information.

3. The offline vision assistance method for blind people according to claim 1, wherein the model compression and acceleration algorithm comprises the following steps:
B1, using the Oracle pruning algorithm to evaluate the importance of neurons multiple times, so that the cost loss of the pruned image description model is minimized;
B2, removing the least important neurons after the multiple neuron-importance evaluations;
B3, fine-tuning the pruned image description model;
B4, returning to step B1 until pruning is complete.

4. The offline vision assistance method for blind people according to claim 3, wherein after the multiple neuron-importance evaluations, the cost function of the pruned image description model is expressed as:
$$\min_{W'} \frac{1}{N}\sum_{i=1}^{N}\sum_{m=1}^{M}\left|C(D\mid W')-C(D\mid W)\right|,\qquad \text{s.t. } \|W'\|_0\le B$$
wherein i is the index of an executed evaluation, N is the total number of neuron-importance evaluations, M is the number of feature maps selected after each evaluation, W is the parameter set of the image description model, W' is the parameter set after pruning, C(D|W') is the loss function of the pruned image description model, C(D|W) is the loss function before pruning, B is the number of nonzero parameters, and D is the training set.
5. The offline vision assistance method for blind people according to claim 3, wherein removing the least important neurons after the multiple neuron-importance evaluations comprises:
selecting the feature maps whose evaluation results coincide across the multiple evaluations;
calculating the importance values of the selected feature maps and sorting them in ascending order;
and pruning away the lowest-ranked feature maps.

6. The offline vision assistance method for blind people according to claim 3, wherein fine-tuning the pruned image description model comprises:
retraining the pruned image description model.

7. The offline vision assistance method for blind people according to claim 3, wherein returning to step B1 until pruning is complete comprises:
judging, from the degree of accuracy change of the pruned image description model, whether it is the final image description model to be retained after the model compression and acceleration processing;
if so, stopping pruning;
otherwise, returning to step B1 until pruning is complete.

8. The offline vision assistance method for blind people according to claim 1, wherein the image description model is used to describe the input stitched image in text.

9. The offline vision assistance method for blind people according to claim 1, wherein after inputting the stitched image into the image description model processed by the model compression and acceleration algorithm to obtain the image description information, the method comprises:
broadcasting the image description information in the form of speech.

10. An offline vision assistance device for blind people, comprising:
a shooting module, configured to acquire images, wherein the acquired images are images taken by a blind person in daily life;
a portable computing module, configured to extract feature points of the images, stitch images with incomplete information using the extracted feature points, and input the stitched image into an image description model processed by a model compression and acceleration algorithm to obtain image description information;
and a broadcast module, configured to broadcast the image description information in the form of speech.
CN202111290759.4A 2021-11-02 2021-11-02 Offline blind person vision assisting method and device Pending CN114049553A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111290759.4A CN114049553A (en) 2021-11-02 2021-11-02 Offline blind person vision assisting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111290759.4A CN114049553A (en) 2021-11-02 2021-11-02 Offline blind person vision assisting method and device

Publications (1)

Publication Number Publication Date
CN114049553A true CN114049553A (en) 2022-02-15

Family

ID=80206815

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111290759.4A Pending CN114049553A (en) 2021-11-02 2021-11-02 Offline blind person vision assisting method and device

Country Status (1)

Country Link
CN (1) CN114049553A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120232907A1 (en) * 2011-03-09 2012-09-13 Christopher Liam Ivey System and Method for Delivering a Human Interactive Proof to the Visually Impaired by Means of Semantic Association of Objects
CN104535059A (en) * 2014-12-04 2015-04-22 上海交通大学 Indoor positioning system specific to totally blind population
CN106265004A (en) * 2016-10-08 2017-01-04 西安电子科技大学 Multi-sensor intelligent blind person's guiding method and device
CN107749053A (en) * 2017-10-24 2018-03-02 郑州布恩科技有限公司 A kind of binocular image collection and pretreatment unit and method for vision prosthesis
CN109753900A (en) * 2018-12-21 2019-05-14 西安科技大学 A blind auxiliary vision system based on CNN/LSTM
CN111241979A (en) * 2020-01-07 2020-06-05 浙江科技学院 Real-time obstacle detection method based on image feature calibration
CN112561054A (en) * 2020-12-03 2021-03-26 中国科学院光电技术研究所 Neural network filter pruning method based on batch characteristic heat map

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Pavlo Molchanov et al., "Pruning Convolutional Neural Networks for Resource Efficient Inference," published as a conference paper at ICLR 2017. *
官建军 (Guan Jianjun), UAV Remote Sensing Surveying and Mapping Technology and Applications, Xi'an: Northwestern Polytechnical University Press, 31 August 2018. *

Similar Documents

Publication Publication Date Title
CN108898579B (en) Image definition recognition method and device and storage medium
CN114399818A (en) A method and device for multimodal facial emotion recognition
CN108288035A (en) The human motion recognition method of multichannel image Fusion Features based on deep learning
CN110659573B (en) Face recognition method and device, electronic equipment and storage medium
WO2023015799A1 (en) Multimodal fusion obstacle detection method and apparatus based on artificial intelligence blindness guiding
CN111046738B (en) Precision improvement method of light u-net for finger vein segmentation
CN110163211B (en) Image recognition method, device and storage medium
CN114821050B (en) A transformer-based method for referential image segmentation
CN111126280B (en) Gesture recognition fusion-based aphasia patient auxiliary rehabilitation training system and method
CN112215203A (en) Pavement disease detection method and device based on deep learning
CN114120432A (en) Online Learning Attention Tracking Method Based on Gaze Estimation and Its Application
CN113743544A (en) Cross-modal neural network construction method, pedestrian retrieval method and system
CN115146761B (en) Training method and related device for defect detection model
CN117373095A (en) Facial expression recognition method and system based on local global information cross fusion
CN110414338A (en) Pedestrian Re-Identification Method Based on Sparse Attention Network
CN114049553A (en) Offline blind person vision assisting method and device
CN115690887A (en) A method for intelligent identification of driver's emotions based on multi-modal network
CN114387553A (en) A video face recognition method based on frame structure-aware aggregation
CN109815922B (en) Rail transit ground target video identification method based on artificial intelligence neural network
CN211512572U (en) Interactive blind guiding system
Huang et al. ICMiF: Interactive cascade microformers for cross-domain person re-identification
Gholipour et al. Automatic Lip Reading of Persian Words by a Robotic System Using Deep Learning Algorithms
WO2024152265A1 (en) Person re-identification method and apparatus based on day and night images, and terminal
CN117392729A (en) End-to-end micro-expression recognition method based on pre-trained action extraction
CN112200226B (en) Image processing method based on reinforcement learning, image processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220215