CN114049553A - Offline blind person vision assisting method and device

- Publication number: CN114049553A
- Application number: CN202111290759.4A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61F—FILTERS IMPLANTABLE INTO BLOOD VESSELS; PROSTHESES; DEVICES PROVIDING PATENCY TO, OR PREVENTING COLLAPSING OF, TUBULAR STRUCTURES OF THE BODY, e.g. STENTS; ORTHOPAEDIC, NURSING OR CONTRACEPTIVE DEVICES; FOMENTATION; TREATMENT OR PROTECTION OF EYES OR EARS; BANDAGES, DRESSINGS OR ABSORBENT PADS; FIRST-AID KITS
- A61F9/00—Methods or devices for treatment of the eyes; Devices for putting-in contact lenses; Devices to correct squinting; Apparatus to guide the blind; Protective devices for the eyes, carried on the body or in the hand
- A61F9/08—Devices or methods enabling eye-patients to replace direct visual perception by another kind of perception
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Abstract
The invention provides an offline blind person vision assisting method and device, belonging to the field of computer vision. The method comprises the following steps: acquiring an image, wherein the acquired image is an image taken by the blind person in daily life; extracting feature points from the image, and stitching images with incomplete information by using the extracted feature points; and inputting the stitched image into an image description model processed by a model compression and acceleration algorithm to obtain image description information. The invention reduces the time and energy consumed by image processing.
Description
Technical Field
The invention relates to the field of computer vision, and in particular to an offline blind person vision assisting method and device.
Background
Visually impaired people are a large group that is easily overlooked among the disabled. Because of eye defects, they cannot perceive external information through the visual system, which brings great inconvenience to daily life and travel. With the development of deep learning, research on image and language processing has advanced considerably, bringing new ideas to the problem of providing visual help for the blind. Designing a vision assistance system that uses computer vision technology and can run on low-cost portable devices has therefore become an urgent need.
The continuous growth of neural network models has brought problems such as a huge amount of computation, which makes the related technologies difficult to apply in practice. Deepening a deep neural network greatly improves accuracy, but it also greatly increases the amount of computation and introduces a large amount of redundancy. Running a deep learning model therefore requires either a device with strong computing power or a network that can transmit a large amount of data, which poses a serious challenge for small mobile devices that can be carried around to provide daily services to visually impaired people. For these reasons, current visual assistance devices suffer from high price, poor interactivity, and the inability to work offline.
Therefore, reducing the complexity of a model as much as possible while preserving its accuracy has become a popular research topic. Model compression methods proposed in recent years mainly include model pruning, low-rank decomposition, parameter quantization, knowledge distillation, and retraining compact neural networks.
Emily Denton et al. proposed a method for adjusting network weights based on low-rank decomposition [Denton E, Zaremba W, Bruna J, LeCun Y, Fergus R. Exploiting linear structure within convolutional networks for efficient evaluation. NIPS 2014]. However, low-rank decomposition works well only on fully connected layers; when it is applied to convolutional layers, errors accumulate, the final loss of accuracy is large, and the network must be fine-tuned layer by layer, which is time-consuming and laborious.
Model compression provides a feasible technical approach for running a complex deep network model on a portable mobile device. Building on model compression, this invention develops a more efficient compressed image description model to solve two problems: the large amount of computation and the accumulated errors of the low-rank decomposition compression method, and the fact that facilities and wearable devices that provide visual assistance for the blind with an uncompressed image description model are expensive, limited in assistance function, poor in interactivity, unable to work offline, and therefore difficult to use widely in daily life scenes.
Disclosure of Invention
The embodiments of the invention provide an offline blind person visual assistance method and device, which can reduce the time and energy consumed by image processing. The technical scheme is as follows:
in one aspect, an offline visual assistance method for blind people is provided, which includes:
acquiring an image, wherein the acquired image is an image taken by the blind person in daily life;
extracting feature points of the image, and stitching the images with incomplete information by using the extracted feature points;
and inputting the stitched image into an image description model processed by a model compression and acceleration algorithm to obtain image description information.
Further, the stitching of the image with incomplete information by using the extracted feature points comprises:
A1, preprocessing a plurality of images with incomplete information and extracting feature points, wherein feature point extraction means extracting features for comparison from the images;
A2, screening out the same or similar features among the extracted feature points and matching the feature points;
A3, optimizing and purifying the matched feature points;
A4, obtaining a transformation matrix according to the matching relation between the matched feature points, and transforming the images correspondingly with the transformation matrix;
and A5, stitching the plurality of transformed images with incomplete information together, and eliminating seams and light differences to obtain an image with complete information.
Further, the model compression and acceleration algorithm comprises the following steps:
B1, using the Oracle pruning algorithm to evaluate the importance degree of the neurons multiple times, so that the cost loss of the pruned image description model is minimized;
B2, removing the least important neurons after the multiple operations of evaluating the importance degree of the neurons;
B3, fine-tuning the pruned image description model;
B4, returning to step B1 and continuing until pruning is completed.
Further, after the multiple operations of evaluating the importance degree of the neurons, the cost function of the pruned image description model is expressed as:

min_{W'} (1/N) Σ_{i=1..N} Σ_{j=1..M} |C(D|W'_ij) − C(D|W)|, subject to ||W'||_0 ≤ B

wherein i is the index of an executed evaluation, N denotes the total number of executed evaluations of the neuron importance degree, M is the number of feature maps selected after each evaluation, W denotes the parameter set of the image description model, W' denotes the parameter set of the pruned image description model (W'_ij being W with the j-th feature map selected in the i-th evaluation pruned), C(D|W') denotes the loss function of the pruned image description model, C(D|W) denotes the loss function of the image description model before pruning, B denotes the number of nonzero parameters, and D denotes the training set.
Further, removing the least important neurons after the multiple operations of evaluating the importance degree of the neurons comprises:
selecting the feature maps whose evaluation results coincide across the multiple operations of evaluating the importance degree of the neurons;
calculating the importance values of the selected feature maps and sorting them from small to large;
and clipping the feature maps at the top of the ranking (the least important ones).
Further, the fine-tuning the pruned image description model includes:
and retraining the image description model after pruning.
Further, the returning to the step B1 to continue the execution until the pruning is completed includes:
judging, according to the degree of accuracy change of the pruned image description model, whether the model is the image description model to be finally retained after processing by the model compression and acceleration algorithm;
if yes, stopping pruning;
otherwise, the procedure returns to step B1 to continue execution until pruning is completed.
Further, the image description model is used for describing the input spliced image in a text mode.
Further, after the spliced image is input into an image description model processed based on a model compression and acceleration algorithm to obtain image description information, the method comprises the following steps:
and broadcasting the image description information in a voice mode.
In one aspect, an offline visual aid for the blind comprises:
the shooting module, which is used for acquiring an image, wherein the acquired image is an image taken by the blind person in daily life;
the portable computing module is used for extracting the feature points of the image, splicing the image with incomplete information by using the extracted feature points, and inputting the spliced image into an image description model based on model compression and acceleration algorithm processing to obtain image description information;
and the broadcasting module is used for broadcasting the image description information in a voice mode.
The technical scheme provided by the embodiments of the invention has at least the following beneficial effects:
1) it solves the prior-art problem that a deep learning network model can be run only with either a device with strong computing power or a network capable of transmitting a large amount of data;
2) it reduces the time and energy consumed by image processing, and avoids the error accumulation and the overly long neuron fine-tuning time caused by clipping one feature map at a time when a network model is compressed with a low-rank-decomposition-based method;
3) it solves the problems that blind assistance equipment based on an image description model otherwise requires large-scale computing equipment and cannot run offline;
4) the simplified model can run on a low-cost portable mobile processor, enabling a visually impaired person to perceive the surrounding environment stably and in time over long periods in a non-visual way, improving their quality of life to a certain extent.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of an off-line blind person visual assistance method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of image feature point matching according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the importance of neurons according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating an image according to an embodiment of the present invention;
fig. 5 is a schematic structural view of an offline vision assisting device for blind people according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention provides an offline visual assistance method for blind people, including:
s101, acquiring an image, wherein the acquired image is an image shot by the blind in daily life;
the images acquired during the experiment may be existing data sets or images in real scenes. In the practical application process, the acquired image is a real-time image shot by the blind person by using a shooting module in the off-line blind person vision assisting device.
S102, extracting feature points of the image, and stitching the image with incomplete information by using the extracted feature points, which may specifically include the following steps:
A1, feature point extraction: preprocessing a plurality of images with incomplete information and extracting feature points, wherein feature point extraction means extracting features for comparison from the images;
In this embodiment, image preprocessing mainly refers to geometric distortion correction, noise suppression, and the like, so that the images to be stitched contain no obvious geometric distortion. Without preprocessing, stitching images of less-than-ideal quality easily produces mismatches. Preprocessing mainly prepares for the next step, image registration, ensuring that image quality meets the registration requirements.
A2, feature point matching: screening out the same or similar features among the extracted feature points and matching them; this step should match as many identical feature points as possible, as shown in fig. 2;
A3, optimizing and purifying the matched feature points;
In this embodiment, optimizing and purifying the matched point pairs mainly means removing mismatched pairs to obtain a better homography matrix. In the feature matching process, mismatching often causes image ghosting or outright matching failure, so optimizing this step has a very important influence on image stitching.
A4, transforming the images to be matched: obtaining a transformation matrix according to the matching relation between the matched feature points, and transforming the images correspondingly with the transformation matrix;
A5, image stitching: stitching the plurality of transformed images together, and eliminating seams and light differences to obtain an image with complete information.
In this embodiment, image information is considered incomplete when an object is missing from an image or when a single photo cannot capture the whole scene. For example, when many people cross a zebra crossing on a green light, one photo may show only the pedestrians while the traffic light lies outside the frame. Continuously shooting the surrounding scene from different angles in the same scene supplements the image information and describes the current scene more completely. The images taken from different angles overlap in places, and in those overlapping regions they can be stitched together using a feature point matching algorithm. After stitching, the result is presented as a single picture that contains more scene information than any individual photo.
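As an illustration of steps A1 to A5, the following is a minimal sketch of the stitching pipeline in Python, assuming OpenCV; the patent does not name a specific detector or matcher, so the ORB features, brute-force Hamming matching, and RANSAC homography estimation used here are assumptions:

```python
# Sketch of the A1-A5 stitching pipeline with OpenCV (the detector choice is
# an assumption; the patent only requires feature extraction and matching).
import cv2
import numpy as np

def stitch_pair(img_left, img_right):
    # A1: extract feature points (ORB keypoints and binary descriptors)
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img_left, None)
    kp2, des2 = orb.detectAndCompute(img_right, None)

    # A2: match the same or similar features between the two images
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    # A3: purify the matches - RANSAC rejects mismatched point pairs while
    # estimating the homography (transformation) matrix
    src = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # A4: warp the right image into the left image's coordinate frame
    h, w = img_left.shape[:2]
    canvas = cv2.warpPerspective(img_right, H, (w * 2, h))

    # A5: paste the left image over the warped canvas; a full system would
    # additionally blend the seam and equalize exposure differences
    canvas[0:h, 0:w] = img_left
    return canvas
```

Where the seam elimination and exposure compensation of step A5 matter, OpenCV's higher-level cv2.Stitcher_create() interface performs the same sequence, including blending, in one call.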
S103, inputting the spliced image into an image description model based on model compression and acceleration algorithm processing to obtain image description information.
In this embodiment, the model compression and acceleration algorithm includes the following steps:
B1, assessing the importance of neurons: the importance degree of the neurons is evaluated multiple times with the Oracle pruning algorithm, so that the cost loss of the pruned image description model is minimized;
In this embodiment, the importance degree of a neuron is evaluated by comparing the image description model before and after pruning: when removing a neuron changes the model least, that neuron is unimportant.
In this embodiment, pruning means removing the least important neurons, which can be understood as compressing the model.
In this embodiment, the cost function of the pruned image description model is calculated as:

min_{W'} |ΔC| = min_{W'} |C(D|W') − C(D|W)|, subject to ||W'||_0 ≤ B

wherein W denotes the parameter set of the image description model (the parameters of the l-th layer are denoted w_l, there are L layers in total, and the l-th layer has C_l parameters); W' denotes the parameter set of the pruned image description model, with W' ⊂ W; C(D|W') denotes the loss function of the pruned image description model; C(D|W) denotes the loss function of the image description model before pruning; B denotes the number of nonzero parameters; D denotes the training set (the images used to train the image description model, in which the objects and scenes are labeled); and ΔC denotes the difference between the losses before and after pruning.
Because the complexity of the Oracle pruning algorithm is extremely high, the change of the loss function is approximated by a Taylor series expansion, and the objective function for deciding whether to prune a certain feature map h_i becomes:

|ΔC(h_i)| = |C(D|W') − C(D|W)| = |C(D, h_i = 0) − C(D, h_i)|

wherein h_i is the feature map under consideration; pruning the feature map amounts to setting it to 0.

According to Taylor's formula, C(D, h_i = 0) is expanded around h_i:

C(D, h_i = 0) = C(D, h_i) − (∂C/∂h_i)·h_i + R_1(h_i = 0)

wherein C(D, h_i) is the loss with feature map h_i retained, and R_1(h_i = 0) is the Lagrange remainder. Because the Lagrange remainder R_1(h_i = 0) is small, it is neglected, and the objective function for deciding whether to prune a feature map becomes:

|ΔC(h_i)| = |(∂C/∂h_i)·h_i|
after a plurality of operations for evaluating the importance degree of the neuron, the cost function of the pruned image description model becomes:
wherein i is the number of evaluation times, N represents the total number of times of evaluation of the neuron importance degree, and M is the number of feature maps selected after each evaluation.
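The first-order criterion |(∂C/∂h_i)·h_i| falls out of an ordinary backward pass, since the gradient of the loss with respect to each feature map is computed anyway. Below is a minimal sketch, assuming PyTorch; the function and variable names are illustrative, and accumulating scores over several evaluation batches stands in for the repeated evaluations described above:

```python
# Sketch of the Taylor-expansion importance criterion |dC/dh_i * h_i| in
# PyTorch; the per-channel averaging follows the criterion above, while the
# API around it (arguments, score layout) is an assumption.
import torch

def taylor_scores(model, conv_layers, batches, loss_fn):
    acts = {}
    scores = {layer: 0.0 for layer in conv_layers}

    def make_hook(layer):
        def hook(module, inputs, output):
            output.retain_grad()          # keep dC/dh for the feature maps
            acts[layer] = output
        return hook

    handles = [l.register_forward_hook(make_hook(l)) for l in conv_layers]
    for images, targets in batches:       # each batch is one evaluation
        model.zero_grad()
        loss = loss_fn(model(images), targets)
        loss.backward()
        for layer in conv_layers:
            h = acts[layer]               # feature maps, shape (B, C, H, W)
            # |dC/dh_i * h_i| averaged over batch and spatial positions
            # yields one importance value per feature map (channel)
            scores[layer] = scores[layer] + (h.grad * h).mean(dim=(0, 2, 3)).abs()
    for handle in handles:
        handle.remove()
    return scores                         # smaller value = less important
```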
B2, removing the least important neurons: after the multiple operations of assessing the degree of neuron importance (Oracle-abs), the least important neurons are removed. This specifically comprises the following steps:
B21, selecting the feature maps whose evaluation results coincide across the multiple operations of evaluating the importance degree of the neurons;
In this embodiment, an evaluation result refers to the distribution of importance degrees obtained in one evaluation of neuron importance, i.e., the importance at particular positions in several layers, as shown in fig. 3; coincidence means that a certain position of a certain layer reaches a preset degree of importance across multiple evaluation rounds.
B22, calculating the importance values of the selected feature maps and sorting them from small to large;
B23, clipping the feature maps at the top of the ranking (the least important ones).
In this embodiment, for example, the feature maps in the first 2% of the ranking are clipped.
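The ranking and clipping of steps B21-B23 can then be sketched as follows, reusing the per-channel scores from the sketch above; the 2% ratio follows the example in this embodiment, and the flat (layer, channel, importance) layout is an assumption:

```python
# Sketch of steps B21-B23: pool the importance values, sort from small to
# large, and clip the feature maps at the top of the ranking.
def select_channels_to_prune(scores, ratio=0.02):
    triples = [(layer, c, float(v))
               for layer, values in scores.items()
               for c, v in enumerate(values)]
    triples.sort(key=lambda t: t[2])          # B22: sort small to large
    n_prune = max(1, int(len(triples) * ratio))
    return triples[:n_prune]                  # B23: the least important maps
```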
B3, fine-tuning the image description model: pruning reduces the accuracy of the image description model, so the pruned image description model is fine-tuned;
In this embodiment, fine-tuning the pruned image description model means retraining it, which prevents the accuracy from dropping too fast.
B4, continuing or stopping pruning: returning to step B1 and continuing until pruning is completed, which may specifically include the following steps:
B41, judging, according to the degree of accuracy change of the pruned image description model, whether the model is the image description model to be finally retained after processing by the model compression and acceleration algorithm;
B42, if yes, stopping pruning;
B43, otherwise, returning to step B1 and continuing until pruning is completed.
In this embodiment, after the image description model has been evaluated, pruned, and fine-tuned many times, its accuracy first decreases slowly and then drops rapidly after a certain pruning round; step B4 judges, according to this degree of accuracy change, whether the current model is the image description model to be finally retained after processing by the model compression and acceleration algorithm.
In this embodiment, for a given image description model, pruning is completed through steps B1-B4 to obtain a compressed image description model, and the image description is obtained by inputting the stitched image into the compressed image description model.
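Putting steps B1-B4 together, the prune-and-fine-tune loop could be sketched as follows; taylor_scores and select_channels_to_prune are the sketches above, while zero_out_channel, finetune, and val_fn are hypothetical helpers, and the fixed accuracy-drop tolerance is an assumed reading of the stopping criterion:

```python
# Sketch of the B1-B4 compression loop; the helpers marked below are
# hypothetical stand-ins, not APIs defined by the patent.
import copy

def compress(model, conv_layers, batches, loss_fn, val_fn, tol=0.02):
    baseline = val_fn(model)                  # accuracy before pruning
    kept = copy.deepcopy(model)
    while True:
        scores = taylor_scores(model, conv_layers, batches, loss_fn)   # B1
        for layer, channel, _ in select_channels_to_prune(scores):     # B2
            zero_out_channel(layer, channel)  # hypothetical helper that
                                              # masks the pruned feature map
        finetune(model, batches, loss_fn)     # B3: brief retraining
        if baseline - val_fn(model) > tol:    # B4: accuracy falls rapidly,
            return kept                       # so keep the last good model
        kept = copy.deepcopy(model)           # otherwise continue pruning
```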
It should be noted that:
in this embodiment, the overall framework of the image description model is an encoding-decoding (Encoder-Decoder) model, in which an Encoder end uses a convolutional layer of VGG16 to extract image features, and a Decoder end uses a long-time memory network (LSTM). It is worth noting that, independent of the image description model, the model compression and acceleration algorithm provided by the embodiment can be applied to other image description models, so as to achieve the effect of reducing the operation time and the operation amount; that is to say, different neural network models can obtain the neural network model after pruning through a model compression and acceleration algorithm, and the lightweight neural network model with small precision change and greatly reduced calculation amount is obtained.
In this embodiment, the image description model is used to describe the input spliced image in a text manner, as shown in fig. 4.
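A minimal sketch of such an encoder-decoder captioning model, assuming PyTorch and torchvision, is shown below; the embedding and hidden sizes and the pretrained-weights argument are illustrative assumptions rather than the patent's exact configuration:

```python
# Sketch of the VGG16-encoder / LSTM-decoder captioning model; dimensions
# assume 224x224 inputs, for which VGG16's conv stack emits (512, 7, 7).
import torch
import torch.nn as nn
from torchvision.models import vgg16

class CaptionModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=512, hidden_dim=512):
        super().__init__()
        self.encoder = vgg16(weights="DEFAULT").features  # conv layers only
        self.project = nn.Linear(512 * 7 * 7, embed_dim)  # image -> embedding
        self.embed = nn.Embedding(vocab_size, embed_dim)  # word embeddings
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)     # next-word logits

    def forward(self, images, captions):
        feats = self.encoder(images)                      # (B, 512, 7, 7)
        img_token = self.project(feats.flatten(1)).unsqueeze(1)
        seq = torch.cat([img_token, self.embed(captions)], dim=1)
        out, _ = self.decoder(seq)                        # LSTM decoding
        return self.head(out)
```

The convolutional encoder is exactly the part the pruning loop above operates on: its feature maps are the h_i whose importance is scored and clipped.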
In this embodiment, after the stitched image is input into the image description model processed by the model compression and acceleration algorithm to obtain the image description information, the method further comprises:
and broadcasting the image description information in a voice mode.
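The broadcast step can run fully offline with a local text-to-speech engine. A minimal sketch, assuming the pyttsx3 library (the patent does not name a TTS engine, so this choice is illustrative):

```python
# Sketch of offline voice broadcast; pyttsx3 wraps the platform's local
# TTS backend, so no network connection is needed.
import pyttsx3

def broadcast(description: str) -> None:
    engine = pyttsx3.init()
    engine.setProperty("rate", 150)   # slower speech for clarity (assumption)
    engine.say(description)
    engine.runAndWait()               # blocks until the speech finishes
```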
The offline blind person vision assisting device provided by the invention corresponds to the specific embodiments of the offline blind person vision assisting method above, and achieves the purpose of the invention by executing the steps of those method embodiments. The explanations given for the method embodiments therefore also apply to the device embodiments below and are not repeated.
As shown in fig. 5, an embodiment of the present invention further provides an offline blind person visual aid, including:
the shooting module 11 is used for acquiring images, wherein the acquired images are images shot by the blind in daily life;
the portable computing module 12 is configured to extract feature points of an image, splice the image with incomplete information by using the extracted feature points, and input the spliced image into an image description model based on model compression and acceleration algorithm processing to obtain image description information;
In the embodiment of the present invention, Table 1 compares the time consumed and the energy consumption of the original image understanding scheme with those of the image understanding method processed by the model compression and acceleration method in the apparatus shown in fig. 5.

TABLE 1 Time consumed and energy consumption

As shown in Table 1, the image understanding method processed by the model compression and acceleration method consumes less time and less energy than the original image understanding method. This makes it much easier to load the blind vision assistance system onto a low-cost portable small mobile device and lengthens the time for which the visually impaired can use the system.
In summary, the technical solution provided by the embodiments of the present invention has at least the following beneficial effects:
1) it solves the prior-art problem that a deep learning network model can be run only with either a device with strong computing power or a network capable of transmitting a large amount of data;
2) it reduces the time and energy consumed by image processing, and avoids the error accumulation and the overly long neuron fine-tuning time caused by clipping one feature map at a time when a network model is compressed with a low-rank-decomposition-based method;
3) it solves the problems that blind assistance equipment based on an image description model otherwise requires large-scale computing equipment and cannot run offline;
4) the simplified model can run on a low-cost portable mobile processor, enabling a visually impaired person to perceive the surrounding environment stably and in time over long periods in a non-visual way, improving their quality of life to a certain extent.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (10)
1. An off-line visual assistance method for the blind, comprising:
acquiring an image, wherein the acquired image is an image taken by the blind person in daily life;
extracting feature points of the image, and stitching the images with incomplete information by using the extracted feature points;
and inputting the stitched image into an image description model processed by a model compression and acceleration algorithm to obtain image description information.
2. The off-line blind visual assistance method of claim 1, wherein extracting feature points of the image and stitching the images with incomplete information by using the extracted feature points comprises:
A1, preprocessing a plurality of images with incomplete information and extracting feature points, wherein feature point extraction means extracting features for comparison from the images;
A2, screening out the same or similar features among the extracted feature points and matching the feature points;
A3, optimizing and purifying the matched feature points;
A4, obtaining a transformation matrix according to the matching relation between the matched feature points, and transforming the images correspondingly with the transformation matrix;
and A5, stitching the plurality of transformed images with incomplete information together, and eliminating seams and light differences to obtain an image with complete information.
3. The off-line blind visual aid method according to claim 1, wherein the model compression and acceleration algorithm comprises the steps of:
B1, using the Oracle pruning algorithm to evaluate the importance degree of the neurons multiple times, so that the cost loss of the pruned image description model is minimized;
B2, removing the least important neurons after the multiple operations of evaluating the importance degree of the neurons;
B3, fine-tuning the pruned image description model;
B4, returning to step B1 and continuing until pruning is completed.
4. The off-line blind visual aid method according to claim 3, wherein after the multiple operations of evaluating the importance degree of the neurons, the cost function of the pruned image description model is expressed as:

min_{W'} (1/N) Σ_{i=1..N} Σ_{j=1..M} |C(D|W'_ij) − C(D|W)|, subject to ||W'||_0 ≤ B

wherein i is the index of an executed evaluation, N denotes the total number of executed evaluations of the neuron importance degree, M is the number of feature maps selected after each evaluation, W denotes the parameter set of the image description model, W' denotes the parameter set of the pruned image description model (W'_ij being W with the j-th feature map selected in the i-th evaluation pruned), C(D|W') denotes the loss function of the pruned image description model, C(D|W) denotes the loss function of the image description model before pruning, B denotes the number of nonzero parameters, and D denotes the training set.
5. The off-line blind visual aid method according to claim 3, wherein removing the least important neurons after the multiple operations of evaluating the importance degree of the neurons comprises:
selecting the feature maps whose evaluation results coincide across the multiple operations of evaluating the importance degree of the neurons;
calculating the importance values of the selected feature maps and sorting them from small to large;
and clipping the feature maps at the top of the ranking (the least important ones).
6. The off-line blind visual aid method according to claim 3, wherein the fine-tuning of the pruned image description model comprises:
and retraining the image description model after pruning.
7. The off-line blind visual aid method according to claim 3, wherein the returning to step B1 to continue execution until pruning is completed comprises:
judging, according to the degree of accuracy change of the pruned image description model, whether the model is the image description model to be finally retained after processing by the model compression and acceleration algorithm;
if yes, stopping pruning;
otherwise, the procedure returns to step B1 to continue execution until pruning is completed.
8. The off-line blind visual aid method according to claim 1, wherein the image description model is used for describing the input spliced image in a text manner.
9. The off-line blind visual aid method according to claim 1, wherein after inputting the spliced image into an image description model processed based on a model compression and acceleration algorithm to obtain image description information, the method comprises:
and broadcasting the image description information in a voice mode.
10. An off-line blind visual aid, comprising:
the shooting module, which is used for acquiring an image, wherein the acquired image is an image taken by the blind person in daily life;
the portable computing module is used for extracting the feature points of the image, splicing the image with incomplete information by using the extracted feature points, and inputting the spliced image into an image description model based on model compression and acceleration algorithm processing to obtain image description information;
and the broadcasting module is used for broadcasting the image description information in a voice mode.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111290759.4A CN114049553A (en) | 2021-11-02 | 2021-11-02 | Offline blind person vision assisting method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114049553A true CN114049553A (en) | 2022-02-15 |
Family
ID=80206815
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111290759.4A Pending CN114049553A (en) | 2021-11-02 | 2021-11-02 | Offline blind person vision assisting method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114049553A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120232907A1 (en) * | 2011-03-09 | 2012-09-13 | Christopher Liam Ivey | System and Method for Delivering a Human Interactive Proof to the Visually Impaired by Means of Semantic Association of Objects |
CN104535059A (en) * | 2014-12-04 | 2015-04-22 | 上海交通大学 | Indoor positioning system specific to totally blind population |
CN106265004A (en) * | 2016-10-08 | 2017-01-04 | 西安电子科技大学 | Multi-sensor intelligent blind person's guiding method and device |
CN107749053A (en) * | 2017-10-24 | 2018-03-02 | 郑州布恩科技有限公司 | A kind of binocular image collection and pretreatment unit and method for vision prosthesis |
CN109753900A (en) * | 2018-12-21 | 2019-05-14 | 西安科技大学 | A kind of blind person's auxiliary vision system based on CNN/LSTM |
CN111241979A (en) * | 2020-01-07 | 2020-06-05 | 浙江科技学院 | Real-time obstacle detection method based on image feature calibration |
CN112561054A (en) * | 2020-12-03 | 2021-03-26 | 中国科学院光电技术研究所 | Neural network filter pruning method based on batch characteristic heat map |
Non-Patent Citations (2)

Title |
---|
Pavlo Molchanov et al., "Pruning Convolutional Neural Networks for Resource Efficient Inference," published as a conference paper at ICLR 2017 * |
Guan Jianjun, UAV Remote Sensing Surveying and Mapping Technology and Applications, Xi'an: Northwestern Polytechnical University Press, August 2018 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110796018B (en) | Hand motion recognition method based on depth image and color image | |
CN110728308A (en) | Interactive blind guiding system and method based on improved Yolov2 target detection and voice recognition | |
CN109325915A (en) | A kind of super resolution ratio reconstruction method for low resolution monitor video | |
CN112215203A (en) | Pavement disease detection method and device based on deep learning | |
CN114821050B (en) | Method for dividing reference image based on transformer | |
CN110909578A (en) | Low-resolution image recognition method and device and storage medium | |
CN110659573A (en) | Face recognition method and device, electronic equipment and storage medium | |
CN113743544A (en) | Cross-modal neural network construction method, pedestrian retrieval method and system | |
CN111428583A (en) | Visual compensation method based on neural network and touch lattice | |
CN115146761B (en) | Training method and related device for defect detection model | |
CN115187480A (en) | Image color correction method based on Transformer | |
CN116342953A (en) | Dual-mode target detection model and method based on residual shrinkage attention network | |
CN116012300A (en) | Multi-mode image aesthetic quality evaluation method integrating local and global image features | |
CN113591692A (en) | Multi-view identity recognition method | |
CN114943937A (en) | Pedestrian re-identification method and device, storage medium and electronic equipment | |
CN114049553A (en) | Offline blind person vision assisting method and device | |
CN112580395A (en) | Depth information-based 3D face living body recognition method, system, device and medium | |
Zhu et al. | Deepfake detection via inter-frame inconsistency recomposition and enhancement | |
CN116185182B (en) | Controllable image description generation system and method for fusing eye movement attention | |
Joshi et al. | Real-time object detection and identification for visually challenged people using mobile platform | |
CN117115474A (en) | End-to-end single target tracking method based on multi-stage feature extraction | |
CN116664694A (en) | Training method of image brightness acquisition model, image acquisition method and mobile terminal | |
CN116704603A (en) | Action evaluation correction method and system based on limb key point analysis | |
CN112200226B (en) | Image processing method based on reinforcement learning, image processing method and related device | |
CN113420783B (en) | Intelligent man-machine interaction method and device based on image-text matching |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20220215 |