CN115998591A - Binocular disparity estimation method, visual prosthesis, and computer-readable storage medium - Google Patents


Publication number
CN115998591A
Authority
CN
China
Prior art keywords
image, feature, map, estimation method, binocular
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211521138.7A
Other languages
Chinese (zh)
Inventor
田洪君
廖玚
陈应俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intellimicro Medical Co ltd
Original Assignee
Intellimicro Medical Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Intellimicro Medical Co ltd filed Critical Intellimicro Medical Co ltd
Priority to CN202211521138.7A priority Critical patent/CN115998591A/en
Publication of CN115998591A publication Critical patent/CN115998591A/en
Priority to PCT/CN2023/126060 priority patent/WO2024114175A1/en
Pending legal-status Critical Current


Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61F FILTERS IMPLANTABLE INTO BLOOD VESSELS; PROSTHESES; DEVICES PROVIDING PATENCY TO, OR PREVENTING COLLAPSING OF, TUBULAR STRUCTURES OF THE BODY, e.g. STENTS; ORTHOPAEDIC, NURSING OR CONTRACEPTIVE DEVICES; FOMENTATION; TREATMENT OR PROTECTION OF EYES OR EARS; BANDAGES, DRESSINGS OR ABSORBENT PADS; FIRST-AID KITS
    • A61F9/00 Methods or devices for treatment of the eyes; Devices for putting-in contact lenses; Devices to correct squinting; Apparatus to guide the blind; Protective devices for the eyes, carried on the body or in the hand
    • A61F9/08 Devices or methods enabling eye-patients to replace direct visual perception by another kind of perception
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61H PHYSICAL THERAPY APPARATUS, e.g. DEVICES FOR LOCATING OR STIMULATING REFLEX POINTS IN THE BODY; ARTIFICIAL RESPIRATION; MASSAGE; BATHING DEVICES FOR SPECIAL THERAPEUTIC OR HYGIENIC PURPOSES OR SPECIFIC PARTS OF THE BODY
    • A61H3/00 Appliances for aiding patients or disabled persons to walk about
    • A61H3/06 Walking aids for blind persons
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61N ELECTROTHERAPY; MAGNETOTHERAPY; RADIATION THERAPY; ULTRASOUND THERAPY
    • A61N1/00 Electrotherapy; Circuits therefor
    • A61N1/18 Applying electric currents by contact electrodes
    • A61N1/32 Applying electric currents by contact electrodes alternating or intermittent currents
    • A61N1/36 Applying electric currents by contact electrodes alternating or intermittent currents for stimulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/593 Depth or shape recovery from multiple images from stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks


Abstract

The invention discloses a binocular disparity estimation method, a visual prosthesis, and a computer-readable storage medium. The binocular disparity estimation method comprises the following steps: acquiring a first image and a second image of the surrounding environment of a visual prosthesis wearer, captured by a binocular camera; performing depth feature extraction and matching fusion on the first image and the second image to obtain a feature map; and performing disparity estimation according to the feature map to obtain a target disparity map for generating the electrical stimulation pulse signal of the visual prosthesis. By combining this binocular disparity estimation method with a visual prosthesis, obstacle information in the surroundings can be effectively extracted and conveyed to the blind patient, assisting the patient in recognizing and avoiding the various obstacles of everyday scenes and improving the patient's mobility.

Description

Binocular disparity estimation method, visual prosthesis, and computer-readable storage medium
Technical Field
The invention relates to the technical field of medical instruments, and in particular to a binocular disparity estimation method, a visual prosthesis, and a computer-readable storage medium.
Background
Visual prostheses are a novel class of medical device that induces phosphenes (light sensations) in blind patients by applying stimulating currents to the retina or visual cortex. Depending on where the stimulation electrodes are implanted, visual prostheses can be categorized into retina-stimulating prostheses (also referred to as "implantable retinal electrical stimulators") and cortex-stimulating prostheses.
In the related art, a retina-stimulating visual prosthesis acquires an image of the patient's surroundings through an external camera, enhances the image with a conventional image processing algorithm, and converts it into an electrical stimulation signal. The signal is sent to an electrode array implanted on the patient's retina, where electrical stimulation of the retinal cells realizes visual perception and reconstruction.
In real daily-life scenes, however, various obstacles often surround a blind patient while walking. Conventional image processing algorithms cannot, in complex scenes, effectively extract and present these obstacles or their distances from the patient, so they cannot effectively assist the blind patient in recognizing and avoiding the obstacles of everyday life, leaving potential safety hazards.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. A first object of the present invention is therefore to provide a binocular disparity estimation method that can effectively extract obstacle information from the surroundings and provide it to the blind patient, assisting the patient in identifying and avoiding the various obstacles of everyday scenes and improving the patient's mobility.
A second object of the invention is to propose a visual prosthesis.
A third object of the present invention is to propose a computer readable storage medium.
To achieve the above object, a binocular disparity estimation method according to an embodiment of the first aspect of the present invention comprises: acquiring a first image and a second image of the surrounding environment of a visual prosthesis wearer, captured by a binocular camera; performing depth feature extraction and matching fusion on the first image and the second image to obtain a feature map; and performing disparity estimation according to the feature map to obtain a target disparity map for generating the electrical stimulation pulse signal of the visual prosthesis.
According to the binocular disparity estimation method of the embodiment of the invention, the disparity map can be computed accurately and obstacle information can be extracted effectively even in complex environments, yielding a target disparity map for generating the electrical stimulation pulse signal of the visual prosthesis. That is, by combining binocular disparity estimation with the electrical stimulation of the visual prosthesis, the accuracy and effectiveness of the obstacle-recognition and obstacle-avoidance assistance the prosthesis conveys to the patient can be markedly improved, improving the patient's mobility.
In some embodiments, performing depth feature extraction and matching fusion on the first image and the second image to obtain a feature map comprises: taking the first image and the second image as input and extracting, via a MobileNet with shared weights, first image features from the first image and second image features from the second image; performing feature matching between the first image features and the second image features at each of a group of preset disparity values to obtain a plurality of matching results; and fusing the plurality of matching results to obtain a group of feature maps.
In some embodiments, before the first image and the second image are taken as input and their features are extracted via the shared-weight MobileNet, the method further comprises: performing distortion correction and stereo rectification on the first image and the second image so that matching points of the corrected images lie on the same pixel row.
In some embodiments, performing disparity estimation according to the feature map to obtain a target disparity map for generating the electrical stimulation pulse signal of the visual prosthesis comprises: inputting the group of feature maps into a disparity estimation network to obtain an initial disparity map, and inputting the group of feature maps into a weight estimation network to obtain attention weights; performing weighted fusion of the initial disparity map, the attention weights, and the feature maps, and inputting the result into a global information optimization network; performing continuous-frame optimization on the initial disparity map using the context timing relationship to obtain a final predicted disparity map; and downsampling the final predicted disparity map according to the resolution set by the visual prosthesis to obtain the target disparity map.
In some embodiments, inputting the group of feature maps into a disparity estimation network to obtain an initial disparity map comprises: serializing the group of feature maps to obtain a serialized feature map; modeling the dependencies between the input and output sequences with a depth feature transformation network to determine a feature mapping relation; and aggregating the serialized feature map through the feature mapping relation to estimate the initial disparity map.
In some embodiments, inputting the group of feature maps into a weight estimation network to obtain the attention weights comprises: taking the group of feature maps as input; constructing a matching cost by aggregating environmental information of different sizes and positions with a multi-scale dilated-convolution pyramid module; stacking a plurality of hourglass networks fused by 3D convolution to refine the matching cost; and outputting the attention weights through a SoftMax layer of the weight estimation network.
In some embodiments, optimizing the initial disparity map over successive frames using the context timing relationship to obtain a final predicted disparity map comprises: performing MobileNet-based feature extraction and inter-frame feature matching on the first and second images of K consecutive frames; solving the camera pose transformation of each frame in a relative coordinate system with the Perspective-n-Point (PnP) algorithm; and completing missing regions using inter-frame projection to obtain the final predicted disparity map.
In some embodiments, completing the missing regions using inter-frame projection to obtain the final predicted disparity map comprises: determining the missing region of an image; performing superpixel segmentation at a preset resolution, centered on the nearest matching point of the missing region, so that the missing region is covered by a mask; and, based on the projection relation of the nearest matching points between consecutive frames, completing part of the missing region's disparity using the valid disparities inside the masks of adjacent frames to obtain the final predicted disparity map.
To achieve the above object, a visual prosthesis according to an embodiment of the second aspect of the present invention comprises: an implant device connected to a wireless annunciator; a camera unit for acquiring images of the wearer's surroundings; and an artificial intelligence image processing unit connected to the camera unit and the wireless annunciator, which obtains a target disparity map by the binocular disparity estimation method described above and sends it through the wireless annunciator to the implant device, the implant device generating an electrical stimulation pulse signal from the target disparity map.
According to the visual prosthesis of the embodiment of the invention, the artificial intelligence image processing unit obtains the target disparity map by executing the binocular disparity estimation method of the above embodiments. That is, applying this method to the visual prosthesis allows obstacle information to be extracted effectively in complex environments, markedly improving the accuracy and effectiveness of the obstacle-recognition and obstacle-avoidance assistance conveyed to the patient and improving the patient's mobility.
In some embodiments, the camera unit comprises at least one set of binocular cameras.
In order to achieve the above object, an embodiment of a third aspect of the present invention also proposes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the binocular disparity estimation method.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
Fig. 1 is a block diagram of a visual prosthesis according to an embodiment of the present invention;
Fig. 2 is a flowchart of a binocular disparity estimation method according to an embodiment of the present invention;
Fig. 3 is a flowchart of a binocular disparity estimation method according to another embodiment of the present invention;
Fig. 4 is a schematic illustration of the operation of a visual prosthesis according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, by way of example, with reference to the accompanying drawings.
The binocular disparity estimation method of the embodiments of the invention, combined with a visual prosthesis, can markedly improve the accuracy and effectiveness of the obstacle-recognition and obstacle-avoidance assistance the prosthesis conveys to the patient.
Fig. 1 is a block diagram of a visual prosthesis according to one embodiment of the present invention, the visual prosthesis 100 shown in fig. 1 including an implant device 10, a wireless annunciator 20, a camera unit 30, and an artificial intelligence image processing unit 40.
The camera unit 30 is worn by the patient and captures image information of the wearer's surroundings. In some embodiments, the camera unit 30 may include at least one set of binocular cameras. The artificial intelligence image processing unit 40 may be a smart computing terminal, such as a smartphone or tablet, that performs binocular disparity estimation on the images acquired by the camera unit 30 to obtain a target disparity map and transmits it to the implant device 10 through the wireless annunciator 20. The electrode array of the implant device 10 is implanted in the patient's visual cortex or retinal cells and generates electrical stimulation pulse signals from the target disparity map to stimulate those cells, inducing phosphenes that correspond to the surrounding environment. This lets the patient recognize objects in the surroundings and avoid them effectively, improving the safety of the patient's movement.
The binocular disparity estimation method according to the embodiment of the first aspect of the present invention, which can be applied to the artificial intelligence image processing unit 40 of the visual prosthesis 100, will be described below with reference to the accompanying drawings.
Fig. 2 is a flowchart of a binocular disparity estimation method according to an embodiment of the present invention; as shown in Fig. 2, the method includes steps S1 to S3.
S1, acquiring a first image and a second image of the surrounding environment of a visual prosthesis wearer acquired by a binocular camera.
In particular, the visual prosthesis may employ a binocular camera that is worn on the blind patient to capture image information of the wearer's surroundings.
For example, a binocular camera may include two horizontally disposed cameras on the left and right, or a binocular camera may include two cameras disposed one above the other.
The binocular camera captures images of the surroundings of the visual prosthesis wearer, generating a first image and a second image, and sends the image information to the artificial intelligence image processing unit of the visual prosthesis. In an embodiment, the artificial intelligence image processing unit may be a smart computing terminal such as a smartphone or tablet, or an image processing chip integrated with the binocular camera.
S2, carrying out depth feature extraction and matching fusion on the first image and the second image so as to obtain a feature map.
Feature extraction is the process of extracting feature information from an image; for example, extracting and displaying only an edge feature map of the object edges in the input image. Depth feature extraction can be understood as feature extraction based on a neural network model. Because the model is trained on a large amount of data from real-life scenes, it can effectively extract relevant features, such as obstacle features, in complex scenes, providing the user with more accurate obstacle information to aid avoidance.
Specifically, depth feature extraction is performed on the first image and the second image respectively, the extracted features are matched, and the matching results are fused to obtain the feature map.
S3, perform disparity estimation according to the feature map to obtain a target disparity map for generating the electrical stimulation pulse signal of the visual prosthesis.
A disparity map records, for an image pair acquired by a binocular camera, the positional offset between the pixels at which the same scene point is imaged by the two cameras. For example, for a binocular camera consisting of two horizontally placed cameras, the offset is typically in the horizontal direction. In depth sensing, a disparity map can be converted into a depth map.
Specifically, disparity estimation is performed on the feature map to obtain the target disparity map, which can be converted into a depth map. The depth map reflects not only the objects in the user's scene but also their distances from the user. The electrical stimulation pulse signal of the visual prosthesis is generated from it, and the implant device stimulates the wearer's retina or visual cortex according to the signal, inducing phosphenes that help the wearer perceive the objects in the surroundings and their distances, facilitating avoidance.
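The disparity-to-depth conversion mentioned above follows the standard stereo relation Z = f * B / d. The patent does not disclose camera parameters, so the focal length and baseline below are hypothetical; this is a minimal numpy sketch of the geometric relation, not the prosthesis implementation:

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m, eps=1e-6):
    """Convert a disparity map (pixels) to a depth map (meters) via Z = f * B / d.

    Pixels with (near-)zero disparity are unmatched or at infinity and are
    left as depth 0 here; real systems would mark them invalid.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.zeros_like(disparity)
    valid = disparity > eps
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth

# Hypothetical parameters: 700 px focal length, 6 cm baseline.
# A 42 px disparity then corresponds to 700 * 0.06 / 42 = 1.0 m.
d = np.array([[42.0, 0.0]])
print(disparity_to_depth(d, 700.0, 0.06))  # -> [[1. 0.]]
```

Note how closer objects produce larger disparities, which is exactly why the disparity map carries the obstacle-distance information the prosthesis needs.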
In this way, the binocular disparity estimation method of the embodiment of the invention computes the disparity map accurately and extracts obstacle information effectively in complex environments, yielding the target disparity map for generating the electrical stimulation pulse signal of the visual prosthesis. Combining binocular disparity estimation with the electrical stimulation of the visual prosthesis markedly improves the accuracy and effectiveness of the obstacle-recognition and obstacle-avoidance assistance conveyed to the patient, improving the patient's mobility.
The process of obtaining the feature map and the target disparity map according to the embodiment of the present invention is further described below.
In an embodiment, depth feature extraction may be performed with a neural network model: the first and second images acquired by the binocular camera are input into the network, and the first and second image features are extracted with shared weights by a backbone network such as MobileNet. MobileNet is a lightweight convolutional neural network whose basic unit, the depthwise separable convolution, uses far fewer parameters than a standard convolution, greatly reducing parameter count and computation while still enabling deep feature extraction.
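The parameter savings of depthwise separable convolution can be checked with simple arithmetic: a standard k x k convolution needs k * k * C_in * C_out weights, while the depthwise-plus-pointwise factorization needs k * k * C_in + C_in * C_out. A quick sketch (illustrative numbers, not MobileNet's actual layer sizes):

```python
def conv_params(k, c_in, c_out):
    # Standard convolution: every output channel filters every input channel.
    standard = k * k * c_in * c_out
    # Depthwise separable: one k x k filter per input channel (depthwise),
    # then a 1 x 1 pointwise convolution to mix channels.
    separable = k * k * c_in + c_in * c_out
    return standard, separable

std, sep = conv_params(3, 128, 128)
print(std, sep, round(std / sep, 1))  # -> 147456 17536 8.4
```

For a 3 x 3 layer with 128 input and output channels, the separable form uses roughly 8x fewer parameters, which is why MobileNet suits a wearable, battery-powered device.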
Further, feature matching is performed between the first image features and the second image features at each of a group of preset disparity values, yielding a plurality of matching results. For example, the preset disparity values may be fixed, i.e., the two feature maps are matched at a set of fixed disparity values. The plurality of matching results is then fused into a group of feature maps.
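The patent does not publish its matching operator, but the step above, matching two feature maps at a set of fixed disparities and fusing the results, is commonly realized as a correlation-style cost volume. A minimal numpy sketch under that assumption:

```python
import numpy as np

def match_at_disparities(feat_left, feat_right, disparities):
    """Match left/right feature maps at preset disparity values.

    feat_left, feat_right: arrays of shape (C, H, W).
    disparities: non-negative integer candidate disparities.
    Returns a (D, H, W) volume; each slice is the per-pixel correlation
    score at one candidate disparity (fusion by stacking).
    """
    c, h, w = feat_left.shape
    volume = np.zeros((len(disparities), h, w))
    for i, d in enumerate(disparities):
        if d == 0:
            volume[i] = (feat_left * feat_right).mean(axis=0)
        else:
            # Left pixel x corresponds to right pixel x - d after rectification.
            volume[i, :, d:] = (feat_left[:, :, d:] * feat_right[:, :, :-d]).mean(axis=0)
    return volume
```

Each slice of the volume answers "how well do the two views agree if the scene were at this disparity?", which is exactly the signal the later disparity and weight estimation networks consume.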
In some embodiments, the first and second images may also be distortion-corrected and stereo-rectified before depth feature extraction. Distortion correction compensates, by formula, for the distortion that the camera's optical lens introduces. Stereo rectification aligns the views of the binocular camera's two lenses so that corresponding points in the two images fall on the same reference line. With matching points of the corrected images lying on the same pixel row, the accuracy of the subsequent depth feature extraction and matching fusion is improved.
In an embodiment, after the group of feature maps is obtained, disparity estimation is performed on it along two paths: one path inputs the group of feature maps into the disparity estimation network to obtain an initial disparity map, and the other inputs it into the weight estimation network to obtain attention weights.
When estimating the initial disparity map, the disparity estimation network serializes the input group of feature maps to obtain a serialized feature map. A depth feature transformation network models the dependencies between the input and output sequences, that is, between the obtained serialized feature map and the serialized feature map output by the disparity estimation network, to determine the feature mapping relation between the two. The serialized feature map can then be aggregated through this feature mapping relation to estimate the initial disparity map. The disparity estimation network and the depth feature transformation network can be obtained by training a basic network model of the related art on feature maps of images acquired by binocular cameras in real-life scenes.
To obtain the attention weights, the group of feature maps is input into the weight estimation network, which constructs a matching cost by aggregating environmental information of different sizes and positions with a multi-scale dilated-convolution pyramid module, stacks several hourglass networks fused by 3D convolution to refine the matching cost, and outputs the attention weights through its SoftMax layer. A dilated (atrous) convolution inserts gaps between the elements of the convolution kernel; the dilation-rate parameter determines the spacing, enlarging the receptive field without adding parameters.
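The SoftMax layer mentioned above normalizes the refined matching cost across the disparity dimension so that each pixel receives a weight distribution over the candidate disparities. A minimal numpy sketch of that final step (the surrounding pyramid and hourglass networks are omitted):

```python
import numpy as np

def softmax_over_disparity(cost_volume):
    """SoftMax across the disparity axis of a (D, H, W) cost volume.

    Returns per-pixel attention weights that are non-negative and sum to 1
    over the D candidate disparities, a sketch of the weight estimation
    network's output layer.
    """
    # Subtract the per-pixel max for numerical stability before exponentiating.
    shifted = cost_volume - cost_volume.max(axis=0, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)
```

A pixel whose cost volume strongly favors one candidate disparity gets a near-one-hot weight vector; ambiguous pixels get flatter distributions, which the later fusion step can exploit.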
For example, in some embodiments the multi-scale dilated-convolution pyramid network may use U-Net as its base model, replacing the ordinary convolutions of the encoding-decoding stages with dilated convolutions to enlarge the receptive field, so that each convolutional layer's output covers a wider range of feature information than an ordinary convolution would, which helps capture global information about obstacle features in the image. Combined with U-Net's skip-connection structure, the pyramid convolution module integrates multi-scale features, obtaining high-resolution global information together with low-resolution local detail, improving the accuracy and comprehensiveness of obstacle recognition in complex scenes.
Further, after the initial disparity map and the attention weights are obtained, the initial disparity map, the attention weights, and the feature maps are weighted, fused, and input into the global information optimization network, and the initial disparity map is optimized over consecutive frames using the context timing relationship to obtain the final predicted disparity map.
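The exact weighted-fusion rule of the global information optimization network is not disclosed in the patent. One common realization of "initial disparity plus attention weights" is a soft-argmax expectation over the candidate disparities blended with the initial estimate; the sketch below shows that pattern under an assumed equal blend:

```python
import numpy as np

def fuse_disparity(initial_disp, attention, candidates, alpha=0.5):
    """Blend the initial disparity map with the attention-weighted
    expectation over candidate disparities (soft-argmax style).

    initial_disp: (H, W) initial disparity map.
    attention:    (D, H, W) weights summing to 1 over axis 0.
    candidates:   length-D array of the preset disparity values.
    alpha:        hypothetical blend factor; the real network learns
                  how to combine these inputs.
    """
    candidates = np.asarray(candidates, dtype=np.float64)
    expected = (attention * candidates[:, None, None]).sum(axis=0)
    return alpha * initial_disp + (1.0 - alpha) * expected
```

With a one-hot attention map the expectation collapses to the favored candidate, so confident pixels pull the fused disparity toward the matching evidence while uncertain pixels stay closer to the initial estimate.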
During feature matching, if a local region of a frame cannot be matched to valid features, the disparity map will show obvious defects in multiple places. The embodiment of the invention therefore optimizes over consecutive frames using the timing relationship, an effective way to complete the missing disparity of local regions.
Specifically, in some embodiments, MobileNet-based feature extraction and inter-frame feature matching are performed on the first and second images of K consecutive frames, where K may be between 2 and 20; for example, backbone-network feature extraction and inter-frame matching may be applied to 10 consecutive left images. The camera pose transformation of each frame in a relative coordinate system is then solved with the Perspective-n-Point (PnP) algorithm to obtain the projections between frames, and the missing regions are completed using these projections to obtain the final predicted disparity map.
In an embodiment, to address the difficulty of determining the projection/completion relationship when the missing region lacks texture, in some embodiments of the present invention the missing region of the image is determined, and local super-pixel segmentation is performed at a preset resolution, for example a resolution R, centered on the nearest matching point in the missing region, so that the missing region is covered by a mask. In the field of computer vision, image segmentation refers to the process of subdividing a digital image into multiple sub-regions, i.e. sets of pixels, also referred to as superpixels. Then, based on the projection relation of the nearest matching point between consecutive frames, the valid disparity inside the mask of adjacent frames is used to complete the local disparity, yielding the final predicted disparity map.
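A minimal sketch of the mask-based completion idea, assuming the adjacent frame's valid disparity has already been warped into the current frame's pixel grid via the inter-frame projection (function and variable names here are hypothetical):

```python
import numpy as np

def complete_disparity(disp, neighbor_disp, mask):
    """Fill invalid (<= 0) disparities inside `mask` from an adjacent frame.

    Assumes `neighbor_disp` has already been warped into this frame's pixel
    grid via the inter-frame projection.
    """
    out = disp.copy()
    holes = mask & (disp <= 0) & (neighbor_disp > 0)
    out[holes] = neighbor_disp[holes]
    return out

disp = np.array([[5., 5., 0.],
                 [5., 0., 0.],
                 [5., 5., 5.]])
neighbor = np.full_like(disp, 4.0)          # warped disparity of adjacent frame
mask = np.array([[0, 0, 1],
                 [0, 1, 1],
                 [0, 0, 0]], dtype=bool)    # superpixel mask over the hole

completed = complete_disparity(disp, neighbor, mask)
print(completed[1])  # [5. 4. 4.]
```

Only pixels that are both inside the mask and invalid are replaced, so valid disparities from the current frame are never overwritten.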
The final predicted disparity map is further downsampled according to the resolution set by the visual prosthesis to obtain the target disparity map.
Based on the description of the above embodiments, the binocular disparity estimation method according to the embodiments of the present invention may include binocular image correction, depth feature extraction and matching fusion, hole convolution and weight estimation, continuous-frame optimization using the contextual temporal relationship, and image downsampling. Fig. 3 is a flowchart of a binocular disparity estimation method according to an embodiment of the present invention; as shown in Fig. 3, the method includes:
s11, calibrating the binocular camera, namely performing distortion correction and stereo correction.
S12, binocular images are acquired and corrected.
S13, carrying out depth feature extraction and matching fusion, then proceeding to steps S14 and S15 respectively.
S14, performing disparity estimation to obtain an initial disparity map, and proceeding to step S16.
S15, performing weight estimation.
S16, carrying out weighted fusion of the initial disparity map, the attention weight and the feature map.
S17, performing continuous frame optimization on the initial disparity map by using the context time sequence relation.
S18, downsampling the image according to the set resolution.
S19, obtaining the target disparity map.
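The flow of steps S11–S19 can be sketched as a toy, runnable pipeline; every stage below is a trivial stand-in with hypothetical names (NumPy only), not the patent's actual networks:

```python
import numpy as np

def rectify(img):                  # S12: distortion/stereo correction (no-op stand-in)
    return img

def extract_and_match(l, r):       # S13: stand-in "feature map": absolute difference
    return np.abs(l - r)

def weight_head(f):                # S15: attention weights via SoftMax over pixels
    e = np.exp(f - f.max())
    return e / e.sum()

def weighted_fuse(d, w, f):        # S16: fuse initial disparity, weights, features
    return d * w + f * (1 - w)

def temporal_refine(d, frames):    # S17: average with previous frames (stand-in)
    return np.mean([d] + frames, axis=0)

def downsample(d, step):           # S18: subsample to the prosthesis resolution
    return d[::step, ::step]

left, right = np.ones((4, 4)), np.zeros((4, 4))
feats = extract_and_match(rectify(left), rectify(right))          # S13
init_disp = feats                                                 # S14: identity stand-in
fused = weighted_fuse(init_disp, weight_head(feats), feats)       # S15-S16
target = downsample(temporal_refine(fused, [init_disp]), step=2)  # S17-S19
print(target.shape)  # (2, 2)
```

The point of the skeleton is the data flow: features feed both the disparity and weight heads, whose outputs are fused, temporally refined, and only then reduced to the prosthesis resolution.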
In summary, in the binocular disparity estimation method according to the embodiment of the present invention, the first image and the second image acquired by the binocular camera are subjected to distortion correction and stereo correction, so that matching points of the two corrected images lie on the same pixel row. On this basis, depth features are extracted from the corrected images, multiple groups of disparities are estimated synchronously by an efficient feature transformation module, the weight of each disparity is estimated using hole convolution, and the final target disparity map is obtained after joint optimization of the two networks.
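One common way to combine per-disparity matching scores with SoftMax weights is a soft-argmax: the weight network's SoftMax output weights each candidate disparity, and the weighted sum gives the estimate. This is a sketch of that weighting idea, not the patent's exact networks:

```python
import numpy as np

def soft_argmax_disparity(scores, candidates):
    """scores: (D, H, W) matching scores, one slice per candidate disparity."""
    e = np.exp(scores - scores.max(axis=0, keepdims=True))
    weights = e / e.sum(axis=0, keepdims=True)        # per-pixel SoftMax over D
    return np.tensordot(candidates, weights, axes=1)  # expected disparity (H, W)

candidates = np.array([0.0, 8.0, 16.0, 24.0])  # preset disparity values
scores = np.zeros((4, 2, 2))
scores[2] = 5.0                                # candidate 16 strongly preferred

disp = soft_argmax_disparity(scores, candidates)
print(disp.shape, round(float(disp[0, 0]), 2))
```

Because the output is a weighted average rather than a hard argmax, the estimate is differentiable and the two networks can be optimized jointly.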
In the embodiment of the present invention, the binocular disparity estimation method of the above embodiments is combined with the visual prosthesis technique. Fig. 4 illustrates the working principle of a visual prosthesis based on artificial intelligence image processing according to one embodiment of the present invention. As shown in Fig. 4, the image capturing unit captures images of the wearer's surroundings and transmits the image information to the artificial intelligence image processing unit; that unit obtains a target disparity map according to the binocular disparity estimation method of the above embodiments and transmits it to the implant device through the wireless annunciator; the implant device then generates electrical stimulation pulse signals according to the target disparity map to stimulate the retinal cells or visual cortex cells of the wearer, allowing the patient to perceive the image and the distance of surrounding obstacles.
By combining the artificial intelligence image processing technology, based on the binocular disparity estimation method with depth feature matching and continuous-frame optimization, with the visual prosthesis, the surrounding obstacles and their distance information can be accurately extracted. This markedly improves the accuracy and effectiveness of the obstacle recognition and avoidance assistance information that the visual prosthesis delivers to the patient, and provides efficient and reliable auxiliary information for recognizing and avoiding obstacles when the visual prosthesis operates on its own.
Depth feature extraction and matching improve the accuracy of disparity map estimation. Integrating the continuous-frame optimization algorithm effectively reduces flicker between consecutive frames, markedly reducing interference and discomfort during the patient's image perception. Hole convolution makes the receptive-field computation flexible and efficient, improving real-time computation efficiency and providing real-time obstacle disparity information for blind patients. Scene depth information is acquired by binocular stereo matching, which, compared with the prior art, offers advantages such as a longer acquisition distance and lower cost, making it better suited to visual prosthesis products.
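Depth recovery from binocular matching follows the standard rectified-stereo relation Z = f·B/d; the focal length and baseline below are illustrative values, not taken from the patent:

```python
# Rectified-stereo depth: Z = f * B / d.
f_px = 700.0        # focal length in pixels (illustrative)
baseline_m = 0.065  # camera baseline in metres (illustrative)

def depth_from_disparity(d_px):
    """Depth in metres for a disparity of d_px pixels."""
    return f_px * baseline_m / d_px

print(depth_from_disparity(7.0))  # 6.5 m at 7 px; depth grows as disparity shrinks
```

The inverse relation explains the long acquisition distance claimed above: even small disparities of a few pixels still encode useful depth many metres away, with no active emitter required.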
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor can implement the binocular disparity estimation method of the above embodiment. The specific implementation procedure of the binocular disparity estimation method can be referred to the description of the above embodiments.
Wherein, in an embodiment, the binocular disparity estimation method above may be embodied as a computer program stored on a computer readable storage medium, the program comprising computer readable code with instructions executable by at least one computing device. The computer readable storage medium may be any data storage device that can store data readable by a computer system, for example read-only memory, random-access memory, CD-ROM, HDD, DVD, magnetic tape, or optical data storage devices. The computer readable storage medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A binocular disparity estimation method, comprising:
acquiring a first image and a second image of the surrounding environment of a visual prosthesis wearer acquired by a binocular camera;
performing depth feature extraction and matching fusion on the first image and the second image to obtain a feature map;
and performing parallax estimation according to the characteristic map to obtain a target parallax map for generating an electric stimulation pulse signal of the visual prosthesis.
2. The binocular disparity estimation method according to claim 1, wherein performing depth feature extraction and matching fusion on the first image and the second image to obtain a feature map, comprises:
taking the first image and the second image as input, and respectively extracting first image features in the first image and second image features in the second image in a mode of sharing weights by using MobileNet;
respectively carrying out feature matching on the first image feature and the second image feature according to a group of preset parallax values to obtain a plurality of matching results;
and fusing a plurality of matching results to obtain a group of feature graphs.
3. The binocular disparity estimation method according to claim 2, wherein before taking the first image and the second image as input and extracting, via MobileNet with shared weights, the first image feature in the first image and the second image feature in the second image respectively, the method further comprises:
and carrying out distortion correction and stereo correction on the first image and the second image so that the corrected matching points of the first image and the second image are on the same pixel row.
4. The binocular disparity estimation method according to claim 2, wherein performing disparity estimation from the feature map to obtain a target disparity map for generating an electrical stimulation pulse signal of a visual prosthesis, comprises:
inputting the set of feature maps into a disparity estimation network to obtain an initial disparity map, and inputting the set of feature maps into a weight estimation network to obtain an attention weight;
the initial disparity map, the attention weight and the feature map are subjected to weighted fusion, and a global information optimization network is input;
performing continuous frame optimization on the initial disparity map by using a context timing relationship to obtain a final predicted disparity map;
and downsampling the final predicted disparity map according to the resolution set by the visual prosthesis to obtain the target disparity map.
5. The binocular disparity estimation method according to claim 4, wherein inputting the set of feature maps into a disparity estimation network to obtain an initial disparity map comprises:
serializing the group of feature images to obtain a serialized feature image;
modeling the dependency items of the input and output sequences by using a depth feature transformation network to determine a feature mapping relation;
and aggregating the serialized feature map through feature mapping according to the feature mapping relation so as to estimate the initial disparity map.
6. The binocular disparity estimation method according to claim 4, wherein inputting the set of feature maps into a weight estimation network to obtain the attention weight comprises:
taking the set of feature maps as input information;
constructing matching cost by aggregating environmental information of different sizes and different positions by utilizing a multi-scale cavity convolution pyramid module;
stacking a plurality of hourglass networks through 3D convolution fusion, and adjusting the matching cost;
and outputting the attention weight through a SoftMax layer of the weight estimation network.
7. The binocular disparity estimation method of claim 4, wherein optimizing the initial disparity map for successive frames using a contextual timing relationship to obtain a final predicted disparity map comprises:
respectively carrying out feature extraction based on MobileNet and feature matching between the continuous frames on a first image and a second image of the continuous K frames;
solving a camera pose transformation relation of each frame of image under a relative coordinate system based on a Perspective-n-Point algorithm;
and carrying out complementation of the missing area by utilizing projection between frames to obtain the final prediction disparity map.
8. The binocular disparity estimation method according to claim 7, wherein the completion of the missing region using the projection from frame to obtain the final predicted disparity map comprises:
determining a missing region of the image;
performing super-pixel segmentation of the missing region with a preset resolution by taking the nearest matching point in the missing region as a center, so that the missing region is contained by a mask;
and based on the projection relation of the nearest matching point between the continuous frames, utilizing the effective parallax in the mask of the adjacent frames to carry out partial parallax complement of the missing region so as to obtain the final predicted parallax map.
9. A visual prosthesis comprising:
the implantation device is connected with the wireless annunciator;
the camera unit is used for acquiring images of the surrounding environment of the wearer;
the artificial intelligent image processing unit is connected with the camera shooting unit and the wireless annunciator and is used for obtaining a target parallax image according to the binocular parallax estimation method of any one of claims 1-8, and sending the target parallax image to the implantation device through the wireless annunciator, and the implantation device generates an electric stimulation pulse signal according to the target parallax image.
10. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the binocular disparity estimation method according to any one of claims 1-8.
CN202211521138.7A 2022-11-30 2022-11-30 Binocular disparity estimation method, visual prosthesis, and computer-readable storage medium Pending CN115998591A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211521138.7A CN115998591A (en) 2022-11-30 2022-11-30 Binocular disparity estimation method, visual prosthesis, and computer-readable storage medium
PCT/CN2023/126060 WO2024114175A1 (en) 2022-11-30 2023-10-24 Binocular disparity estimation method, and visual prosthesis and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN115998591A true CN115998591A (en) 2023-04-25

Family

ID=86027375

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211521138.7A Pending CN115998591A (en) 2022-11-30 2022-11-30 Binocular disparity estimation method, visual prosthesis, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN115998591A (en)
WO (1) WO2024114175A1 (en)

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2185236B1 (en) * 2007-07-27 2015-10-07 Second Sight Medical Products Implantable device for the brain
CN111192306A (en) * 2018-11-15 2020-05-22 三星电子株式会社 System for disparity estimation and method for disparity estimation of system
CN109919993B (en) * 2019-03-12 2023-11-07 腾讯科技(深圳)有限公司 Parallax map acquisition method, device and equipment and control system
CN111481345A (en) * 2020-05-27 2020-08-04 微智医疗器械有限公司 Implant device and visual prosthesis with same
CN112188059B (en) * 2020-09-30 2022-07-15 深圳市商汤科技有限公司 Wearable device, intelligent guiding method and device and guiding system
RU2759125C1 (en) * 2021-02-15 2021-11-09 Автономная некоммерческая организация «Научно-производственная лаборатория «Сенсорные технологии для слепоглухих» System and method for visual cortical prosthetics
CN113128347B (en) * 2021-03-24 2024-01-16 北京中科慧眼科技有限公司 Obstacle target classification method and system based on RGB-D fusion information and intelligent terminal
CN114387197A (en) * 2022-01-04 2022-04-22 京东鲲鹏(江苏)科技有限公司 Binocular image processing method, device, equipment and storage medium
CN115170638B (en) * 2022-07-13 2023-04-18 东北林业大学 Binocular vision stereo matching network system and construction method thereof
CN115998591A (en) * 2022-11-30 2023-04-25 微智医疗器械有限公司 Binocular disparity estimation method, visual prosthesis, and computer-readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024114175A1 (en) * 2022-11-30 2024-06-06 微智医疗器械有限公司 Binocular disparity estimation method, and visual prosthesis and computer-readable storage medium
CN117152133A (en) * 2023-10-30 2023-12-01 北京中科慧眼科技有限公司 Non-texture shielding detection method and device based on binocular stereoscopic vision
CN117152133B (en) * 2023-10-30 2024-03-19 北京中科慧眼科技有限公司 Non-texture shielding detection method and device based on binocular stereoscopic vision
CN117765499A (en) * 2023-12-30 2024-03-26 武汉奥思工业设计有限公司 Intelligent decision method and system for auxiliary driving of vehicle

Also Published As

Publication number Publication date
WO2024114175A1 (en) 2024-06-06

Similar Documents

Publication Publication Date Title
CN115998591A (en) Binocular disparity estimation method, visual prosthesis, and computer-readable storage medium
CN112634341B (en) Method for constructing depth estimation model of multi-vision task cooperation
CN110599540B (en) Real-time three-dimensional human body shape and posture reconstruction method and device under multi-viewpoint camera
CN109448041B (en) Capsule endoscope image three-dimensional reconstruction method and system
EP4307233A1 (en) Data processing method and apparatus, and electronic device and computer-readable storage medium
US10773083B2 (en) Methods and systems for detecting obstacles for a visual prosthesis
CN111784754B (en) Tooth orthodontic method, device, equipment and storage medium based on computer vision
CN110032278A (en) A kind of method for recognizing position and attitude, the apparatus and system of human eye attention object
CN103732287B (en) The method and apparatus controlling visual acuity aid
CN112207821B (en) Target searching method of visual robot and robot
CN111080778B (en) Online three-dimensional reconstruction method of binocular endoscope soft tissue image
CN107749053A (en) A kind of binocular image collection and pretreatment unit and method for vision prosthesis
JP2020119127A (en) Learning data generation method, program, learning data generation device, and inference processing method
CN109841272A (en) Realtime graphic identification display equipment
CN109961092A (en) A kind of binocular vision solid matching method and system based on parallax anchor point
CN114929331A (en) Salient object detection for artificial vision
CN106385536A (en) Binocular image collection method and system for visual prosthesis
US20230290101A1 (en) Data processing method and apparatus, electronic device, and computer-readable storage medium
CN109711286B (en) Control method and device based on artificial retina space perception
CN112085777A (en) Six-degree-of-freedom VR glasses
CN114930392A (en) Runtime optimized artificial vision
CN108234986B (en) For treating the 3D rendering management method and management system and device of myopia or amblyopia
CN117770850B (en) Radiation source radiation dose self-adaption method and CT system
CN112132864A (en) Robot following method based on vision and following robot
CN109544611A (en) A kind of binocular vision solid matching method and system based on bit feature

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination