ES2517765A1

ES2517765A1 - Device and method of spatial analysis, storage and representation by means of sounds (Machine-translation by Google Translate, not legally binding)

Info

Publication number: ES2517765A1
Application number: ES201300416A
Authority: ES
Inventors: Guillermo Peris Fajarnes; Cai ZUOQUN; Víctor Manuel SANTIAGO PRADERAS
Original assignee: Grupo Eye2021 S L; GRUPO EYE2021 SL
Current assignee: Guillermo Peris Fajarnes
Priority date: 2013-04-30
Filing date: 2013-04-30
Publication date: 2014-11-03
Anticipated expiration: 2033-04-30
Also published as: ES2517765B1

Abstract

Method and device for spatial analysis, storage and representation by means of sounds, comprising: capturing a plane of three-dimensional space by means of an image capture device (2); extracting information on the distances from the objects to the device (2); and generating a three-dimensional map of the captured objects, said objects being defined by their coordinates and their distance to the device (2); characterized in that it comprises the steps of: i) providing a sound bank consisting of a set of sounds established by a user; ii) associating each point of the space defined by its coordinates with at least one sound from the sound bank; iii) creating a map of coded sounds representing a plane of three-dimensional space; and iv) sequentially reproducing a sound from each of the points defined in the map in such a way that a single horizontal line or beam of distances is represented. (Machine-translation by Google Translate, not legally binding)

Description

DEVICE AND METHOD FOR SPATIAL ANALYSIS, STORAGE AND REPRESENTATION BY MEANS OF SOUNDS
DESCRIPTION
The present invention relates to a device configured to obtain three-dimensional images with sound, together with their coordinates and direction of acquisition, so that the information can subsequently be processed and represented in different ways; its transformation into acoustic maps for blind users is the main objective.
The device of the invention has been designed to be integrated into the sides of a pair of glasses and includes cameras, microphones, headphones and a processing unit. The acquired information can be monitored to help guide people with other disabilities or particular needs, and can be transmitted or stored for later manipulation or analysis.
When applied to blind users, the system represents space by transforming it into sounds. The sounds are heard through headphones and are perceived by the user as coming from the surfaces of the objects being represented. Depending on the number of sounds, the movement and the user's ability, objects are represented with sounds either in their entirety or by sampling them. In this way the user forms a picture of the space in which he or she is located, and the brain reconstructs the shape of that space from the sounds. In the case of storage for information processing.
STATE OF THE ART
At present, the desire to integrate blind or visually impaired people, together with the availability of advanced vision technology, has led to the study, design and creation of various proposals aimed at improving the lives of these people by increasing their self-sufficiency in their own mobility. It is therefore almost essential to find and consolidate methods, systems and/or devices that help them perceive their surroundings.
There is a large body of documentation on solutions to this technical problem, for example US patent US2009122161, which has two cameras at a predetermined distance configured to capture and transmit images, and a processing system connected to the cameras that is configured to create a three-dimensional topographic map of the field of view by comparing their images and taking into account the distance between them, and that dynamically modulates tones and frequencies.
P201300416 24-03-2014
The use of stereovision (a pair of calibrated cameras) is known, and its processing to generate three-dimensional information is similar and in any case widely known in the scientific community. The manner of acquisition and the information acquired are key to an adequate representation and interpretation of the information, and this is a key differentiating element. Sounds, by contrast, are not usually used as a tool for subsequent processing of the information obtained.
Specifically, the system of the invention has a subsystem that positions and orients it for each image, so that each pair of frames, or stereoscopic pair, carries for each image the instant, direction and inclination of both. Since the device has been built into a pair of glasses, it includes two microphones, one on each side of the glasses, on the temple.
This is what makes it possible to generate a three-dimensional surface with its correct orientation and absolute 3D coordinates. This absolute information, together with the measurement of the inclination of both cameras relative to the horizontal line of the terrain, makes it possible to create information that can be stored and used later.
The information acquired and processed by the system generates a surface that can be stored or transmitted as an image, as a surface, or transformed into a sound function. The information stored as audio and video carries the complementary information of instant, coordinates and direction.
The sound representation of space is performed by a direct call that replaces each point of coordinates x, y, z detected by the cameras with a "virtual" sound that the user perceives as if it had originated at the reference coordinate. The set of these sounds allows the blind person to build a mental reconstruction of the surface. Specifically, the system uses a sound function, generated in real time or stored in a bank of sounds or data, called HRTF, which the user perceives as sound coordinates coming from the surface and which the brain interprets as such.
The use of a particular combination of sound maps generated with HRTFs makes it possible to associate sounds with tonalities, so that risk elements or obstacles can be encoded. The acronym HRTF is well known in the technical and scientific field and stands for Head Related Transfer Function: these are mathematical functions or equations that describe how a sound generated at a given point in space is modified depending on the position it reaches, and in particular on whether it reaches the right ear or the left ear.
In a person with normal vision, depth perception is produced by stereovision of the observed field of view.
The brain interprets and perceives a "three-dimensional area or surface". If this space is to be represented by sounds, different strategies can be chosen. Representation by language is one of them (for example: "chair one meter ahead"); however, the one proposed and used in this invention is based on a type of sound associated with a mathematical function such that the sound reaching one ear differs from the one reaching the other, and the user perceives it as if it had been emitted from a given point in space. If the 3D information to be represented is available, it is possible to represent all or part of it. Since the perception of sound and the number of sounds that a person can interpret per unit of time are limited and depend on the type of sound, on hearing ability and on training, the amount of information, or number of points per second, that can be represented with sounds is limited. This "limitation" means that the system must choose in what way, and which part or area of space, to represent; depending on the context and the need of the moment, the user can choose one of them, for example a horizontal plane in the manner of a "cane", a "frontal cone" or a "rectangle in the manner of a screen or TV area".
It is known that stereovision and machine vision are notably affected by lighting (they do not work at night without light). The use of coordinate and direction information is a complement that allows the 3D map to be generated by querying a database of objects, allows existing fixed furniture to be encoded, and also makes it possible to guide the user with sounds towards the required direction.
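The selectable regions mentioned above (a horizontal "cane" plane or a rectangular "screen" area) can be thought of as different ways of cropping a depth map before it is turned into sound. The following is only a minimal sketch under assumptions: the depth map layout (a list of rows of distances in meters) and the function names `horizontal_line` and `rectangle` are hypothetical and do not come from the patent.

```python
from typing import List

# Hypothetical layout: one inner list per image row, distances in meters.
DepthMap = List[List[float]]


def horizontal_line(depth: DepthMap, row: int) -> List[float]:
    """Return one horizontal scan line, similar to the 'cane' mode."""
    return list(depth[row])


def rectangle(depth: DepthMap, top: int, left: int,
              height: int, width: int) -> DepthMap:
    """Return a rectangular window, similar to the 'screen' mode."""
    return [r[left:left + width] for r in depth[top:top + height]]


# Made-up depth values for illustration only.
depth = [
    [3.0, 3.1, 2.9, 3.0],
    [1.2, 1.1, 2.8, 3.0],
    [1.0, 0.9, 2.7, 2.9],
]
cane = horizontal_line(depth, row=1)
window = rectangle(depth, top=1, left=0, height=2, width=2)
```

A smaller region means fewer points to sonify per frame, which is the trade-off the text describes between coverage and the listener's limited capacity.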
Spanish patent ES2133078 describes a system for creating a virtual acoustic space, in real time, from the information provided by a machine vision system. An opto-electronic system captures, by stereovision, the user's physical surroundings and visual support. Images are captured with two micro-cameras, stereovision is computed, virtual sounds are generated in a portable unit, and acoustic signals are sent to headphones, so that the subject obtains information about the zone of space to be identified. That system recognizes objects and textures with the cameras and generates sounds through the functions, already mentioned and known in the technical field, called HRTFs.
As already noted, the use of a stereovision system and its combination with HRTFs are known in the international scientific community and have been used in public research projects such as the European project CASBLIP (acronym of the project Cognitive Aid for Blind People, www.casblip.es). Unlike those systems and the CASBLIP project, the system presented in this document has calibration and alignment of each image pair, microphones near the ears, and a stored sound bank. These three elements make it possible to correct the distance distortions caused by camera inclination; to represent surfaces more quickly, and not necessarily with HRTFs (but with any other type of sound chosen by the user); and to compensate the volume and sound represented according to the ambient sound, individually and independently for each headphone.
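The per-ear compensation for ambient sound described above could, for example, scale the playback gain of each headphone with the level picked up by the microphone on the same side. This is a hedged sketch, not the patent's method: the linear gain rule and the names `rms`, `compensated_gain`, `sensitivity` and `max_gain` are assumptions made for illustration.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of microphone samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def compensated_gain(ambient: Sequence[float], base_gain: float = 1.0,
                     sensitivity: float = 2.0, max_gain: float = 4.0) -> float:
    """Raise the playback gain as ambient level rises (assumed linear rule),
    clamped to max_gain to avoid uncomfortable levels."""
    return min(max_gain, base_gain * (1.0 + sensitivity * rms(ambient)))


# Made-up microphone blocks: silence on the left, noise on the right.
left_mic = [0.0, 0.0, 0.0, 0.0]
right_mic = [0.5, -0.5, 0.5, -0.5]

left_gain = compensated_gain(left_mic)    # no ambient noise, gain stays at base
right_gain = compensated_gain(right_mic)  # louder side gets a higher gain
```

Computing the gain separately from each microphone is what makes the compensation independent for each ear.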
Knowing the coordinates and direction of the images allows both the representation and the correction of alignment errors due to camera parallelism (since the stereovision cameras are worn by the user, their horizontality greatly affects recognition and is key for the sound representation of the origin of the coordinate to be always relative to the user's body). Mounting a stereovision system at a horizontal angle (which is usual when it is placed, for example, on glasses) must be corrected with a system that measures the direction of the images, since small inclination errors can alter the representation of objects and their distance to the user.
Document WO2011036288 relates to a device for assisting visually impaired individuals, based on the identification and three-dimensional representation of objects and their processing for conversion into tactile or auditory information. In our case, we consider that the possible common part with that device is today public knowledge in the scientific community. With a system able to create a three-dimensional surface of an object, that surface can be represented directly on a 3D device.
Document WO03107039 discloses an apparatus and a method to help a visually impaired or blind person detect, identify and avoid objects in the surrounding field of view. The apparatus includes electro-optical devices to detect and identify objects, a control unit, and a vocal representation unit that provides audible signals about the objects in the field of view. The detection unit may be a CCD sensor, a laser imaging sensor, a radar sensor or an electromagnetic radiation sensor.
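Returning to the inclination correction discussed earlier in this section: one common way to express it is to rotate each camera-frame point by the measured pitch angle so that distances are referred to a level frame. The sketch below assumes a frame with x to the right, y up and z forward, and a positive angle for a camera tilted downward; these conventions and the name `correct_pitch` are assumptions, not details taken from the patent.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]


def correct_pitch(point: Point, pitch_rad: float) -> Point:
    """Rotate a camera-frame point about the x axis by the measured pitch.

    Assumed convention: x right, y up, z forward; positive pitch means the
    camera is tilted downward by that angle.
    """
    x, y, z = point
    c, s = math.cos(pitch_rad), math.sin(pitch_rad)
    return (x, y * c - z * s, y * s + z * c)


# A point 2 m along the optical axis of a camera tilted 30 degrees downward.
level = correct_pitch((0.0, 0.0, 2.0), math.radians(30.0))
# In the level frame it lies about 1.73 m ahead and about 1 m below the camera.
```

Without the correction the point would be reported as 2 m straight ahead, which illustrates how a modest tilt changes both the apparent distance and the apparent height of an obstacle.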
DESCRIPTION OF THE INVENTION
The state of the art describes many systems that, as a general rule, transform a 3D image into a variety of sounds. For example, ES2133078 discusses 3D recognition and generating a sound that comes from the focus of the object. With that solution, however, it is impossible to combine 3D recognition and sounds in real time without knowing the acoustic resolution or the number of sounds per unit of time.
The present invention solves the technical problems arising from image processing and its transformation into sound by using a sound bank that makes it possible to:
a) Perform a direct transformation.
b) Increase the number of audio frames per second by reducing the sound resolution.
c) Compensate for errors in the horizontality of vision.
d) Compensate for ambient noise automatically.
e) Compensate for low-light situations by replacing real 3D maps with files of objects stored in a map.
f) Communicate position coordinates and capture direction to users through communication devices that support it (and that exist on the market today).
g) Store stereovision video associated with a 3D sound, a coordinate, an instant of time and a direction.
The sound bank is generated artificially, naturally, or encoded according to the user's preferences, and the 3D information is transformed into sounds, including the coordinate information. This transformation requires a change in sound resolution, since the acoustic resolution and the information to be represented are unlikely to coincide.
The invention is based on the creation of a map of coded sounds that represent a zone of 3D space.
Given the adaptation process, the system can initially be configured to present the user with a single horizontal line of distances, which can gradually be widened according to the user's ability and need. The horizontality of this line (or area) allows correct interpretation when deciding on the path to take. The system lets the user modify and adapt to his or her ability and need the number of horizontal lines, the width of the field of view, and the number of simultaneous types of sound to be used, depending on the criterion or environment of use. Use of the device at a table in front of a plate of food can be modulated according to colors, whereas its use for navigation in a street can be encoded according to risk-based criteria.
The user hears a set of sounds that come from a bank, such that if a line or beam of distances is to be represented, a sound is reproduced from each of the points. Currently sound is usually perceived through small loudspeakers or headphones, although technology allows the use of systems very similar to headphones that transmit sound through bone, known as bone-conduction headphones, or even allows sound to be presented through certain implants known as "cochlear implants". In any case, the system uses a type of sound in which the alteration of the frequencies reaching one side or the other of the head is what makes the brain interpret where the sound comes from.
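The idea of reproducing one sound from each point of a horizontal line of distances can be sketched as a small encoding step: each point gets an azimuth, a distance and a sound chosen from a user-defined bank. This is an illustrative sketch only; the bank contents, the distance bands, the 60 degree field of view and the names `SOUND_BANK`, `sound_for_distance` and `encode_line` are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical user-defined sound bank: (upper distance bound in meters, sound name).
SOUND_BANK: List[Tuple[float, str]] = [
    (1.0, "click_near"),
    (3.0, "click_mid"),
    (float("inf"), "click_far"),
]


def sound_for_distance(distance_m: float) -> str:
    """Pick the first bank entry whose distance band contains the point."""
    for upper, name in SOUND_BANK:
        if distance_m <= upper:
            return name
    return SOUND_BANK[-1][1]


def encode_line(distances: List[float],
                fov_deg: float = 60.0) -> List[Tuple[float, float, str]]:
    """Return (azimuth_deg, distance_m, sound) per point, ordered left to right."""
    n = len(distances)
    encoded = []
    for i, d in enumerate(distances):
        # Spread the points evenly across the assumed field of view.
        azimuth = 0.0 if n == 1 else -fov_deg / 2 + fov_deg * i / (n - 1)
        encoded.append((azimuth, d, sound_for_distance(d)))
    return encoded


line = encode_line([0.8, 2.0, 5.0])
# [(-30.0, 0.8, 'click_near'), (0.0, 2.0, 'click_mid'), (30.0, 5.0, 'click_far')]
```

Playing the resulting list in order gives a left-to-right sweep; widening the representation, as the text describes, would amount to encoding more lines or a wider field of view.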
The device of the invention comprises a body wearable by a user; an image capture device; two headphones; microphones; an orientation and positioning system; a processor; a memory; and one or more programs, wherein the program or programs are stored in the memory and configured to be executed by the processor.
The device also has an audio storage system, so that audio can be stored and/or processed. Unlike any previously patented system, this one incorporates two microphones, one on each temple of the glasses, as close as possible to the ear, in order to correct and overlay the ambient sound. In this way, depending on the configuration the user sets, the user can choose to subtract the ambient sound, amplify the ambient sound, or adjust the volume of the representation of space with virtual sounds according to the ambient noise. It should be noted that increasing the number of microphones could make it possible to address new obstacle-detection strategies in the future.
The way the sounds are encoded may allow the use of a single loudspeaker or, in the case of transfer functions, the use of headphones.
The encoding of these sounds can be defined by the user, although generating sounds by means of transfer functions makes it easier to detect where they come from and also eases the learning time.
In a further practical embodiment, a larger number of planes and sound banks can be used, and these can be used individually or in combination. In this way an area of sounds in space can be represented, with which to represent shapes, objects or surfaces.
The combination of different types of sound coming from the same coordinates makes it possible to associate sounds with colors or textures, and to encode particular sounds to refer to specific objects.
An object of the invention is the representation of three-dimensional lines (or objects) by points, which in this description are called audio-pixels. The use of these sound banks allows the spatial representation of real objects. Combining different sound banks and using them together will also allow colors and textures to be represented, in a manner analogous to the separation and screening of images used by CMYK printing systems.
The objects to be represented may be obtained by detection with laser, radar, stereovision, sonar and other systems. Likewise, the type of sounds and their presentation through headphones and convolution functions can provide better acoustic resolution and make it easier to detect the coordinates from which the sounds come.
Whatever the sound bank, the user will in the end naturally translate the sound into a spatial position; sounds not generated by transfer functions will require a longer training process than those generated or presented by transfer functions, which simulate precisely how sound arrives from a spatial position at each ear.
It should be said that this device, having two quality cameras, a sound recording system and a storage element, can be used directly as a camera for recording, storing or broadcasting three-dimensional video "online" without any kind of adaptation.
In this case it has applications both in 3D video communications and in storage. The substantial difference between this 3D recording system and a standard two-camera system lies in the recording system based on two headphones located near the ears and mounted on glasses, together with a synchronized orientation and positioning system, which would allow video to be output with geospatial information on the position, instant and coordinate at which each frame was acquired.
Throughout the description and the claims, the word "comprises" and its variants are not intended to exclude other technical features, additives, components or steps. For those skilled in the art, other objects, advantages and features of the invention will emerge in part from the description and in part from the practice of the invention. The following examples and drawings are provided by way of illustration and are not intended to restrict the present invention. Furthermore, the present invention covers all possible combinations of the particular and preferred embodiments indicated here.
BRIEF DESCRIPTION OF THE FIGURES
A series of drawings is briefly described below; they help to understand the invention better and relate expressly to an embodiment of the invention presented as a non-limiting example.
FIG 1. Shows a view of the device for spatial analysis and representation proposed here.
FIG 2. Shows a series of views of a second practical embodiment of the device proposed here.
FIG 3. Shows a schematic representation of the method of spatial representation by sounds presented here.
PREFERRED EMBODIMENT OF THE INVENTION
The attached figures show a preferred embodiment of the invention.
More specifically, the system for spatial analysis and representation, and its method of use, is essentially characterized in that it comprises a first body (1) in the form of glasses, which incorporates at least one image capture device (2), such as a stereovision camera, associated with at least one processor (4) configured to analyze the images obtained and generate a depth map consisting of an image of distances between the central point of the device (2) and the objects it is aimed at. The microphones will be incorporated at points 1 and 5 of figure 1.
In turn, the device comprises a memory where the acquired images are stored, associated with X, Y, Z position values and the direction of each frame.
The system proposed here can be used for recording images with the capture device (2), and for recording sounds by means of at least one microphone (5) located on the first body (1) in a position close to the user's ears.
In a non-limiting practical embodiment, the system will have a clock, two stereovision cameras, a direction system or accelerometer, and a GPS location system. In this way the system processes each image knowing the position from which it was obtained and the direction and, if it is archived, can know the instant at which the image was acquired.
The three-dimensional information acquisition system based on stereovision allows fast processing of information obtained with a low-cost system. This should not exclude the use of other alternative or complementary technologies, such as radar, for acquiring the three-dimensional information.
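For readers unfamiliar with how a stereovision pair yields distances, the standard relation for a rectified pinhole stereo rig is depth = focal length × baseline / disparity. The sketch below illustrates that textbook relation only; it is not taken from the patent, and the function name and the numeric values (focal length in pixels, baseline in meters) are made up for the example.

```python
def depth_from_disparity(disparity_px: float, focal_px: float,
                         baseline_m: float) -> float:
    """Depth in meters for a rectified stereo pair (textbook pinhole model).

    A disparity of zero or less means no valid match, reported here as infinity.
    """
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Illustrative values: 700 px focal length, 12 cm between cameras,
# and a feature shifted 35 px between the two images.
z = depth_from_disparity(disparity_px=35.0, focal_px=700.0, baseline_m=0.12)
# z is about 2.4 m
```

Because depth is inversely proportional to disparity, nearby objects are resolved much more finely than distant ones, which suits a navigation aid that prioritizes close obstacles.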
In turn, in a non-limiting practical embodiment, the processor (4) will communicate with the capture device wirelessly, so that communication, information processing and generation of the output sound can all, or in part, use wireless communications. In this way the complete unit can be made lighter, since not all processing needs to be carried out in a single unit. This wireless communication not only allows information to be sent to and received from the equipment, or parts of it, that the user must carry or keep nearby, but also allows communication with equipment or computers far from the user wearing the system. Such communication and transfer of information, or distribution of the computing workload between the worn system and one or more remote systems, are applicable in the invention, as is the possibility of operating remotely with certain programs running in what is known today as "cloud processing".
Once the device has generated the map, it replaces each distance point with a sound that is perceived as located at that point, heard by the user through at least one headphone (3).
Since the capacity of the ear is limited, the processor (4) decides and orders the sequence of sounds to be reproduced, as well as the order and number of repetitions of each, so that each can be perceived and understood as coming from the point in space to be represented.
For correct operation, the system repeats the process a number of times per second high enough for the user's navigation, preferably between 5 and 25 images per second.
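The frame rates above imply a time budget for each sound. As a rough illustration, assuming the sounds of one frame are played strictly one after another (an assumption, since the text also allows simultaneous sounds), the slot available per sound is 1 / (frames per second × points per frame × repetitions). The function name and the choice of 16 points per frame are hypothetical.

```python
def sound_slot_ms(frames_per_second: float, points_per_frame: int,
                  repetitions: int = 1) -> float:
    """Milliseconds available per sound when sounds play strictly in sequence."""
    if frames_per_second <= 0 or points_per_frame <= 0 or repetitions <= 0:
        raise ValueError("all arguments must be positive")
    return 1000.0 / (frames_per_second * points_per_frame * repetitions)


# Budget at both ends of the stated 5 to 25 images-per-second range,
# for an assumed line of 16 audio-pixels.
slow = sound_slot_ms(5, points_per_frame=16)    # 12.5 ms per sound
fast = sound_slot_ms(25, points_per_frame=16)   # 2.5 ms per sound
```

The arithmetic makes the trade-off concrete: raising the frame rate or the number of points shortens each sound, which is why the number of points represented has to be matched to the listener's capacity.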
The amount of change in the images, its type and the number of changes in the images directly affect the way the sound is generated. The system can be configured with spacing intervals between two sounds that depend on the user's hearing ability, and in the same way sounds of different characteristics can be generated according to the colors or textures of the objects represented.
In this particular embodiment, the stereovision image processing algorithm will be that of Birchfield and Tomasi [S. Birchfield, C. Tomasi, "Depth discontinuities by pixel-to-pixel stereo", International Journal of Computer Vision, 17(3), pp. 269-293, Dec 99], which obtains depth maps using dynamic programming, with the advantage that the algorithm can provide results in real time.
It is worth mentioning that current technology allows execution, processing and storage of information in a local unit, acting as a central processing unit, or in a remote location, to which the information is sent directly and which returns the result; this is what is currently called "cloud processing". Wherever the acquired information is transformed into sound information, the process is not altered by the type of unit or by where that unit is located relative to the user.
Acoustic representations of space can be obtained directly by recording sounds in a studio on a mannequin with a microphone in each ear, so that it is enough to play back the recorded sound, or they can be generated digitally by altering their frequencies to simulate how they arrive at each ear, using this type of sound as the sound base.
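Digitally generating a sound "as it arrives at each ear" is commonly done by convolving a mono sound with a pair of head-related impulse responses, one per ear. The following is a minimal sketch of that general technique, not the patent's implementation; the impulse responses are toy values invented for the example, and the names `convolve`, `render_binaural`, `HRIR_LEFT` and `HRIR_RIGHT` are hypothetical.

```python
from typing import List, Sequence, Tuple


def convolve(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Direct-form discrete convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def render_binaural(mono: Sequence[float], hrir_left: Sequence[float],
                    hrir_right: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Filter one mono sound with a left and a right impulse response."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses for a source on the listener's right (made-up values):
# the right ear receives the sound earlier and louder than the left ear.
HRIR_RIGHT = [1.0, 0.0, 0.0]
HRIR_LEFT = [0.0, 0.0, 0.5]

click = [1.0, 0.5]
left, right = render_binaural(click, HRIR_LEFT, HRIR_RIGHT)
# left  == [0.0, 0.0, 0.5, 0.25]
# right == [1.0, 0.5, 0.0, 0.0]
```

Measured responses, such as those from the mannequin recordings the text mentions, would replace the toy values; precomputing the filtered sounds for each position is one way a stored sound bank could avoid doing this convolution in real time.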
The way the sounds are encoded may allow the use of a single loudspeaker, although in the case of transfer functions two headphones, loudspeakers or sound sources are required. In any case, once the information has been obtained and processed, the device can also present, or complement the presentation of, the information using known complementary devices, such as a touch screen or a system that emits vibrations of different intensity depending on the orientation of the user or of a peripheral.
The strategy or order of sequencing the sounds for spatial representation will be configurable on the basis of at least four strategies that can be combined: a) right to left or left to right, sequentially; b) near to far, without a repetition criterion; c) by proximity with repetitions, so that the nearer a point is, the more times it is represented; d) by shape, whereby the edges or contours of objects are represented more frequently than the rest.
The system can store a three-dimensional image together with the information on the instant, the coordinate of the photo and the direction of the image, along with audio from a system of microphones located near each ear. This stored information as a whole allows the system to be used as a tool for recording films with stereovision and high-quality sound.
DEVICE AND METHOD FOR SPATIAL ANALYSIS, STORAGE AND REPRESENTATION BY MEANS OF SOUNDS
DESCRIPTION
The present invention relates to a device configured to obtain three-dimensional images with sound, together with their coordinates and direction of acquisition, so that the information can subsequently be processed and represented in different ways; its transformation into acoustic maps for blind users is the main objective.
The device of the invention has been designed to be integrated into the sides of a pair of glasses and includes cameras, microphones, headphones and a processing unit. The acquired information can be monitored to help guide people with other disabilities or particular needs, and can be transmitted or stored for later manipulation or analysis.
When applied to blind users, the system represents space by transforming it into sounds. The sounds are heard through headphones and are perceived by the user as coming from the surfaces of the objects being represented. Depending on the number of sounds, the movement and the user's ability, objects are represented with sounds either in their entirety or by sampling them. In this way the user forms a picture of the space in which he or she is located, and the brain reconstructs the shape of that space from the sounds. In the case of storage for information processing.
STATE OF THE ART
At present, the desire to integrate blind or visually impaired people, together with the availability of advanced vision technology, has led to the study, design and creation of various proposals aimed at improving the lives of these people by increasing their self-sufficiency in their own mobility. It is therefore almost essential to find and consolidate methods, systems and/or devices that help them perceive their surroundings.
There is a large body of documentation on solutions to this technical problem, for example US patent US2009122161, which has two cameras at a predetermined distance configured to capture and transmit images, and a processing system connected to the cameras that is configured to create a three-dimensional topographic map of the field of view by comparing their images and taking into account the distance between them, and that dynamically modulates tones and frequencies.
The use of stereovision (a pair of calibrated cameras) is known, and its processing to generate three-dimensional information is similar and in any case widely known in the scientific community. The manner of acquisition and the information acquired are key to an adequate representation and interpretation of the information, and this is a key differentiating element. Sounds, by contrast, are not usually used as a tool for subsequent processing of the information obtained.
Specifically, the system of the invention has a subsystem that positions and orients it for each image, so that each pair of frames, or stereoscopic pair, carries for each image the instant, direction and inclination of both. Since the device has been built into a pair of glasses, it includes two microphones, one on each side of the glasses, on the temple. This is what makes it possible to generate a three-dimensional surface with its correct orientation and absolute 3D coordinates. This absolute information, together with the measurement of the inclination of both cameras relative to the horizontal line of the terrain, makes it possible to create information that can be stored and used later.
The information acquired and processed by the system generates a surface that can be stored or transmitted as an image, as a surface, or transformed into a sound function. The information stored as audio and video carries the complementary information of instant, coordinates and direction.
The sound representation of space is performed by a direct call that replaces each point of coordinates x, y, z detected by the cameras with a "virtual" sound that the user perceives as if it had originated at the reference coordinate. The set of these sounds allows the blind person to build a mental reconstruction of the surface.
Specifically, the system uses a sound function, either generated in real time or stored in a sound or data bank, called HRTF, which the user perceives as sound coordinates coming from the surface and which the brain interprets as such.

Using a certain combination of sound maps generated with HRTF allows sounds to be associated with tones, so that risk elements or obstacles can be encoded. The acronym HRTF is well known in the technical and scientific field and stands for Head-Related Transfer Function. HRTFs are mathematical functions or equations that describe how a sound generated at a given point in space is modified depending on the position it reaches, and specifically on whether it reaches the right ear or the left ear.

In a person with normal vision, depth perception is generated by stereovision of the observed field of view. The brain interprets and perceives a "three-dimensional area or surface". To represent this space by sounds, different strategies can be chosen. Representation through language is one of them (for example, "chair one meter ahead"). What is proposed and used in this invention, however, is based on a type of sound associated with a mathematical function, so that the sound reaching one ear differs from the sound reaching the other, and the user perceives it as if it had been emitted from a certain point in space.

Given the 3D information to be represented, it is possible to represent all or part of it. Because the perception of sound and the number of sounds a person can interpret per unit of time are limited and depend on the type of sound, the person's hearing and their training, the amount of information, or number of points per second, that can be represented with sounds is limited.
This "limitation" means that the system must choose how, and which part or area of the space, to represent. Depending on the context and their specific need, the user may choose among options such as a horizontal plane used like a "cane", a "front cone", or a "rectangle" like a screen or TV area.

It is known that stereovision and artificial vision are significantly affected by lighting (they do not work at night without light). The use of coordinate and direction information is a complement that allows the 3D map to be generated by querying a data bank of objects, allows existing fixed furniture to be encoded, and makes it possible to guide the user with sounds toward the required direction.

Spanish patent ES2133078 describes a system for creating a virtual acoustic space, in real time, from the information provided by an artificial vision system. An optical-electronic system captures the user's physical environment through stereovision and visual support. Images are captured by two micro cameras, the stereovision is computed, virtual sounds are generated in a portable unit and acoustic signals are sent to headphones, so that the subject obtains information about the area of space to be identified. This system recognizes objects and textures with the cameras and generates sounds through the functions, already mentioned and known in the technical field, called HRTF.

As already noted, the use of a stereovision system and its combination with HRTF functions are known in the international scientific community and have been used in public research projects such as the European CASBLIP project (acronym of the Cognitive Aid for Blind People project, www.casblip.es).
Unlike those systems and the CASBLIP project, the system presented in this document has calibration and alignment of each pair of images, microphones near the ears and a stored sound bank. These three elements make it possible to correct the distortions in object distance caused by the inclination of the cameras; to represent surfaces more quickly, and not necessarily with HRTF functions (any other type of sound chosen by the user may be used); and to compensate the volume and the sound represented according to the ambient sound, independently for each earphone.

Knowing the coordinates and direction of the images allows both the representation and the correction of alignment errors due to camera parallelism issues. Since the stereovision cameras are carried by the user, their horizontality greatly affects recognition, and it is key to ensuring that the origin of coordinates of the sound representation is always relative to the user's body. A stereovision system that is tilted with respect to the horizontal (as is usual when it is mounted, for example, on glasses) must be corrected with a system that measures the direction of the images, since small inclination errors can alter the representation of objects and their distance to the user.

WO2011036288 refers to a device to help individuals with visual impairments, based on the three-dimensional identification and representation of objects and their processing for conversion into tactile or auditory information. In our case, we consider that whatever part may be common with that device is today public knowledge in the scientific community. Given a system that can create a three-dimensional surface of an object, that surface can be directly represented on a 3D device.

WO03107039 discloses an apparatus and a method to help a blind or visually impaired person detect, identify and avoid objects in the visual field of the surroundings.
The apparatus includes electro-optical devices for detecting and identifying objects, a control unit and a vocal representation unit that provides audible signals about objects in the visual field. The detection unit can be a CCD sensor, a laser imaging sensor, a radar sensor or an electromagnetic radiation sensor.

DESCRIPTION OF THE INVENTION

The state of the art describes a multitude of systems which, as a general rule, transform a 3D image into a variety of sounds. For example, ES2133078 discusses 3D recognition and the generation of a sound that comes from the focus of the object. With that solution, however, it is impossible to combine 3D recognition and sounds in real time without knowing the acoustic resolution or the number of sounds per unit of time.

The present invention solves the technical problems derived from processing images and transforming them into sound through the use of a sound bank that makes it possible to:
a) Perform a direct transformation.
b) Increase the number of audio frames per second by reducing the sound resolution.
c) Compensate for errors in the horizontality of vision.
d) Compensate for ambient noise automatically.
e) Compensate for low-light situations by replacing real 3D maps with object files stored on a map.
f) Communicate the coordinates, position and direction of capture to users through communication devices that support it (and that exist on the market today).
g) Store stereovision video information associated with a 3D sound, an instant, a coordinate and a direction.

The sound bank is generated artificially, naturally or coded according to the user's preferences, and the 3D information, including coordinate information, is transformed into sounds. This transformation requires a change in the resolution of the sound, since the acoustic resolution and the information to be represented rarely coincide.
The invention is based on the creation of a map of coded sounds representing a zone of 3D space. To allow for a process of adaptation, the system can initially be configured to present the user with a single horizontal line of distances, which can gradually be widened depending on the user's capacity and need. The horizontality of that line (or area) allows correct interpretation when making decisions about the path the user should choose.

The system allows the user to adjust, according to their capacity and need, the number of horizontal lines, the width of the field of view, and the number of types of simultaneous sounds they wish to use, depending on the criteria or environment of use. For example, when the device is used at a table in front of a plate of food, the sounds can be modulated according to colors, whereas for navigation on a street they can be coded according to risk-based criteria.

The user listens to a set of sounds that come from a bank, so that to represent a line or beam of distances, a sound is played from each of its points.

At present, sound is usually perceived through small speakers or headphones, although technology also allows the use of systems very similar to headphones that transmit sound through bone, called bone conduction headphones, or even the representation of sound through certain implants called "cochlear implants". In any case, the system uses a type of sound in which the alteration of the frequencies reaching one side of the head or the other is what causes the brain to interpret the origin of that sound.

The device object of the invention comprises a body that can be worn by a user; an image capture device; two headphones; microphones; an orientation and positioning system; a processor; a memory; and one or more programs stored in the memory and configured to be run by the processor. The device also has an audio storage system so that audio can be stored and/or processed.
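As an illustration of the single horizontal line of distances described above, the following minimal sketch reduces one row of a depth map to a small number of sound points and assigns each a sound from a bank. It is not the implementation of the invention: the names `SoundPoint`, `pick_sound` and `row_to_sound_line`, the distance bands in the bank, and the linear mapping from image column to horizontal angle are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical bank layout: (maximum distance in metres, sound identifier),
# ordered from nearest band to farthest band.
Bank = Sequence[Tuple[float, str]]


@dataclass
class SoundPoint:
    azimuth_deg: float  # horizontal direction, 0 = straight ahead, negative = left
    distance_m: float   # distance from the capture device
    sound_id: str       # sound chosen from the user's bank


def pick_sound(distance_m: float, bank: Bank) -> Optional[str]:
    """Return the sound of the first distance band that contains the point."""
    for max_distance, sound_id in bank:
        if distance_m <= max_distance:
            return sound_id
    return None  # farther than the last band: nothing is played


def row_to_sound_line(depth_row: Sequence[float], fov_deg: float,
                      n_points: int, bank: Bank) -> List[SoundPoint]:
    """Reduce one depth-map row to at most n_points sounds, left to right.

    Each group of adjacent pixels is represented by its nearest distance,
    which lowers the sound resolution below the image resolution.
    """
    block = len(depth_row) / n_points
    line = []
    for i in range(n_points):
        start = int(i * block)
        end = max(start + 1, int((i + 1) * block))
        nearest = min(depth_row[start:end])
        sound_id = pick_sound(nearest, bank)
        if sound_id is None:
            continue
        # Assumption: angle varies linearly with column across the field of view.
        azimuth = (i + 0.5) / n_points * fov_deg - fov_deg / 2
        line.append(SoundPoint(azimuth, nearest, sound_id))
    return line


if __name__ == "__main__":
    bank = [(1.0, "near"), (3.0, "mid"), (5.0, "far")]
    row = [4.0, 3.0, 1.0, 2.0, 9.0, 9.0, 0.5, 6.0]
    for point in row_to_sound_line(row, fov_deg=60.0, n_points=4, bank=bank):
        print(point)
```

Taking the nearest distance in each group is one possible design choice: it favors obstacle detection over surface detail. Widening the representation to several horizontal lines would amount to calling `row_to_sound_line` on several rows.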
In contrast to any of the previously patented systems, two microphones are incorporated into the temples of the glasses, as close as possible to the ear, in order to correct for and overlay the ambient sound. Depending on the configuration that the user establishes, the user can choose to subtract the ambient sound, amplify the ambient sound, or adjust the volume of the virtual sounds that represent space based on the ambient noise. It should be noted that increasing the number of microphones could make it possible to address new obstacle detection strategies in the future.

The way in which sounds are encoded may allow the use of a single speaker or, in the case of transfer functions, the use of headphones.

The coding of these sounds can be defined by the user, although generating sounds through transfer functions makes it easier to detect their origin and shortens the learning time.

In a further practical embodiment it is possible to use a larger number of planes and sound banks, which can be used individually or in combination. In this way an area of sounds in space can be represented, with which to represent shapes, objects or surfaces. Combining different types of sounds that come from the same coordinates makes it possible to associate sounds with colors or textures, as well as to encode certain sounds to refer to specific objects.

It is an object of the invention to represent three-dimensional lines (or objects) by means of points, referred to herein as audio-pixels. The use of these sound banks allows the spatial representation of real objects. The combination of different sound banks, and their combined use, will also allow the representation of colors and textures in a manner analogous to the color separation and halftone screening used by CMYK printing systems.
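One of the options mentioned in this passage is adjusting the volume of the virtual sounds according to the ambient noise picked up near each ear. The sketch below shows one way such a rule could look; it is only an assumption for illustration. The text does not specify a gain law, and the function names, the reference level of 0.05 and the gain ceiling of 4.0 are hypothetical values.

```python
import math
from typing import Sequence, Tuple


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of microphone samples (range -1..1)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def virtual_sound_gain(ambient: Sequence[float], base_gain: float = 1.0,
                       reference_rms: float = 0.05,
                       max_gain: float = 4.0) -> float:
    """Raise the gain of the virtual sounds in proportion to ambient level.

    Below the (assumed) reference level the base gain is kept; above it the
    gain grows linearly and is capped at max_gain to protect the listener.
    """
    level = rms(ambient)
    gain = base_gain * max(1.0, level / reference_rms)
    return min(gain, max_gain)


def gains_per_ear(left_mic: Sequence[float],
                  right_mic: Sequence[float]) -> Tuple[float, float]:
    """Compute an independent gain for each earphone from its own microphone."""
    return virtual_sound_gain(left_mic), virtual_sound_gain(right_mic)


if __name__ == "__main__":
    quiet_side = [0.01, -0.01] * 50   # low ambient level on the left
    noisy_side = [0.1, -0.1] * 50     # traffic on the right, for example
    print(gains_per_ear(quiet_side, noisy_side))
```

Computing the gain separately from each microphone matches the idea of compensating each earphone independently. Subtracting or amplifying the ambient sound itself, the other two options, would need additional signal processing not shown here.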
The objects to be represented can be detected by laser, radar, stereovision, sonar and other systems. Likewise, the type of sounds and their representation by means of headphones and convolution functions can provide better acoustic resolution and make it easier to detect the coordinates from which the sounds originate.

Whatever the sound bank, the user will eventually come to translate the sound naturally into a spatial position. Sounds not generated by transfer functions will require a longer training process than sounds generated or represented by transfer functions that simulate precisely how sound arrives at each ear from a spatial position.

It is worth mentioning that this device, having two quality cameras, a sound recording system and a storage element, can be used directly as a three-dimensional video camera for recording, storage or "on-line" broadcast without any type of adaptation, with applications in this case both in 3D video communications and in storage. The substantial difference between this 3D recording system and a standard two-camera system lies in the recording system based on two microphones located near the ears and mounted on the glasses, together with a synchronized orientation and positioning system, which allows the video to be saved with geospatial information on the position, instant and coordinates at which each frame was acquired.

Throughout the description and the claims the word "comprises" and its variants are not intended to exclude other technical characteristics, additives, components or steps. For those skilled in the art, other objects, advantages and features of the invention will be derived partly from the description and partly from the practice of the invention.
The following examples and drawings are provided by way of illustration and are not intended to restrict the present invention. In addition, the present invention covers all possible combinations of the particular and preferred embodiments indicated herein.

BRIEF DESCRIPTION OF THE FIGURES

What follows is a very brief description of a series of drawings that help to better understand the invention and that expressly relate to an embodiment of the invention presented as a non-limiting example.

FIG 1. Shows a view of the device for spatial analysis and representation proposed here.
FIG 2. Shows a series of views of a second practical embodiment of the device proposed here.
FIG 3. Shows a schematic representation of the method of spatial representation using sounds presented here.

PREFERRED EMBODIMENT OF THE INVENTION

A preferred embodiment of the invention is shown in the attached figures. More specifically, the system for spatial analysis and representation, and its method of use, is essentially characterized by comprising a first body (1) shaped like a pair of glasses, which incorporates at least one image capture device (2), such as a stereovision camera, associated with at least one processor (4) configured to analyze the images obtained and generate a depth map consisting of an image of distances between the center point of the device (2) and the objects in focus. The microphones are incorporated at points 1 and 5 of Figure 1.

The device also comprises a memory in which the acquired images are stored together with the X, Y, Z position values and direction of each frame. The system proposed here may be used to record images with the capture device (2), as well as to record sounds by means of at least one microphone (5) located on the first body (1) in a position close to the user's ears.
In a practical non-limiting embodiment, the system has a clock and two stereovision cameras, along with a direction-sensing or accelerometer system and a GPS tracking system. In this way the system processes each image knowing the position and direction from which it was obtained and, if the image is archived, the instant at which it was acquired.

The stereovision-based system for acquiring three-dimensional information allows the information to be processed rapidly with a low-cost system. This does not exclude the use of other alternative or complementary technologies, such as radar, to acquire the three-dimensional information.

Also in a practical, non-limiting embodiment, the processor (4) communicates with the capture device wirelessly, so that communication, information processing and the generation of the sound that represents the output can rely wholly or partly on wireless communications. In this way the whole unit can be made lighter, because not all processing has to be carried out in the same unit. This wireless communication not only allows information to be sent to and received from the equipment, or parts of it, that the user must carry, but also allows communication with equipment or computers located away from the user. Such communication and transfer of information, that is, distributing the computing workload between the system carried by the user and one or more remote systems, is applicable to the invention, as is the possibility of operating remotely by running certain programs in what is known today as "cloud processing".

Once the device has generated the depth map, it replaces each distance point with a sound that is perceived as located at that point, and which the user hears through at least one headset (3).
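To place a sound "at" a point of the depth map, the point's position relative to the device is needed. The sketch below uses the standard relations for a rectified stereo pair and a pinhole camera (depth = focal length × baseline / disparity, then back-projection of the pixel). The text does not state which camera model is used, so the model, the function names and the example parameter values are assumptions; depth here is measured along the optical axis.

```python
import math
from typing import Tuple


def disparity_to_depth(disparity_px: float, focal_px: float,
                       baseline_m: float) -> float:
    """Depth along the optical axis for a rectified stereo pair."""
    if disparity_px <= 0:
        return math.inf  # no measurable parallax: treat the point as very far
    return focal_px * baseline_m / disparity_px


def pixel_to_point(u: float, v: float, depth_m: float, focal_px: float,
                   cx: float, cy: float) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) to camera coordinates (x right, y down, z forward)."""
    x = (u - cx) * depth_m / focal_px
    y = (v - cy) * depth_m / focal_px
    return x, y, depth_m


def azimuth_elevation(x: float, y: float, z: float) -> Tuple[float, float]:
    """Direction of the point in degrees: azimuth (right positive), elevation (up positive)."""
    azimuth = math.degrees(math.atan2(x, z))
    elevation = math.degrees(math.atan2(-y, math.hypot(x, z)))
    return azimuth, elevation


if __name__ == "__main__":
    # Hypothetical camera: 640 px focal length, 10 cm baseline, 640x480 image.
    depth = disparity_to_depth(32.0, focal_px=640.0, baseline_m=0.1)
    point = pixel_to_point(960.0, 240.0, depth, focal_px=640.0, cx=320.0, cy=240.0)
    print(depth, point, azimuth_elevation(*point))
```

The azimuth, elevation and distance obtained this way are the quantities a transfer-function-based sound, or any other coded sound from the bank, would be selected by.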
Since the capacity of the ear is limited, the processor (4) decides and orders the sequence of sounds to be reproduced, as well as the order and number of repetitions of each of them, so that each sound can be perceived and understood as coming from the point in space to be represented.

For proper operation the system repeats the process a number of times per second high enough for user navigation, preferably between 5 and 25 images per second.

The amount of change in the images, the type of change and the number of changes directly affect the way in which the sound is generated. The system can be configured with intervals between two sounds that depend on the user's hearing capacity, and in the same way sounds with different characteristics can be generated depending on the colors or textures of the objects represented.

In this particular embodiment, the stereovision image processing algorithm is the Birchfield and Tomasi algorithm [S. Birchfield, C. Tomasi, "Depth discontinuities by pixel-to-pixel stereo", International Journal of Computer Vision, 17 (3), pp. 269-293, Dec. 99], which obtains depth maps using dynamic programming, with the advantage that the algorithm is capable of providing real-time results.

It is worth mentioning that current technology allows information to be processed and stored in a local unit acting as a central processing unit, or at a remote location to which the information is sent directly and which returns the result, which is what is currently called "cloud processing". Wherever the acquired information is transformed into sound information, the process is not altered by the type of unit or by where that unit is located relative to the user.
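The trade-off between the refresh rate (5 to 25 images per second) and the interval between two sounds can be made concrete with simple arithmetic. The sketch below assumes the sounds of one frame are played strictly one after another with a fixed minimum interval; the function names and the 20 ms interval used in the example are hypothetical, since the text leaves the interval to be configured for each user.

```python
def sounds_per_frame(frames_per_second: int, interval_ms: int) -> int:
    """How many sequential sounds fit in one frame at the given interval."""
    if frames_per_second <= 0 or interval_ms <= 0:
        raise ValueError("rate and interval must be positive")
    # floor(frame duration / interval), kept in integer milliseconds
    return 1000 // (frames_per_second * interval_ms)


def max_frames_per_second(n_points: int, interval_ms: int) -> int:
    """Highest whole refresh rate at which n_points sequential sounds still fit."""
    if n_points <= 0 or interval_ms <= 0:
        raise ValueError("point count and interval must be positive")
    return 1000 // (n_points * interval_ms)


if __name__ == "__main__":
    for fps in (5, 10, 25):
        print(fps, "images/s ->", sounds_per_frame(fps, interval_ms=20), "sounds per image")
    for n in (16, 8, 4):
        print(n, "points ->", max_frames_per_second(n, interval_ms=20), "images/s")
```

With these assumed numbers, halving the number of points per line roughly doubles the achievable refresh rate, which is the sense in which reducing the sound resolution increases the number of audio frames per second.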
Acoustic representations of space can be obtained directly by recording sounds in a studio on a dummy with a microphone in each ear, so that it is enough to play back the recorded sound, or they can be generated digitally by altering their frequencies to simulate how they reach each ear, using this type of sound as the basis of the sound bank.

The way in which the sounds are encoded may allow the use of a single speaker, although in the case of transfer functions two headphones, speakers or sound sources are necessary. In any case, once the information has been obtained and processed, the device can also complement its representation by means of known complementary devices, such as a touch screen or a system that emits vibrations of different intensity depending on the orientation of the user or of a peripheral.

The strategy or order in which sounds are sequenced for spatial representation is configurable, based on at least four strategies that can be combined:
a) from right to left or from left to right, sequentially;
b) from near to far, without repetition criteria;
c) by proximity with repetitions, so that the closer a point is, the more times its sound is repeated;
d) by shape, so that the edges or contours of objects are represented more frequently than the rest.

The system is capable of storing a three-dimensional image along with the information on instant, coordinates and direction of the image, and includes a microphone system located near each ear. Together, this stored information allows the system to be used as a tool for recording video with stereovision and high-quality sound.
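The four sequencing strategies a) to d) listed above can be sketched as orderings over a list of points. This is a minimal illustration under stated assumptions, not the method of the invention: the `Point` type, the function names, the 5 m range, the repeat counts and the way edges are flagged are all hypothetical, since the text only names the criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Point:
    azimuth_deg: float     # negative = left of the user
    distance_m: float
    is_edge: bool = False  # assumed to come from an edge detector upstream


def sweep(points: List[Point], left_to_right: bool = True) -> List[Point]:
    """Strategy a): sequential horizontal sweep."""
    return sorted(points, key=lambda p: p.azimuth_deg, reverse=not left_to_right)


def near_to_far(points: List[Point]) -> List[Point]:
    """Strategy b): nearest points first, each played once."""
    return sorted(points, key=lambda p: p.distance_m)


def proximity_repeats(points: List[Point], max_range_m: float = 5.0,
                      max_repeats: int = 3) -> List[Point]:
    """Strategy c): nearest first, and the closer the point, the more repeats."""
    sequence = []
    for p in near_to_far(points):
        closeness = max(0.0, 1.0 - p.distance_m / max_range_m)
        repeats = 1 + round(closeness * (max_repeats - 1))
        sequence.extend([p] * repeats)
    return sequence


def emphasise_edges(points: List[Point], edge_repeats: int = 2) -> List[Point]:
    """Strategy d): sweep in which edge or contour points are repeated."""
    sequence = []
    for p in sweep(points):
        sequence.extend([p] * (edge_repeats if p.is_edge else 1))
    return sequence


if __name__ == "__main__":
    wall = Point(-20.0, 4.0)
    table_edge = Point(0.0, 0.5, is_edge=True)
    chair = Point(15.0, 2.5)
    for strategy in (sweep, near_to_far, proximity_repeats, emphasise_edges):
        print(strategy.__name__, strategy([wall, table_edge, chair]))
```

Because each function takes and returns a plain list, strategies can be combined by chaining them or by interleaving their outputs, in line with the statement that the criteria can be combined.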

Claims

1. Method of analysis, storage and spatial representation by means of sounds comprising: capturing an area of three-dimensional space by means of an image capture device (2); extracting information on the distances from objects to the image capture device (2); generating a three-dimensional map of the captured objects, said objects being defined by their coordinates and their distance to the capture device (2); characterized in that it comprises the steps of: i) providing a sound bank consisting of a set of sounds set by a user; ii) associating each point of the space defined by its coordinates with at least one sound from the sound bank; iii) creating a map of coded sounds representing a plane of the captured three-dimensional space; and iv) sequentially reproducing, in a sound reproduction element of the earphone type (3), a sound from each of the points defined on the map, so that space is represented by a single horizontal line or beam of distances.

2. Method according to claim 1 wherein the space is represented by several horizontal lines.

3. Method according to claim 1 wherein the space is represented by means of a representation area.

4. Spatial analysis and representation device comprising: a body (1) portable by a user; an image capture device (2); a headset (3); a processor (4); a memory; and one or more programs stored in the memory and configured to be executed by the processor (4), the programs including instructions to execute the method of claim 1.

5. Device according to claim 4 wherein it incorporates at least one microphone (5).

6. Device according to claim 4 wherein it incorporates an accelerometer.

7. Device according to claim 4 wherein it incorporates a GPS tracking system.

8. Device according to claim 4 wherein the processor (4) communicates with the capture device (2) wirelessly.

9. Device according to claim 4 wherein the space is captured by one or more radars.

10. Device according to claim 4 wherein a system for communicating with the user through a touch device is included.

11. Device according to claim 4 wherein the headphones are replaced by sound transmission devices in the form of bone conduction headphones or cochlear implants.

12. Device according to any of claims 4-11 wherein each recorded frame carries the information of the instant, position and direction in which it was taken; and wherein the sound is recorded by two microphones positioned near each of the ears of the user who is making the recording.