WO2014013447A2 - Social network functionalities for visually impaired users - Google Patents


Info

Publication number
WO2014013447A2
Authority
WO
WIPO (PCT)
Prior art keywords
user
social media
audibly
coded signal
image
Prior art date
Application number
PCT/IB2013/055872
Other languages
French (fr)
Other versions
WO2014013447A3 (en)
Inventor
Menahem David SMADJA
Joseph SMADJA
Original Assignee
Ayin Seventies Limited
Priority date
Filing date
Publication date
Application filed by Ayin Seventies Limited filed Critical Ayin Seventies Limited
Publication of WO2014013447A2 publication Critical patent/WO2014013447A2/en
Publication of WO2014013447A3 publication Critical patent/WO2014013447A3/en

Links

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00Teaching, or communicating with, the blind, deaf or mute
    • G09B21/001Teaching or communicating with blind persons
    • G09B21/003Teaching or communicating with blind persons using tactile presentation of the information, e.g. Braille displays
    • G09B21/005Details of specially-adapted software to access information, e.g. to browse through hyperlinked information

Definitions

  • the present invention in some embodiments thereof, relates to visually impaired communication and, more specifically, but not exclusively, to methods and systems for providing social network functionalities to the visually impaired.
  • a multimodal auditory interface which permits blind users to work more easily and efficiently with browsers has been developed; see Patrick Roth et al., Auditory browser for blind and visually impaired users.
  • a macro-analysis phase, which can be either passive or active, may be used to explore elements in a global layout of hypertext markup language (HTML) documents.
  • HTML hypertext markup language
  • Such an interface is based on: (1) a mapping of the graphical HTML document into a 3D virtual sound space environment, where non-speech auditory cues differentiate HTML elements; (2) the transcription into sound not only of text, but also of images; and (3) the use of a touch- sensitive screen to facilitate user interaction.
  • an audio "memory game" which can be used as a pedagogical tool to help blind pupils learn spatial exploration cues.
  • a method of audibly representing a plurality of social media posts comprises monitoring a plurality of social media posts uploaded to a social media page of a user of a social network, identifying an image in at least one of the plurality of social media posts, coding an audibly coded signal from at least a portion of the image according to a visual-to-auditory function, and outputting the audibly coded signal for presentation to the user.
  • the outputting comprises sequentially sounding the audibly coded signal and at least one other audibly coded signal representing textual content from the plurality of social media posts.
  • the method further comprises receiving a textual comment from the user and adding the textual comment to the at least one post.
  • the receiving comprises recording a vocal comment given by the user with regard to the image and converting the vocal comment to create the textual comment.
  • the identifying comprises analyzing the image to emphasize at least one body language indication therein.
  • the plurality of social media posts are designated to update a dynamic field of the social media page in real time.
  • the plurality of social media posts comprises at least one of an instant messaging (IM) message and a message with an image attachment.
  • IM instant messaging
  • the audibly coded signal is a time-varying spectral representation of the portion.
  • the method further comprises receiving from a user instructions for audibly representing the social media content and performing the coding and the outputting in response to the user instructions.
  • the coding comprises combining the audibly coded signal with at least one other audibly coded signal which is indicative of a textual content extracted from the at least one post.
  • the coding comprises combining the audibly coded signal with at least one other audibly coded signal which is indicative of an identity of another user which uploads the at least one post.
  • the monitoring is continuously performed to monitor the social media content and to sound the audibly coded signal in real time.
  • the method further comprises converting, using a speech-to-text function, a comment which is received from a user in relation to the image, and posting the comment with reference to the image.
  • the method further comprises analyzing the image to identify a body language indication; wherein the coding comprises adding an audible body language indication indicative of the body language indication to the audibly coded signal.
  • the image is a profile picture of another user of the social network.
  • the plurality of social media posts are displayed when the social media page is displayed; further comprising receiving from the user instructions to perform the method when the social media page is accessed.
  • the outputting comprises outputting an audio file which comprises the audibly coded signal and a plurality of other audibly coded signals each coded to audibly represent one of the plurality of social media posts.
  • a method of sounding an audible body language indication during a communication session comprises monitoring a plurality of images depicting a certain one of a plurality of users participating in a communication session, analyzing the sequence of images to identify a body language expression of the certain user, capturing an image depicting the body language expression, coding an audibly coded signal from at least a portion of the image according to a visual-to-auditory function, and outputting the audibly coded signal for presentation to at least one other of the plurality of users during the communication session.
  • the method further comprises establishing a multidirectional audio call between the certain user and the at least one other user and sounding the audibly coded signal to the at least one other user while the multidirectional audio call is taking place.
  • the method is performed for each of the plurality of users during the communication session.
  • the capturing, coding, and outputting are repeated in a plurality of iterations during the communication session.
  • the plurality of iterations are set so that the capturing, coding, and outputting repeatedly occur every predefined period.
  • the analyzing comprises detecting a change in a body language of the certain user; wherein each iteration is induced by the detection of the change.
  • the coding comprises adding an audible tag to the audibly coded signal, the audible tag is indicative of the identity of the certain user.
  • a system for audibly representing a plurality of social media posts comprises a processor, a monitoring module which monitors a plurality of social media posts uploaded to a social media page of a user, a visual-to-auditory module which codes, using the processor, an audibly coded signal from at least a portion of an image provided in at least one of the plurality of social media posts according to a visual-to-auditory function, and a presentation module which induces a sounding of the audibly coded signal while presenting the plurality of social media posts to the user.
  • the presentation module presents a user interface to allow a user to provide instructions indicative of activating and deactivating the visual-to-auditory module.
  • the monitoring module and the visual-to-auditory module are installed on a proxy that communicates with a client terminal to instruct the sounding.
  • the sounding is performed by a browser installed on a client terminal that is instructed to perform the sounding.
  • FIG. 1 is a system that allows visually impaired and blind users to socially connect with other visually impaired users and users who have normal vision, over computer networks, according to some embodiments of the present invention
  • FIG. 2 is a flowchart of a method of audibly representing a plurality of social media posts in a social media page of a user, according to some embodiments of the present invention
  • FIG. 3 is a schematic illustration of a conversation process, according to some embodiments of the present invention.
  • FIG. 4 is a flowchart of a method of sounding an audible body language indication during a communication session, according to some embodiments of the present invention
  • FIGs. 5A and 5B are schematic illustrations of entities of a system that enhances an exemplary communication session between users, according to some embodiments of the present invention.
  • FIG. 6 is an exemplary flow of data, wherein a system adapts, for example transcodes, visual media content received from a social network to audible media content that is sent to a client terminal, according to some embodiments of the present invention.
  • FIG. 7 is a screenshot of an exemplary main page of an exemplary user of a social network wherein images are converted by a visual-to-auditory module and text is converted by a text-to-auditory module to create a combined version, according to some embodiments of the present invention.
  • the present invention in some embodiments thereof, relates to visually impaired communication and, more specifically, but not exclusively, to methods and systems for providing social network functionalities to the visually impaired.
  • the audible signals are coded by transforming visual information into spectral auditory information, such as soundscapes.
  • the visual information includes images, which are intertwined in a feed of up-to-date social media posts that is published in a social media page of a visually impaired user.
  • the visually impaired user may listen to the audibly coded signals to perceive the visual social media content.
  • These audibly coded signals may be combined with audibly coded signals indicative of textual content.
  • the textual content may include comments and/or statuses which are also extracted from the posts and/or from other related social media content.
  • the methods and systems allow a visually impaired user to participate actively in a social media community that includes other visually impaired users and/or users with a functioning visual sensory system.
  • the visually impaired user may react to the visual social media content, for example by posting a comment, sending a message and/or establishing a communication session with the user who uploaded the visual social media content.
  • the ability to react to visual social media content allows a user to communicate with users with a functioning visual sensory system in the manner they communicate when using a social network service. While one user uploads images and typed text, the other user may react to the uploaded images and/or typed text with a textual and/or vocal comment without seeing them.
  • a video sequence of the certain user may be analyzed in real time to extract images that depict current body language expressions of the certain user. This allows coding respective audibly coded signals which audibly indicate to other user(s) which body language expressions the imaged user currently expresses.
  • the images may be taken periodically, upon request, and/or when a change in the body language of the imaged user is detected.
  • aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
  • the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
  • a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.
  • a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • LAN local area network
  • WAN wide area network
  • Internet Service Provider for example, AT&T, MCI, Sprint, EarthLink, MSN, GTE, etc.
  • These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • FIG. 1 is a social media adaptation system 100 that allows visually impaired and blind users, for brevity referred to herein as visually impaired users, to socially connect with other visually impaired users and users who have normal vision, for brevity referred to herein as seeing users, over computer networks, according to some embodiments of the present invention.
  • the social media adaptation system 100 communicates with a plurality of client modules 141 which are installed, for example temporarily loaded by a browser and/or as a software component, in client terminals 109, such as laptops, desktops, tablets, and Smartphones, and/or client modules which are set as add-ons installed in browsers.
  • the social media adaptation system 100 translates visual social media content to audibly coded signals in a manner that allows a visually impaired user to understand visual social media content without perceiving it with his eyes and to react to the visual content substantially in the same context and/or manner a seeing user would have. In such a manner, visually impaired users may communicate with seeing users (i.e. users who have normal vision) without requiring seeing users to change their social network behavior.
  • the audibly coded signals are optionally time-varying spectral representations of the images. Though reference is made to an image, the audibly coded signal may be coded from any portion of the image, for example from a region of interest (ROI) that depicts a face and/or a happening.
  • ROI region of interest
  • the ROI may be identified using methods known in the art.
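As a concrete illustration, once a bounding box for the ROI has been obtained, cropping it from a grayscale image represented as a list of pixel rows is straightforward. The sketch below is illustrative only and not part of the specification; the detector that produces the bounding box (e.g. a face detector) is outside its scope:

```python
def crop_roi(image, bbox):
    """Crop a region of interest from a grayscale image (a list of
    pixel rows) given a bounding box (x, y, width, height), e.g. one
    returned by a face or happening detector."""
    x, y, w, h = bbox
    return [row[x:x + w] for row in image[y:y + h]]
```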
  • the social media adaptation system 100 is implemented on a central server which includes a processor 131 and connected to social network servers 107 via one or more networks 105, such as the Internet.
  • the social network servers 107 may be servers of social networks such as FacebookTM, Google Plus+TM, BadooTM and/or the like.
  • Although the social media adaptation system 100 is implemented as a central server, which optionally functions as a proxy, it may be implemented as one or more components which are executed at the client side, for example in each client terminal, for example as a software, a browser, and/or a JavaScript loaded by a browser.
  • the social media adaptation system 100 may also be implemented as a software as a service (SaaS), providing, optionally inter alia, visual-to-auditory conversion services to client modules upon request.
  • SaaS software as a service
  • the social media adaptation system 100 includes a monitoring module 132 that monitors real time social media content that is sent from the network servers 107, for example posts and/or messages (i.e. instant messaging (IM) messages, messages with image attachments, and/or the like) of socially connected users, which are updated in one or more dynamic fields of a social media webpage of a visually impaired user, such as 108.
  • the monitoring module 132 may identify, based on this monitoring, one or more images which are uploaded by the socially connected users.
  • an image is an image file, a graphical element, a frame, an icon, an animation, for example a graphics interchange format (GIF) file and/or the like.
  • GIF graphics interchange format
  • the one or more images are optionally differentiated from textual posts which may be translated to voice generation instructions and/or to tactile system instructions, for example using a text-to-auditory module 134 or a text-to-tactile module 135 that is connected to and operated by a client terminal, such as 109.
  • the tactile system instructions may be implemented using a Braille module that codes instructions to present appropriate Braille characters based on the textual posts.
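A Braille module of the kind mentioned above could map text to Braille cells as follows. This Python sketch is an illustration, not the patented implementation; it covers only a handful of grade-1 letters and uses the Unicode Braille Patterns block, where raised dot n of a cell sets bit n-1 of the offset from U+2800:

```python
# Grade-1 Braille dot numbers for a few letters (illustrative subset)
BRAILLE_DOTS = {
    "a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5),
    "h": (1, 2, 5), "i": (2, 4), " ": (),
}

def to_braille(text):
    """Map text to Unicode Braille cells: raised dot n of a 6-dot cell
    sets bit n-1 of the offset from U+2800 (Braille Patterns block)."""
    cells = []
    for ch in text.lower():
        mask = 0
        for dot in BRAILLE_DOTS.get(ch, ()):
            mask |= 1 << (dot - 1)
        cells.append(chr(0x2800 + mask))
    return "".join(cells)
```

On a hardware Braille display, the same dot masks would drive the pin actuators instead of selecting Unicode characters.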
  • the audible signal, which is referred to herein as an audibly coded signal, may be any type of instructions for playing sound, for example, a message, a file, a link, a chunk of data and/or the like.
  • the social media adaptation system 100 further includes a visual-to-auditory module 136 that codes, using the processor 131, one or more audibly coded signals representing visual content extracted from the one or more images.
  • the visual-to-auditory module 136 may be operated according to a visual-to-auditory function, for instance as described below. This allows coding the audible signal(s) in a manner that allows the client module 141 to sound it, for example in combination with one or more text segments which are read from the same and/or related posts.
  • the client module 141 may be a browser add-on that embeds a layer with instructions from the social media adaptation system 100 on a social network webpage that is presented on the screen of the respective client terminal 109.
  • the social media adaptation system 100 includes a managing module 137 which manages the encoding of existing social network content, for example from webpages loaded from any of the network servers 107, to social network content adapted for the visually impaired, for brevity referred to herein as visually impaired enabled social network content (VIESN content).
  • VIESN content includes new posts of a social network service wherein text and images are converted by modules 134 and 136 to instructions of sounding audibly coded signals.
  • the client module 141 is a browser widget or a software module that receives a feed of data from the social media adaptation system 100, for example via a network interface.
  • the feed includes the VIESN content.
  • the widget and/or software component includes an application programming interface (API) that is designed to instruct presentation means, such as speakers, for example integral speakers of the client terminal 109, to sound the audibly coded signal(s).
  • API application programming interface
  • the social media adaptation system 100 may be implemented as a software as a service (SaaS) that processes existing social media content of a third party.
  • SaaS software as a service
  • the social media adaptation system 100 may be implemented by one or more modules, which are installed in one or more servers of a social network service, which are optionally designated for visually impaired users and/or for users who have normal vision, over computer networks.
  • FIG. 2 is a flowchart 200 of a method of audibly representing a plurality of social media posts in a social media page of a user, according to some embodiments of the present invention.
  • social media content such as posts
  • a social media webpage, for example posts appearing in a dynamic field, such as the most recent/top stories field in a FacebookTM webpage.
  • This allows, as shown at 202, identifying which content is in the posts, for example one or more images, text segments, and/or the like.
  • the social media content is content of a third party social network
  • the social network page of the visually impaired user is parsed to identify which posts include images. This may be indicated by an extensible markup language (XML)-based resource description file, such as an XML schema.
  • XML extensible markup language
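The parsing step can be sketched with Python's standard xml.etree module. The element and attribute names below are illustrative assumptions, not the actual resource description schema used by any social network:

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment; real feeds use a provider-specific schema.
FEED = """<feed>
  <post id="1"><text>hello</text></post>
  <post id="2"><image href="pic.jpg"/><text>look at this</text></post>
</feed>"""

def posts_with_images(xml_feed):
    """Return the ids of posts in an XML feed that carry an image
    element, so only those are routed to visual-to-auditory coding."""
    root = ET.fromstring(xml_feed)
    return [post.get("id") for post in root.findall("post")
            if post.find("image") is not None]
```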
  • an audibly coded signal is coded for audibly representing each image according to a visual-to-auditory function.
  • the audibly coded signal is coded by a conversion program that transforms the visual information into spectral auditory information, such as soundscapes.
  • the spectral auditory information is optionally structured as a graph wherein the vertical axis (i.e. the elevation of an object in the image) is represented by frequency, the horizontal axis by time and stereo panning, and the brightness of the image is encoded by loudness.
  • the conversion and/or transformation employed optionally lies in spectrographic sound synthesis from any input image, which is then perceptually enhanced through stereo panning and/or similar techniques.
  • Time and stereo panning constitutes the horizontal axis in the sound representation of an image, tone frequency makes up the vertical axis, and loudness corresponds to pixel brightness.
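The visual-to-auditory mapping described in the preceding items can be sketched in a few lines. The Python below is a mono illustration only (stereo panning is omitted for brevity, and the sample rate, column duration, and frequency range are assumptions, not values from the specification): columns sweep left to right over time, row position maps to tone frequency, and pixel brightness scales loudness.

```python
import math

def image_to_soundscape(image, sample_rate=8000, col_duration=0.05,
                        f_min=500.0, f_max=3000.0):
    """Code a grayscale image (a list of pixel rows, values 0..255)
    into a mono audio signal: columns are scanned left to right over
    time, each row maps to a tone frequency (top row = highest), and
    pixel brightness scales that tone's loudness."""
    n_rows, n_cols = len(image), len(image[0])
    samples_per_col = int(sample_rate * col_duration)
    signal = []
    for col in range(n_cols):
        for n in range(samples_per_col):
            t = n / sample_rate
            s = 0.0
            for row in range(n_rows):
                # vertical axis -> frequency (keep f_max below Nyquist)
                freq = f_max - (f_max - f_min) * row / max(n_rows - 1, 1)
                amp = image[row][col] / 255.0  # brightness -> loudness
                s += amp * math.sin(2.0 * math.pi * freq * t)
            signal.append(s / n_rows)  # normalise the mix
    return signal
```

A stereo variant would additionally pan each column from the left channel to the right as the sweep progresses.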
  • the image is preprocessed before a respective audible signal is coded therefrom.
  • selected features may be emphasized to increase their expression in the respective audibly coded signal.
  • the features are optionally features with social importance, for example facial expressions, eye expressions (i.e. iris size) and/or gestures, such as hand and/or head gestures.
  • preprocessing emphasizes the visibility of these features; see, for example, Sonou Lee, Kwangyun miss, Yoosoo Ahn, Sukki Lee, CFBOX(tm): Superimposing 3D Human Face on Motion Picture, Proceedings of the Seventh International Conference on Virtual Systems and Multimedia (VSMM'01), p.644, October 25-27, 2001.
  • the image is analyzed to identify the above features, for example using known feature detection methods. Then, the image may be tagged with descriptive text that may be converted to an additional audibly coded signal by a text-to-speech module. In use, the tags may be sounded before and/or after the sounding of the audibly coded signal that is coded as described above.
  • this allows sounding the audibly coded signal, for example by a client terminal that hosts a browser accessing (i.e. in a push and/or a pull manner) the audibly coded signal.
  • a visual-to-auditory function such as the above described visual-to-auditory function, is used to sound an audible body language indication to a visually impaired user, for example an audibly coded signal, that is indicative of facial expressions, eye expressions (i.e. iris size) and/or gestures, such as hand and/or head gestures, of a remote user that is currently communicating therewith.
  • a visually impaired user may receive auditory indications about visible expressions of the remote user with whom she communicates without seeing him.
  • FIG. 4 is a flowchart 250 of a method of sounding an audible body language indication during a communication session, according to some embodiments of the present invention.
  • a communication session means a voice chat, an audio chat, an instant messaging session, an inputting of a post (i.e. typing, selecting and dictating), and/or the like.
  • FIG. 5A is a schematic illustration of entities 440, 107, 449, 450 in an exemplary communication session between a user 442 and user 443, according to some embodiments of the present invention.
  • the network 105, and network server 107 are as depicted in FIG. 1 and the system 440 includes some of the components of system 100 in FIG. 1; however, system 440 includes a video sequence monitoring module 445.
  • the method depicts a process that provides audible body language indications for at least one visually impaired user in a certain communication session; however, method 250 may be implemented to support a plurality of simultaneous communication sessions. Moreover, although only two participants 442, 443 are depicted, any number of participants may be supported so that audible body language indications are provided to any participant in a multi-participant communication session.
  • a video sequence depicting body language of a remote user is captured.
  • the video sequence may be any video sequence of a video recorder and/or a set of sequential still images captured by an image sensor.
  • the video sequence may be taken during a social network communication, such as an audio chat, a video chat, and/or an instant messaging (IM) chat.
  • a web camera 451 which is connected to client terminal 449, captures the video sequence during a communication session between user 442 and user 443.
  • an image is captured from the video sequence upon occurrence of an event.
  • the event is a periodic reception of image capturing instructions every certain period, for example every minute, every 5 minutes, every 10 minutes or any intermediate or longer period.
  • the events may be detected by the video sequence monitoring module 445.
  • one or more images are taken when a user (i.e. 442) accesses his social network page and/or a certain social network post.
  • the video sequence may be captured using any camera that is connected to the client terminal of the user 442. Additionally or alternatively, the video sequence is analyzed to identify a body language expression of the user 442.
  • a body language expression may be a detection of a smile, a grimace, a fear expression, and a disgust expression, for example as known in the art.
  • an image is taken and set as a current body language expression.
  • When a new body language expression that is not the current body language expression is detected, another event occurs.
  • an image is taken upon request from a user who communicates with the imaged user.
  • the user 443 may determine when she would like to receive an audibly coded signal that is indicative of visible expressions of the imaged user.
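The three capture triggers described above (a periodic event, a detected change in body language, and an explicit request from the listening user) can be combined into a single capture policy. In the minimal sketch below, a mean-absolute-difference test on grayscale frames stands in for a real expression detector, and the period and threshold values are assumptions:

```python
def should_capture(prev, frame, elapsed_s, requested=False,
                   period_s=60.0, diff_threshold=20.0):
    """Decide whether to grab a new body-language snapshot from the
    video sequence: on explicit request, when the periodic timer
    fires, or when the frame differs enough from the last captured
    one (mean absolute pixel difference over grayscale rows)."""
    if requested or prev is None or elapsed_s >= period_s:
        return True
    total, count = 0, 0
    for row_prev, row_cur in zip(prev, frame):
        for a, b in zip(row_prev, row_cur):
            total += abs(a - b)
            count += 1
    return total / count > diff_threshold
```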
  • an audibly coded signal is coded for representing the image in an auditory manner according to a visual-to-auditory function, for example as described above with reference to 203, for instance by the visual to auditory module 136.
  • the audibly coded signal provides information that allows the user to identify facial expressions, eye expressions (i.e. iris size) and/or gestures, such as hand and/or head gestures, see Ella Striem-Amit, et al. 'Visual' Acuity of the Congenitally Blind Using Visual-to-Auditory Sensory Substitution.
  • this audibly coded signal is outputted, for example played to the user 443, for example as described above.
  • the user listens to audibly coded signals indicative of visible expressions of the imaged user during a communication session therewith.
  • When the audibly coded signal is coded based on an image that is captured when a user posts a new post, the audibly coded signal may be added so as to be played when the respective post is accessed by a visually impaired user.
  • the system 440 may be set to allow a plurality of users of client terminals to receive audibly coded signals indicative of visible expressions of users they communicate with by translating the captured images as described above.
  • a video sequence of user 443 may be captured, for instance by another web camera 451 that is connected to client terminal 450 and analyzed to facilitate the coding of audible signals and the sounding thereof to user 442, for example in a similar manner to the described above.
  • the process may be performed simultaneously to distribute audible signals, for example based on detected body language indications of any number of users, allowing a plurality of visually impaired users to communicate with one another while receiving audible indications about body language expressions of the users they communicate with.
  • each audibly coded signal is tagged with a voice tag indicative of the user imaged in the image according to which it is coded.
  • the voice tag is optionally coded using a text to speech module, for example translating the identifier of the imaged user to an audibly coded signal.
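The tagging step can be as simple as concatenating the spoken identity tag (produced elsewhere, e.g. by a text-to-speech module, which is outside this sketch) ahead of the coded signal. The gap length and sample rate below are assumptions:

```python
def prepend_voice_tag(tag_audio, coded_signal, gap_s=0.2, sample_rate=8000):
    """Prepend a spoken identity tag to an audibly coded signal,
    separated by a short silence so the listener can distinguish the
    tag from the soundscape that follows."""
    silence = [0.0] * int(gap_s * sample_rate)
    return list(tag_audio) + silence + list(coded_signal)
```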
  • audibly coded signals representing current body expressions are generated and distributed when a certain participant starts talking, stops talking, and/or changes his tone and/or talking pace.
  • the speech initiation, pausing, and/or change are detected by voice and/or image analysis, for example as known in the art.
  • this method 250 is repeated sequentially, based on events, for example periodically, upon detection of changes, and/or upon request from any of the users.
  • the user is kept up-to-date with information indicative of visible expressions of the imaged user and optionally informed about his body language.
  • other vocal communication information transmissions may be adjusted, for example dimmed, withheld, and/or muted. It should be noted that as the audibly coded signals are played iteratively and not continuously, they do not substantially interfere with the communication process.
  • the method 250 and/or system 440 allow visually impaired users to communicate socially with one another while listening to audibly coded signals which provide information about current body language expressions of the users with which they communicate.
  • FIG. 6 is an exemplary flow of data, where the system 301 is implemented as a proxy that adapts, for example transcodes, visual media content received from a third party social network 302 to audible media content that is sent to a client terminal 303 accessed by visually impaired users.
  • third party social network service 302 receives social media content that includes one or more images and intended for a designated user, for example from a user that is socially connected thereto.
  • This data is acquired by the system 301, for example by pulling data from the third party social network 302, optionally iteratively, randomly, and/or upon event, optionally with access details (e.g. username and password) which are received from each user.
  • the data is pushed to the system, optionally iteratively, randomly, and/or upon event.
  • FIG. 6 depicts an exemplary social network webpage from which up-to-date social network content that includes images may be extracted and converted, according to some embodiments of the present invention.
  • the social network content is optionally extracted from a dynamic field that includes posts which are updated in real time, for example the posts depicted in 501.
  • the system 301 analyses the data to detect images and converts the images to audibly coded signals, as shown at 404.
  • a post is converted to audibly coded signals where text is converted by the text-to-auditory module 134 and images are converted by the visual-to-auditory module 136.
  • the converted social network content is optionally merged to create a coherent audible representation of each post.
  • the coherent audible representation of each post is then forwarded to the client terminal for presentation, as shown at 405.
  • For example, reference is now made to FIG. 7, which is a screenshot of an exemplary main page of an exemplary user of a social network wherein images are converted by the visual-to-auditory module 136 and text is converted by the text-to-auditory module 134 to create a combined version, according to some embodiments of the present invention.
  • the name of the user who contributes the post is set to be played first.
  • then an audibly coded signal representing a profile picture of this user (contributor), for example 505, is played; this audible signal may be coded by the visual-to-auditory module 136, for example as described above.
  • text added to the post may be played, for example 502.
  • the text is optionally converted using the text-to-auditory module 134.
  • a post image, for example 503, is set to be played.
  • comments added to the post may be played, either automatically and/or upon demand.
  • Each comment may be converted to a coherent audible representation, for example as the post above, for example the name, the profile picture, and the text, are converted to be played audibly. Comments are played one after the other, optionally chronologically.
  • the system 301 provides a browser 304 associated with the designated user a respective audibly coded signal through the internet accordingly.
  • the browser 304 may be implemented using a web browser enabled with text to speech abilities, for example home page reader (HPR) that is available from the web accessibility in mind (WebAIM) project.
  • the monitoring may be performed by the system 301, i.e. by the monitoring module 202.
  • the audibly coded signal may then be sent for sounding by the client terminal 303 that hosts the browser 122.
  • the browser 122 may now present images in a non- visual manner.
  • the browser allows the user to repeat the sounding of the audibly coded signal, for example by a vocal command and/or an input device such as a Braille keyboard or any other visually impaired input device.
  • the user may use the client terminal to respond to the post, for example adding a comment or sending a message.
  • a microphone that is connected to the client terminal is used for recording vocal instructions to add a comment to a post and optionally a vocal comment that is translated to text.
  • the vocal comment may be translated using a speech-to-text module, for example a speech recognition browser add-on and/or application program interface (API) such as speechapi.com.
  • a Braille keyboard or any other suitable input device connected to the client terminal is used for receiving an input indicative of an instruction to add a comment to a post and optionally the comment itself.
  • the user may activate and/or deactivate the conversion that is made by the social media adaptation system 100 from a user interface (UI), such as a graphical user interface (GUI) that is presented to him, for example as a button in a browser, a webpage GUI, and/or the like.
  • the social network content of a user may be either sounded for a visually impaired user and/or displayed to a seeing user.
  • the UI is provided in one or more of the webpages of a social network website. This allows any user to activate and/or to deactivate the sounding of data, such as images and text, upon request.
  • the client terminal receives posts represented as audibly coded signals and sounds them sequentially.
  • an audio stream may be played to the user, allowing him to react to selected posts using a speech-to-text module and/or any other visually impaired accessory, such as a suitable Braille keyboard.
  • the audio stream may be provided as an audio file that is sent to the client terminal of the user, for example as a voice message, an attachment of a message, a push voice notification, and/or the like.
  • the sounding of the audio file is done in a controlled manner while instructions from the user are processed.
  • the user may add a comment to played posts, for example using a speech to text module and/or a visually impaired accessory.
  • the characteristics of sounding posts from the audio file, for example the pace, the volume, repetitive sounding settings, and/or the like, are optionally controlled by voice instructions and/or instructions from a visually impaired accessory.
  • though system 440 is implemented as a proxy, it may be implemented as one or more components which are executed at the client side, for example in each client terminal, for example as software, a browser, and/or a Java script loaded by a browser.
  • the system 440 may also be implemented as a SaaS, providing conversion services to client modules upon request.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
  • a compound or “at least one compound” may include a plurality of compounds, including mixtures thereof.
  • range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
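The play order described in the points above (contributor name first, then a signal coded from the profile picture, the post text, the post image, and finally comments in chronological order) can be sketched as follows. The `Post` structure, the converter stubs, and all names are illustrative assumptions, not taken from the patent; the stubs merely stand in for the text-to-auditory module 134 and the visual-to-auditory module 136.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Comment:
    author: str
    text: str
    timestamp: float

@dataclass
class Post:
    author: str
    profile_picture: bytes
    text: str
    image: bytes
    comments: List[Comment] = field(default_factory=list)

def text_to_audio(text: str) -> Tuple[str, str]:
    # Stand-in for the text-to-auditory module 134.
    return ("tts", text)

def image_to_audio(image: bytes) -> Tuple[str, int]:
    # Stand-in for the visual-to-auditory module 136.
    return ("soundscape", len(image))

def audible_representation(post: Post) -> list:
    # Play order: contributor name, profile picture signal,
    # post text, post image, then comments chronologically.
    segments = [
        text_to_audio(post.author),
        image_to_audio(post.profile_picture),
        text_to_audio(post.text),
        image_to_audio(post.image),
    ]
    for c in sorted(post.comments, key=lambda c: c.timestamp):
        segments.append(text_to_audio(c.author))
        segments.append(text_to_audio(c.text))
    return segments
```

Each returned segment would, in a real system, be an audibly coded signal; here tuples keep the ordering visible.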

Abstract

A method of audibly representing a plurality of social media posts. The method comprises monitoring a plurality of social media posts uploaded in a social media page of a user of a social network, identifying an image in at least one of the plurality of social media posts, coding an audibly coded signal from at least a portion of the image according to a visual-to-auditory function, and outputting the audibly coded signal for presentation to the user.

Description

SOCIAL NETWORK FUNCTIONALITIES FOR VISUALLY IMPAIRED USERS
BACKGROUND
The present invention, in some embodiments thereof, relates to visually impaired communication and, more specifically, but not exclusively, to methods and systems for providing social network functionalities to the visually impaired.
Living with a sensory impairment is challenging, and those who have lost the use of one sensory modality need to find ways to deal with numerous problems encountered in daily life. In today's life, vision is a basic sense in computer-based communication. Nevertheless, the blind manage to function efficiently in this environment using computer access software, designated input devices, speech recognition/voice controlled systems and other designated products for the visually impaired.
During the last years, a multimodal auditory interface which permits blind users to work more easily and efficiently with browsers has been developed; see Patrick Roth et al., Auditory browser for blind and visually impaired users. For example, a macro-analysis phase, which can be either passive or active, may be used to explore elements in a global layout of hypertext markup language (HTML) documents. Such an interface is based on: (1) a mapping of the graphical HTML document into a 3D virtual sound space environment, where non-speech auditory cues differentiate HTML elements; (2) the transcription into sound not only of text, but also of images; and (3) the use of a touch-sensitive screen to facilitate user interaction. Moreover, in order to validate the sonification model of the images, the authors created an audio "memory game", which can be used as a pedagogical tool to help blind pupils learn spatial exploration cues.
SUMMARY
According to some embodiments of the present invention, there is provided a method of audibly representing a plurality of social media posts. The method comprises monitoring a plurality of social media posts uploaded in a social media page of a user of a social network, identifying an image in at least one of the plurality of social media posts, coding an audibly coded signal from at least a portion of the image according to a visual-to-auditory function, and outputting the audibly coded signal for presentation to the user.
Optionally, the outputting comprises sequentially sounding the audibly coded signal and at least one another audibly coded signal representing textual content from the plurality of social media posts.
Optionally, the method further comprises receiving a textual comment from the user and adding the textual comment to the at least one post.
More optionally, the receiving comprises recording a vocal comment given by the user with regard to the image and converting the vocal comment to create the textual comment.
Optionally, the identifying comprises analyzing the image to emphasize at least one body language indication therein.
Optionally, the plurality of social media posts are designated to update a dynamic field of the social media page in real time.
Optionally, the plurality of social media posts comprises at least one of an instant messaging (IM) message and a message with an image attachment.
Optionally, the audibly coded signal is a time-varying spectral representation of the portion.
Optionally, the method further comprises receiving from a user instructions for audibly representing the social media content and performing the coding and the outputting in response to the user instructions.
Optionally, the coding comprises combining the audibly coded signal with at least one another audibly coded signal which is indicative of a textual content extracted from the at least one post.
Optionally, the coding comprises combining the audibly coded signal with at least one another audibly coded signal which is indicative of an identity of another user which uploads the at least one post.
Optionally, the monitoring is continuously performed to monitor the social media content and to sound the audibly coded signal in real time.
Optionally, the method further comprises converting, using a speech-to-text function, a comment which is received from a user in relation to the image, and posting the comment with reference to the image. Optionally, the method further comprises analyzing the image to identify a body language indication; wherein the coding comprises adding an audible body language indication indicative of the body language indication to the audibly coded signal.
Optionally, the image is a profile picture of another user of the social network. Optionally, the plurality of social media posts are displayed when the social media page is displayed; further comprising receiving from the user instructions to perform the method when the social media page is accessed.
Optionally, the outputting comprises outputting an audio file which comprises the audibly coded signal and a plurality of other audibly coded signals each coded to audibly represent one of the plurality of social media posts.
According to some embodiments of the present invention, there is provided a method of sounding an audible body language indication during a communication session. The method comprises monitoring a plurality of images depicting a certain user of a plurality of users participating in a communication session, analyzing the plurality of images to identify a body language expression of the certain user, capturing an image depicting the body language expression, coding an audibly coded signal from at least a portion of the image according to a visual-to-auditory function, and outputting the audibly coded signal for presentation to at least one other of the plurality of users during the communication session.
Optionally, the method further comprises establishing a multidirectional audio call between the certain user and the at least another user and sounding the audibly coded signal to the at least another user while the multidirectional audio call is taking place.
Optionally, the method is performed for each of the plurality of users during the communication session.
Optionally, the capturing, coding, and outputting are repeated in a plurality of iterations during the communication session.
More optionally, the plurality of iterations are set so that the capturing, coding, and outputting repeatedly occur every predefined period.
More optionally, the analyzing comprises detecting a change in a body language of the certain user; wherein each iteration is induced by the change detection. More optionally, the coding comprises adding an audible tag to the audibly coded signal, the audible tag being indicative of the identity of the certain user.
According to some embodiments of the present invention, there is provided a system for audibly representing a plurality of social media posts. The system comprises a processor, a monitoring module which monitors a plurality of social media posts uploaded in a social media page of a user, a visual-to-auditory module which codes, using the processor, an audibly coded signal from at least a portion of an image provided in at least one of the plurality of social media posts according to a visual-to-auditory function, and a presentation module which induces a sounding of the audibly coded signal while presenting the plurality of social media posts to the user.
Optionally, the presentation module presents a user interface to allow a user to provide instructions indicative of activating and deactivating the visual-to-auditory module.
Optionally, the monitoring module and the visual-to-auditory module are installed on a proxy that communicates with a client terminal to instruct the sounding.
Optionally, the sounding is performed by a browser installed on a client terminal that instructs the sounding.
Unless otherwise defined, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of embodiments of the invention, exemplary methods and/or materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and are not intended to be necessarily limiting.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
Some embodiments of the invention are herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of embodiments of the invention. In this regard, the description taken with the drawings makes apparent to those skilled in the art how embodiments of the invention may be practiced. In the drawings:
FIG. 1 is a system that allows visually impaired and blind users to socially connect with other visually impaired users and users who have normal vision, over computer networks, according to some embodiments of the present invention;
FIG. 2 is a flowchart of a method of audibly representing a plurality of social media posts in a social media page of a user, according to some embodiments of the present invention;
FIG. 3 is a schematic illustration of a conversation process, according to some embodiments of the present invention;
FIG. 4 is a flowchart of a method sounding audible body language indication during a communication session, according to some embodiments of the present invention;
FIGs. 5A and 5B are schematic illustrations of entities of a system that enhances an exemplary communication session between users, according to some embodiments of the present invention;
FIG. 6 is an exemplary flow of data, wherein a system adapts, for example transcodes, visual media content received from a social network to audible media content that is sent to a client terminal, according to some embodiments of the present invention; and
FIG. 7 is a screenshot of an exemplary main page of an exemplary user of a social network wherein images are converted by a visual-to-auditory module and text is converted by a text-to-auditory module to create a combined version, according to some embodiments of the present invention.
DETAILED DESCRIPTION
The present invention, in some embodiments thereof, relates to visually impaired communication and, more specifically, but not exclusively, to methods and systems for providing social network functionalities to the visually impaired.
According to some embodiments of the present invention, there are provided methods and systems of encoding visual social media content, for example images of posts, as audibly coded signals. For example, the audible signals are coded by transforming visual information into spectral auditory information, such as soundscapes. Optionally, the visual information includes images, which are intertwined in a feed of up-to-date social media posts that is published in a social media page of a visually impaired user. In use, the visually impaired user may listen to the audibly coded signals to perceive the visual social media content. These audibly coded signals may be combined with audibly coded signals indicative of textual content. The textual content may include comments and/or statuses which are also extracted from the posts and/or from other related social media content.
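One possible visual-to-auditory function of the kind mentioned above is a column-by-column soundscape scan, in the spirit of systems such as The vOICe: image columns become time, image rows become sine frequencies (the top of the image is high-pitched), and pixel brightness becomes amplitude. The sketch below is a minimal illustration under assumed parameters (8 kHz sample rate, 50 ms per column, a 200-2000 Hz band); none of these values come from the patent.

```python
import math

def image_to_soundscape(image, sample_rate=8000, column_ms=50,
                        f_low=200.0, f_high=2000.0):
    """Encode a grayscale image (list of rows, values in 0..1) as a
    time-varying spectral signal: left-to-right columns over time,
    row position mapped to pitch, brightness mapped to amplitude."""
    rows = len(image)
    samples_per_col = int(sample_rate * column_ms / 1000)
    # Top row gets the highest frequency, bottom row the lowest.
    freqs = [f_high - (f_high - f_low) * r / max(rows - 1, 1)
             for r in range(rows)]
    signal = []
    for col in range(len(image[0])):
        for n in range(samples_per_col):
            t = n / sample_rate
            s = sum(image[r][col] * math.sin(2 * math.pi * freqs[r] * t)
                    for r in range(rows))
            signal.append(s / rows)  # normalise by the number of rows
    return signal

# A bright pixel at the top-left becomes a high-pitched tone at the
# start of the soundscape; the all-dark second column is silent.
tiny = [[1.0, 0.0],
        [0.0, 0.0]]
samples = image_to_soundscape(tiny)
```

Playing `samples` through any audio sink at the assumed sample rate would yield the audibly coded signal; the listener perceives spatial layout through pitch and timing.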
Optionally, the methods and systems allow a visually impaired user to participate actively in a social media community that includes other visually impaired users and/or users with a functioning visual sensory system. For example, the visually impaired user may react to the visual social media content, for example by posting a comment, sending a message, and/or establishing a communication session with the user who uploaded the visual social media content. The ability to react to visual social media content allows a user to communicate with users with a functioning visual sensory system in the manner they communicate when using a social network service. While one user uploads images and types text, the other user may react to the uploaded images and/or typed text with a textual and/or vocal comment without seeing them.
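The react-to-content flow can be sketched as a small dispatcher: recorded audio is passed through a speech-to-text function (stubbed here; the patent elsewhere mentions browser add-ons and services such as speechapi.com), and the result is either treated as a playback command or posted as a comment. The command words "repeat" and "comment", and all function names, are assumptions of this sketch.

```python
def speech_to_text(audio: bytes) -> str:
    # Stub standing in for a real speech-recognition service.
    return audio.decode("utf-8")

def handle_vocal_input(audio: bytes, post: dict, user: str) -> str:
    """Interpret a recorded utterance as either a playback command or
    a comment to append to the given post."""
    words = speech_to_text(audio).split(maxsplit=1)
    if not words:
        return "ignored"
    if words[0] == "repeat":
        return "replay"  # sound the audibly coded signal again
    if words[0] == "comment" and len(words) == 2:
        post.setdefault("comments", []).append(
            {"author": user, "text": words[1]})
        return "commented"
    return "ignored"
```

A real client would route "replay" back to the sounding layer and send "commented" results to the social network service.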
According to some embodiments of the present invention, there are provided methods and systems of sounding audible body language indications which are indicative of body language expressions of a certain user to another user during a communication session therebetween. The communication session may be a voice chat, an instant messaging (IM) session, a video chat and/or the like. In these embodiments, a video sequence of the certain user may be analyzed in real time to extract images that depict current body language expressions of the certain user. This allows coding respective audibly coded signals which audibly indicate to other user(s) which body language expressions the imaged user currently expresses. The images may be taken periodically, upon request, and/or when a change in the body language of the imaged user is detected.
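Taking a new image "when a change in the body language of the imaged user is detected" implies some change detector over the video sequence. A deliberately crude stand-in is a mean absolute frame difference against the last coded frame; a real system would analyze posture or facial features, and the threshold below is purely illustrative.

```python
def mean_abs_diff(prev, curr):
    """Average per-pixel absolute difference between two grayscale
    frames given as equal-sized lists of rows."""
    total = sum(abs(a - b) for row_a, row_b in zip(prev, curr)
                for a, b in zip(row_a, row_b))
    return total / (len(prev) * len(prev[0]))

def trigger_frames(frames, threshold=0.1):
    """Return the indices of frames that should trigger re-coding of
    the audibly coded signal: always the first frame, then any frame
    that differs enough from the last coded one."""
    coded = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[coded[-1]], frames[i]) > threshold:
            coded.append(i)
    return coded
```

Each triggered frame would then be passed to the visual-to-auditory function, so listeners hear an update only when the body language plausibly changed.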
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the Examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD- ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
Reference is now made to FIG. 1, which is a social media adaptation system 100 that allows visually impaired and blind users, for brevity referred to herein as visually impaired users, to socially connect with other visually impaired users and users who have normal vision, for brevity referred to herein as seeing users, over computer networks, according to some embodiments of the present invention. The social media adaptation system 100 communicates with a plurality of client modules 141 which are installed, for example temporarily loaded by a browser and/or as a software component, in client terminals 109, such as laptops, desktops, tablets, and Smartphones, and/or client modules which are set as add-ons installed in browsers. The social media adaptation system 100 translates visual social media content to audibly coded signals in a manner that allows visually impaired users to understand visual social media content without perceiving it with their eyes and to react to the visual content substantially in the same context and/or manner a seeing user would. In such a manner, visually impaired users may communicate with seeing users without requiring seeing users to change their social network behavior.
The audibly coded signals are optionally time-varying spectral representations of the images. Though reference is made to an image, the audibly coded signal may be coded from any portion of the image, for example from a region of interest (ROI) that depicts a face and/or a happening. The ROI may be identified using methods known in the art.
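Coding from a region of interest rather than the whole image presupposes an ROI detector. The sketch below substitutes a trivial one, the bounding box of the non-zero pixels, for the face or happening detectors the text alludes to; images are plain lists of pixel rows, and all names are illustrative.

```python
def detect_roi(image):
    """Stand-in ROI detector: bounding box (top, left, bottom, right)
    of the non-zero pixels, falling back to the whole image when the
    image is entirely dark."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for c in range(len(image[0]))
            if any(row[c] for row in image)]
    if not rows:
        return 0, 0, len(image), len(image[0])
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1

def crop_roi(image):
    """Return only the ROI, which would then be passed to the
    visual-to-auditory function instead of the full image."""
    top, left, bottom, right = detect_roi(image)
    return [row[left:right] for row in image[top:bottom]]
```

Restricting the coding to the crop shortens the resulting soundscape and focuses it on the face or happening of interest.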
In the exemplary implementation depicted in FIG. 1, the social media adaptation system 100 is implemented on a central server which includes a processor 131 and is connected to social network servers 107 via one or more networks 105, such as the Internet. The social network servers 107 may be servers of social networks such as Facebook™, Google Plus+™, Badoo™ and/or the like. It should be noted that though the social media adaptation system 100 is implemented as a central server, that optionally functions as a proxy, it may be implemented as one or more components which are executed at the client side, for example in each client terminal, for example as software, a browser, and/or a Java script loaded by a browser. The social media adaptation system 100 may also be implemented as a software as a service (SaaS), providing, optionally inter alia, visual-to-auditory conversion services to client modules upon request.
The social media adaptation system 100 includes a monitoring module 132 that monitors real time social media content that is sent from the network servers 107, for example posts and/or messages (i.e. instant messaging (IM) messages, messages with image attachments, and/or the like) of socially connected users, which are updated in one or more dynamic fields of a social media webpage of a visually impaired user, such as 108. The monitoring module 132 may identify, based on this monitoring, one or more images which are uploaded by the socially connected users. As used herein, an image is an image file, a graphical element, a frame, an icon, an animation, for example a graphics interchange format (GIF) file, and/or the like. The one or more images are optionally differentiated from textual posts which may be translated to voice generation instructions and/or to tactile system instructions, for example using a text-to-auditory module 134 or a text-to-tactile module 135 that is connected to and operated by a client terminal, such as 109. The tactile system instructions may be implemented using a Braille module that codes instructions to present appropriate Braille characters based on the textual posts. It should be noted that the audible signal, which is referred to herein as an audibly coded signal, may be any type of instructions for playing sound, for example, a message, a file, a link, a chunk of data and/or the like.
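The monitoring module's differentiation of images from textual posts can be sketched as a simple router that classifies each item and hands it to the matching converter. Classifying by file extension, and the converter callables themselves, are assumptions of this sketch rather than the patent's method; the converters stand in for modules 134-136.

```python
# Hypothetical extension set used only for this sketch.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}

def classify(item: str) -> str:
    """Label a post item as 'image' or 'text' by its file extension."""
    return ("image" if any(item.lower().endswith(e)
                           for e in IMAGE_EXTENSIONS) else "text")

def route_post(items, text_converter, image_converter):
    """Send images to the visual-to-auditory converter and everything
    else to the text-to-auditory (or text-to-tactile) converter."""
    return [image_converter(i) if classify(i) == "image"
            else text_converter(i) for i in items]
```

The returned list preserves the post's original order, so the client module can sound the converted segments sequentially.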
The social media adaptation system 100 further includes a visual-to-auditory module 136 that codes, using the processor 131, one or more audibly coded signals representing visual content extracted from the one or more images. The visual-to-auditory module 136 may be operated according to a visual-to-auditory function, for instance as described below. This allows coding the audible signal(s) in a manner that allows the client module 141 to sound it, for example in combination with one or more text segments which are read from the same and/or related posts. For example, the client module 141 may be a browser add-on that embeds a layer with instructions from the social media adaptation system 100 on a social network webpage that is presented on the screen of the respective client terminal 109. In another example, the social media adaptation system 100 includes a managing module 137 which manages the encoding of existing social network content, for example from webpages loaded from any of the network servers 107, to social network content adapted for the visually impaired, for brevity referred to herein as visually impaired enabled social network content (VIESN content). The coding is optionally done using modules 134-136. For example, the VIESN content includes new posts of a social network service wherein text and images are converted by modules 134 and 136 to instructions for sounding audibly coded signals.
In another example, the client module 141 is a browser widget or a software module that receives a feed of data from the social media adaptation system 100, for example via a network interface. The feed includes the VIESN content. The widget and/or software component includes an application programming interface (API) that is designed to instruct presentation means, such as speakers, for example integral speakers of the client terminal 109, to sound the audibly coded signal(s).
The social media adaptation system 100 may be implemented as a software as a service (SaaS) that processes existing social media content of a third party. The social media adaptation system 100 may be implemented by one or more modules, which are installed in one or more servers of a social network service, which are optionally designated for visually impaired users and/or for users who have normal vision, over computer networks.
Reference is now also made to FIG. 2, which is a flowchart 200 of a method of audibly representing a plurality of social media posts in a social media page of a user, according to some embodiments of the present invention.
First, as shown at 201, social media content, such as posts, is monitored, for example in one or more dynamic fields of a social media webpage, for example posts appearing in a dynamic field such as the most recent/top stories field of a Facebook™ webpage. This allows, as shown at 202, identifying which content is in the posts, for example one or more images, text segments, and/or the like. For example, when the social media content is content of a third party social network, the social network page of the visually impaired user is parsed to identify which posts include images. This may be indicated by an extensible markup language (XML)-based resource description file, such as an XML schema.
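A minimal, hypothetical sketch of parsing such an XML-based resource description to identify which posts include images may look as follows; the element and attribute names are illustrative assumptions, not an actual social network schema:

```python
# Hypothetical sketch: find posts that include images in an XML feed
# description. The <feed>/<post>/<image> vocabulary is illustrative only.
import xml.etree.ElementTree as ET

FEED_XML = """
<feed>
  <post id="1"><text>Good morning!</text></post>
  <post id="2"><text>At the beach</text><image href="beach.jpg"/></post>
  <post id="3"><image href="party.gif"/></post>
</feed>
"""

def posts_with_images(xml_text):
    """Return the ids of posts that include at least one image element."""
    root = ET.fromstring(xml_text)
    return [post.get("id") for post in root.findall("post")
            if post.find("image") is not None]

image_posts = posts_with_images(FEED_XML)
```

The posts identified here would then be handed to the visual-to-auditory coding step shown at 203.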
Now, as shown at 203, an audibly coded signal is coded for audibly representing each image according to a visual-to-auditory function. For example, the audibly coded signal is coded by a conversion program that transforms the visual information into spectral auditory information, such as soundscapes. The spectral auditory information is optionally a graph wherein the vertical axis (i.e. elevation of an object in an image) is represented by frequency, horizontal axis by time and stereo panning, while brightness of the image is encoded by loudness. Similar transformations which may be used are described in Capelle C, Trullemans C, Arno P, Veraart C (1998) A real-time experimental prototype for enhancement of vision rehabilitation using auditory substitution, IEEE Trans Biomed Eng 45(10): 1279-1293; Cronly-Dillon J, Persaud K, Gregory RP (1999), The perception of visual images encoded in musical form: a study in cross-modality information transfer. Proc Biol Sci 266(1436):2427-2433; and Cronly-Dillon J, Persaud KC, Blore R (2000) Blind subjects construct conscious mental images of visual scenes encoded in musical form. Proc Biol Sci 267(1458):2231-2238.
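The mapping described above (vertical position to tone frequency, horizontal position to time, pixel brightness to loudness) may be sketched as follows. This is a simplified single-channel illustration that omits stereo panning, and the frequency range, sample rate, and scan duration are arbitrary assumptions rather than parameters from any cited system:

```python
# Hypothetical sketch of a visual-to-auditory function: a left-to-right
# scan where each image column becomes a time slice, each row a tone
# frequency (top rows high), and brightness (0..1) the tone's loudness.
import math

def image_to_soundscape(image, duration=1.0, sample_rate=16000,
                        f_min=200.0, f_max=5000.0):
    """Convert a gray-scale image (list of rows) into mono audio samples."""
    rows = len(image)
    cols = len(image[0])
    samples_per_col = int(duration * sample_rate / cols)
    samples = []
    for c in range(cols):
        for n in range(samples_per_col):
            t = (c * samples_per_col + n) / sample_rate
            value = 0.0
            for r in range(rows):
                # top rows (small r) map to high frequencies
                freq = f_max - (f_max - f_min) * r / max(rows - 1, 1)
                value += image[r][c] * math.sin(2.0 * math.pi * freq * t)
            samples.append(value / rows)  # normalize by row count
    return samples

# a 2x2 image with bright top-left and bottom-right pixels
demo = image_to_soundscape([[1.0, 0.0], [0.0, 1.0]])
```

A practical implementation would additionally apply stereo panning along the scan and write the samples to an audio buffer or file for playback.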
When a soundscape is coded, for example as described above, visual details at high resolution (up to 25,344 pixels, the resolution used here) are preserved. The images which are converted into soundscapes using an algorithm, such as the function described above, allow proficient users to differentiate the shapes of different objects, identify the actual objects, and also locate them in the depicted space, see Amedi A, Stern WM, Camprodon JA, Bermpohl F, Merabet L, et al. (2007) Shape conveyed by visual-to-auditory sensory substitution activates the lateral occipital complex. Nat Neurosci 10: 687-689, Auvray M, Hanneton S, O'Regan JK (2007) Learning to perceive with a visuoauditory substitution system: Localisation and object recognition with 'The vOICe'. Perception 36: 416-430, and Proulx MJ, Stoerig P, Ludowig E, Knoll I (2008) Seeing 'where' through the ears: effects of learning-by-doing and long-term sensory deprivation on localization based on image-to-sound substitution. PLoS ONE 3: e1840. The functional basis of this visuoauditory transformation lies in spectrographic sound synthesis from an input image, which is then further perceptually enhanced through stereo panning and other techniques. Time and stereo panning constitute the horizontal axis in the sound representation of an image, tone frequency makes up the vertical axis, and loudness corresponds to pixel brightness.
In an exemplary visual-to-auditory function, the conversion and/or transformation employed optionally lies in spectrographic sound synthesis from any input image, which is then perceptually enhanced through stereo panning and/or similar techniques. Time and stereo panning constitute the horizontal axis in the sound representation of an image, tone frequency makes up the vertical axis, and loudness corresponds to pixel brightness. In this example, visual information in the sound representations of complicated gray-scale images is preserved up to a resolution of about 60x60 pixels for a 1 second sound scan and a 5 kilohertz (kHz) audio bandwidth, see also Amedi A, Stern WM, Camprodon JA, Bermpohl F, Merabet L, Rotman S et al, Shape conveyed by visual-to-auditory sensory substitution activates the lateral occipital complex. Nat Neurosci 10(6):687-689 (2007) and Meijer PB, An experimental system for auditory image representations. IEEE Trans Biomed Eng 39(2): 112-121 (1992). In FIG. 3, which is a schematic illustration of a conversion process, an original picture 121 is transformed to a spectrogram 122 coded by a function 123, for example as described in Amedi A, Stern WM, Camprodon JA, Bermpohl F, Merabet L, Rotman S et al (2007) Shape conveyed by visual-to-auditory sensory substitution activates the lateral occipital complex. Nat Neurosci 10(6):687-689.
Optionally, the image is preprocessed before a respective audible signal is coded therefrom. In such embodiments, selected features may be emphasized to increase their expression in the respective audibly coded signal. The features are optionally features with social importance, for example facial expressions, eye expressions (i.e. iris size) and/or gestures, such as hand and/or head gestures. In such embodiments, preprocessing emphasizes the visibility of these features, see, for example, Sonou Lee, Kwangyun Wohn, Yoosoo Ahn, Sukki Lee, CFBOX(tm): Superimposing 3D Human Face on Motion Picture, Proceedings of the Seventh International Conference on Virtual Systems and Multimedia (VSMM'01), p.644, October 25-27, 2001. Additionally or alternatively, the image is analyzed to identify the above features, for example using known feature detection methods. Then, the image may be tagged with descriptive text that may be converted to an additional audibly coded signal by a text to speech module. In use, the tags may be sounded before and/or after the sounding of the audibly coded signal that is coded as described above.
As shown at 204, this allows sounding the audibly coded signal, for example by a client terminal that hosts a browser accessing (i.e. in a push and/or a pull manner) the audibly coded signal.
According to some embodiments of the present invention, a visual-to-auditory function, such as the above described visual-to-auditory function, is used to sound an audible body language indication to a visually impaired user, for example an audibly coded signal, that is indicative of facial expressions, eye expressions (i.e. iris size) and/or gestures, such as hand and/or head gestures, of a remote user that is currently communicating therewith. In such a manner, the visually impaired user may receive auditory indications about visible expressions of the remote user with whom she communicates without seeing him.
For example, reference is now also made to FIG. 4, which is a flowchart 250 of a method of sounding an audible body language indication during a communication session, according to some embodiments of the present invention. As used herein, a communication session means a voice chat, an audio chat, an instant messaging session, an inputting of a post (i.e. typing, selecting and dictating), and/or the like. For example, reference is also made to FIG. 5A, which is a schematic illustration of entities 440, 107, 449, 450 in an exemplary communication session between a user 442 and a user 443, according to some embodiments of the present invention. In these embodiments, the network 105 and the network server 107 are as depicted in FIG. 1 and the system 440 includes some of the components of system 100 in FIG. 1; however, system 440 includes a video sequence monitoring module 445.
In FIG. 4, the method depicts a process that provides audible body language indications for at least one visually impaired user in a certain communication session; however, method 250 may be implemented to support a plurality of simultaneous communication sessions. Moreover, although only two participants 442, 443 are depicted, any number of participants may be supported so that audible body language indications are provided to any participant in a multi-participant communication session.
First, as shown at 251, a video sequence depicting body language of a remote user is captured. The video sequence may be any video sequence of a video recorder and/or a set of sequential still images captured by an image sensor. The video sequence may be taken during a social network communication, such as an audio chat, a video chat, and/or an instant messaging (IM) chat. For example, in FIG. 5A, a web camera 451, which is connected to client terminal 449, captures the video sequence during a communication session between user 442 and user 443.
As shown at 252, an image is captured from the video sequence upon occurrence of an event. For example, the event is a periodic reception of image capturing instructions, for example every minute, every 5 minutes, every 10 minutes, or any intermediate or longer period. The events may be detected by the video sequence monitoring module 445.
In another example, one or more images are taken when a user (i.e. 442) accesses his social network page and/or a certain social network post. The video sequence may be captured using any camera that is connected to the client terminal of the user 442. Additionally or alternatively, the video sequence is analyzed to identify a body language expression of the user 442. In such embodiments, a body language expression may be a detection of a smile, a grimace, a fear expression, and/or a disgust expression, for example as known in the art. When such a body language expression and/or a body language expression change is detected, an image is taken and set as a current body language expression. When a new body language expression, which is not the current body language expression, is detected, another event occurs.
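The event logic above, in which a capture event fires whenever a newly detected body language expression differs from the current one, may be sketched as follows; the per-frame expression labels are assumed to come from an external classifier and are illustrative only:

```python
# Hypothetical sketch of the expression-change event logic: an event is
# emitted when a detected expression differs from the current one, at
# which point an image would be captured and the current expression updated.
def expression_events(expressions):
    """Return frame indices at which a new body language expression occurs.
    `expressions` is a per-frame label list; None means nothing detected."""
    current = None
    events = []
    for i, expression in enumerate(expressions):
        if expression is not None and expression != current:
            current = expression
            events.append(i)
    return events

# illustrative classifier output for an analyzed video sequence
frames = ["neutral", "neutral", "smile", "smile", None, "smile", "grimace"]
events = expression_events(frames)
```

Note that frame 5 fires no event: "smile" is still the current expression, so only genuine changes trigger image capture.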
Additionally or alternatively, an image is taken upon request from a user who communicates with the imaged user. In such a manner, the user 443 may determine when she would like to receive an audibly coded signal that is indicative of visible expressions of the imaged user.
As shown at 253, an audibly coded signal is coded for representing the image in an auditory manner according to a visual-to-auditory function, for example as described above with reference to 203, for instance by the visual to auditory module 136. As described above, the audibly coded signal provides information that allows the user to identify facial expressions, eye expressions (i.e. iris size) and/or gestures, such as hand and/or head gestures, see Ella Striem-Amit, et al. 'Visual' Acuity of the Congenitally Blind Using Visual-to-Auditory Sensory Substitution.
As shown at 254, this audibly coded signal is outputted, for example played to the user 443, for example as described above. In such a manner, the user listens to audibly coded signals indicative of visible expressions of the imaged user during a communication session therewith. Alternatively, when the audibly coded signal is coded based on an image that is captured when a user posts a new post, the audibly coded signal may be added to be played when the respective post is accessed by a visually impaired user. It should be noted that the system 440 may be set to allow a plurality of users of client terminals to receive audibly coded signals indicative of visible expressions of users they communicate with by translating the captured images as described above.
Optionally, the process depicted in FIG. 4 is held to analyze a plurality of video sequences of a plurality of different users so that any number of visually impaired users may communicate with one another while receiving the above audibly coded signals during the communication session. For example, in FIG. 5A, a video sequence of user 443 may be captured, for instance by another web camera 451 that is connected to client terminal 450, and analyzed to facilitate the coding of audible signals and the sounding thereof to user 442, for example in a similar manner to the described above. As shown in FIG. 5B, the process may be held simultaneously to distribute audible signals, for example based on detected body language indications of any number of users, allowing a plurality of visually impaired users to communicate with one another while receiving audible indications about body language expressions of the users they communicate with.
Optionally, when more than two users participate in a communication session, each audibly coded signal is tagged with a voice tag indicative of the user imaged in the image according to which it is coded. The voice tag is optionally coded using a text to speech module, for example translating the identifier of the imaged user to an audibly coded signal. Additionally or alternatively, when more than two users participate in a communication session, audibly coded signals representing current body expressions are generated and distributed when a certain participant starts talking, stops talking, and/or changes his tone and/or talking pace. The speech initiation, pausing, and/or change are detected by voice and/or image analysis, for example as known in the art.
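The triggering of voice-tagged signals on speech transitions may be sketched as follows; the tick-based talking samples are an illustrative assumption standing in for actual voice activity detection:

```python
# Hypothetical sketch: detect when a participant starts or stops talking.
# Each such transition would trigger coding and distributing a voice-tagged
# body-expression signal for that participant.
def talk_transitions(samples):
    """`samples` is a per-tick list of dicts mapping participant -> is_talking.
    Returns (tick, participant, 'start'|'stop') transition events."""
    state = {}
    events = []
    for tick, talking in enumerate(samples):
        for user, is_talking in talking.items():
            if is_talking != state.get(user, False):
                events.append((tick, user, "start" if is_talking else "stop"))
                state[user] = is_talking
    return events

session = [
    {"alice": True, "bob": False},
    {"alice": True, "bob": True},
    {"alice": False, "bob": True},
]
transitions = talk_transitions(session)
```

In a fuller system each event would be paired with a text-to-speech voice tag of the participant's name, sounded before the corresponding body-expression signal.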
As shown at 255, per participant, this method 250 is repeated sequentially, based on events, for example periodically, upon detection of changes, and/or upon request from any of the users. In such a manner, the user is kept up-to-date with information indicative of visible expressions of the imaged user and optionally informed about his body language. Optionally, when the audibly coded signal is played, other vocal communication transmissions are held, for example dimmed, withheld, and/or muted. It should be noted that as the audibly coded signals are played iteratively and not continuously, they do not substantially interfere with the communication process.
The method 250 and/or system 440 allow visually impaired users to communicate socially with one another while listening to audibly coded signals which provide information about current body language expressions of the users with which they communicate.
Reference is now also made to FIG. 6, which is an exemplary flow of data, where the system 301 is implemented as a proxy that adapts, for example transcodes, visual media content received from a third party social network 302 to audible media content that is sent to a client terminal 303 accessed by visually impaired users.
First, as shown at 401, 402, the third party social network service 302 receives social media content that includes one or more images and is intended for a designated user, for example from a user that is socially connected thereto. This data is acquired by the system 301, for example by pulling it from the third party social network 302, optionally iteratively, randomly, and/or upon an event, optionally with access details (e.g. username and password) which are received from each user. In another example, the data is pushed to the system, optionally iteratively, randomly, and/or upon an event.
For example, FIG. 7 depicts an exemplary social network webpage from which up-to-date social network content that includes images may be extracted and converted, according to some embodiments of the present invention. The social network content is optionally extracted from a dynamic field that includes posts which are updated in real time, for example the posts depicted in 501.
As shown at 403, the system 301 analyzes the data to detect images and converts the images to audibly coded signals, as shown at 404. For example, a post is converted to audibly coded signals where text is converted by the text-to-auditory module 134 and images are converted by the visual-to-auditory module 136. The converted social network content is optionally merged to create a coherent audible representation of each post. The coherent audible representation of each post is then forwarded to the client terminal for presentation, as shown at 405. For example, reference is now made to FIG. 7, which is a screenshot of an exemplary main page of an exemplary user of a social network wherein images are converted by the visual-to-auditory module 136 and text is converted by the text-to-auditory module 134 to create a combined version, according to some embodiments of the present invention. In the combined version, the name of the user who contributes the post is set to be played first. Then, an audibly coded signal representing a profile picture of this user (contributor), for example 505, is set to be played second. The audible signal may be coded by the visual-to-auditory module 136, for example as described above. Then the text added to the post may be played, for example 502. The text is optionally converted using the text-to-auditory module 134. Then, a post image, for example 503, is set to be played. Now, comments added to the post may be played, either automatically and/or upon demand. Each comment may be converted to a coherent audible representation, for example as the post above; for example the name, the profile picture, and the text are converted to be played audibly. Comments are played one after the other, optionally chronologically.
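The playback ordering of such a combined version (contributor name, profile picture, post text, post image, then comments chronologically) may be sketched as follows; the post structure and field names are illustrative assumptions:

```python
# Hypothetical sketch: assemble the playback order of a converted post.
# "speech" items would go to a text-to-auditory module, "soundscape" items
# to a visual-to-auditory module; field names are illustrative only.
def post_playlist(post):
    """Return (kind, content) pairs in the order they should be sounded."""
    items = [("speech", post["contributor"]),
             ("soundscape", post["profile_picture"]),
             ("speech", post["text"])]
    if post.get("image"):
        items.append(("soundscape", post["image"]))
    # comments are sounded one after the other, chronologically
    for comment in sorted(post.get("comments", []), key=lambda c: c["time"]):
        items.append(("speech", comment["contributor"]))
        items.append(("soundscape", comment["profile_picture"]))
        items.append(("speech", comment["text"]))
    return items

post = {
    "contributor": "Dana", "profile_picture": "dana.jpg",
    "text": "Look at this sunset!", "image": "sunset.jpg",
    "comments": [
        {"time": 2, "contributor": "Eli", "profile_picture": "eli.jpg",
         "text": "Beautiful"},
        {"time": 1, "contributor": "Gal", "profile_picture": "gal.jpg",
         "text": "Wow"},
    ],
}
playlist = post_playlist(post)
```

A client module would then iterate over the playlist, dispatching each item to the appropriate conversion module before sounding it.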
The system 301 provides a browser 304 associated with the designated user with a respective audibly coded signal through the internet accordingly. The browser 304 may be implemented using a web browser enabled with text to speech abilities, for example a home page reader (HPR) such as the one available from the web accessibility in mind (WebAIM) project.
When a post (which, for brevity, includes a message) with an image is updated in a certain dynamic field of a social network webpage, the system 301 (i.e. the monitoring module 132) identifies the image within that post and codes an audible signal accordingly. The audibly coded signal may then be sent for sounding by the client terminal 303 that hosts the browser 304. The browser 304 may now present images in a non-visual manner. Optionally, the browser allows the user to repeat the sounding of the audibly coded signal, for example by a vocal command and/or an input device such as a Braille keyboard or any other visually impaired input device. Optionally, the user may use the client terminal to respond to the post, for example adding a comment or sending a message. For instance, a microphone that is connected to the client terminal is used for recording vocal instructions to add a comment to a post and optionally a vocal comment that is translated to text. This may be done using a speech to text module, for example a speech recognition browser add-on and/or application program interface (API) such as speechapi.com. In another example, a Braille keyboard or any other suitable input device that is connected to the client terminal is used for receiving an input indicative of an instruction to add a comment to a post and optionally the comment itself.
According to some embodiments of the present invention, the user may activate and/or deactivate the conversion that is made by the social media adaptation system 100 from a user interface (UI), such as a graphical user interface (GUI) that is presented to him, for example as a button in a browser, a webpage GUI, and/or the like. In such a manner, the social network content of a user may be either sounded for a visually impaired user and/or displayed to a seeing user. For instance, the UI is provided in one or more of the webpages of a social network website. This allows any user to activate and/or to deactivate the sounding of data, such as images and text, upon request.
Optionally, the client terminal receives posts represented as audibly coded signals and sounds them sequentially. In such a manner, an audio stream may be played to the user, allowing him to react to selected posts using a speech-to-text module and/or any other visually impaired accessory, such as a suitable Braille keyboard. The audio stream may be provided as an audio file that is sent to the client terminal of the user, for example as a voice message, an attachment of a message, a push voice notification, and/or the like. The sounding of the audio file is done in a controlled manner while instructions from the user are processed. In such a manner, the user may add a comment to played posts, for example using a speech to text module and/or a visually impaired accessory. The characteristics of sounding posts from the audio file, for example the pace, the volume, repetitive sounding settings and/or the like, are optionally controlled by voice instructions and/or instructions from a visually impaired accessory.
It should be noted that though the system 440 is implemented as a proxy, it may be implemented as one or more components which are executed at the client side, for example in each client terminal, for example as a software, a browser, and/or a Java script loaded by a browser. The system 440 may also be implemented as a SaaS, providing conversion services to client modules upon request.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
It is expected that during the life of a patent maturing from this application many relevant systems and methods will be developed and the scope of the terms image, module, and processor is intended to include all such new technologies a priori.
As used herein the term "about" refers to ± 10 %. The terms "comprises", "comprising", "includes", "including", "having" and their conjugates mean "including but not limited to". These terms encompass the terms "consisting of" and "consisting essentially of".
The phrase "consisting essentially of" means that the composition or method may include additional ingredients and/or steps, but only if the additional ingredients and/or steps do not materially alter the basic and novel characteristics of the claimed composition or method.
As used herein, the singular form "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a compound" or "at least one compound" may include a plurality of compounds, including mixtures thereof.
The word "exemplary" is used herein to mean "serving as an example, instance or illustration". Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments.
The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". Any particular embodiment of the invention may include a plurality of "optional" features unless such features conflict.
Throughout this application, various embodiments of this invention may be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
Whenever a numerical range is indicated herein, it is meant to include any cited numeral (fractional or integral) within the indicated range. The phrases "ranging/ranges between" a first indicated number and a second indicated number and "ranging/ranges from" a first indicated number "to" a second indicated number are used herein interchangeably and are meant to include the first and second indicated numbers and all the fractional and integral numerals therebetween.
It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.
All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention. To the extent that section headings are used, they should not be construed as necessarily limiting.

Claims

WHAT IS CLAIMED IS:
1. A method of audibly representing a plurality of social media posts, comprising: monitoring a plurality of social media posts uploaded to a social media page of a user of a social network;
identifying an image in at least one of said plurality of social media posts;
coding an audibly coded signal from at least a portion of said image according to a visual-to-auditory function; and
outputting said audibly coded signal for presentation to said user.
2. The method of claim 1, wherein said outputting comprises sequentially sounding said audibly coded signal and at least one another audibly coded signal representing textual content from said plurality of social media posts.
3. The method of claim 1, further comprising receiving a textual comment from said user and adding said textual comment to said at least one post.
4. The method of claim 3, wherein said receiving comprises recording vocal comment given by said user with regard to said image and converting said vocal comment to create said textual comment.
5. The method of claim 1, wherein said identifying comprises analyzing said image to emphasize at least one body language indication therein.
6. The method of claim 1, wherein said plurality of social media posts are designated to update a dynamic field of said social media page in real time.
7. The method of claim 1, wherein said plurality of social media posts comprises at least one of an instant messaging (IM) message and a message with an image attachment.
8. The method of claim 1, wherein said audibly coded signal is a time- varying spectral representation of said portion.
9. The method of claim 1, further comprising receiving from a user instructions for audibly representing said social media content and performing said coding and said outputting in response to said user instructions.
10. The method of claim 1, wherein said coding comprises combining said audibly coded signal with at least one another audibly coded signal which is indicative of a textual content extracted from said at least one post.
11. The method of claim 1, wherein said coding comprises combining said audibly coded signal with at least one another audibly coded signal which is indicative of an identity of another user which uploads said at least one post.
12. The method of claim 1, wherein said monitoring is continuously performed to monitor said social media content and to sound said audibly coded signal in real time.
13. A computer readable medium comprising computer executable instructions adapted to perform the method of claim 1.
14. The method of claim 1, further comprising converting, using a speech-to-text function, a comment which is received from a user in relation to said image, and posting said comment with reference to said image.
15. The method of claim 1, further comprising analyzing said image to identify a body language indication; wherein said coding comprises adding an audible body language indication indicative of said body language indication to said audibly coded signal.
16. The method of claim 1, wherein said image is a profile picture of another user of said social network.
17. The method of claim 1, wherein said plurality of social media posts are displayed when said social media page is displayed; further comprising receiving from said user instructions to perform said method when said social media page is accessed.
18. The method of claim 1, wherein said outputting comprises outputting an audio file which comprises said audibly coded signal and a plurality of other audibly coded signals each coded to audibly represent one of said plurality of social media posts.
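Claim 8 characterizes the audibly coded signal as a time-varying spectral representation of the image portion, consistent with the Meijer-style vOICe mapping cited in the non-patent literature (columns scanned over time, row position mapped to pitch, brightness to loudness). As a minimal illustrative sketch — not the claimed implementation; the function name, parameter values, and frequency range below are assumptions — such a visual-to-auditory function might emit sine-component events like this:

```python
def image_to_audio_events(image, scan_seconds=1.0, f_lo=500.0, f_hi=5000.0):
    """Hypothetical visual-to-auditory function (vOICe-style sketch):
    scan the columns of a grayscale image (rows of 0..255 values) left
    to right over `scan_seconds`; each row is assigned a fixed sine
    frequency (top row = highest pitch, exponentially spaced); pixel
    brightness sets that sine's amplitude.  Returns a list of
    (onset_time_s, frequency_hz, amplitude) events - a time-varying
    spectral representation of the image portion."""
    n_rows, n_cols = len(image), len(image[0])
    dt = scan_seconds / n_cols
    events = []
    for col in range(n_cols):          # column index -> onset time
        for row in range(n_rows):      # row index -> frequency
            frac = (n_rows - 1 - row) / max(n_rows - 1, 1)
            freq = f_lo * (f_hi / f_lo) ** frac
            amp = image[row][col] / 255.0   # brightness -> loudness
            if amp > 0:
                events.append((col * dt, freq, amp))
    return events

# A bright anti-diagonal sounds as a rising pitch sweep over one second.
img = [[0, 0, 255],
       [0, 255, 0],
       [255, 0, 0]]
events = image_to_audio_events(img)
```

Synthesizing each event as a short sine tone and mixing the results would yield an audio file in the spirit of claim 18; for the sample image above, the three events rise from 500 Hz at t = 0 s to 5 kHz at t ≈ 0.67 s.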
19. A method of sounding audible body language indication during a communication session, comprising:
monitoring a plurality of images depicting a certain user of a plurality of users participating in a communication session;
analyzing said plurality of images to identify a body language expression of said certain user;
capturing an image depicting said body language expression;
coding an audibly coded signal from at least a portion of said image according to a visual-to-auditory function; and
outputting said audibly coded signal for presentation to at least one other of said plurality of users during said communication session.
20. The method of claim 19, further comprising establishing a multidirectional audio call between said certain user and said at least one other user and sounding said audibly coded signal to said at least one other user while said multidirectional audio call is taking place.
21. The method of claim 19, wherein said method is performed for each of said plurality of users during said communication session.
22. The method of claim 19, wherein said capturing, coding, and outputting are repeated in a plurality of iterations during said communication session.
23. The method of claim 22, wherein said plurality of iterations are set so that said capturing, coding, and outputting repeatedly occur every predefined period.
24. The method of claim 22, wherein said analyzing comprises detecting a change in a body language of said certain user; wherein each said iteration is induced by said change detection.
25. The method of claim 22, wherein said coding comprises adding an audible tag to said audibly coded signal, said audible tag being indicative of the identity of said certain user.
26. A system for audibly representing a plurality of social media posts, comprising: a processor;
a monitoring module which monitors a plurality of social media posts uploaded to a social media page of a user;
a visual-to-auditory module which codes, using said processor, an audibly coded signal from at least a portion of an image provided in at least one of said plurality of social media posts according to a visual-to-auditory function; and
a presentation module which induces a sounding of said audibly coded signal while presenting said plurality of social media posts to said user.
27. The system of claim 26, wherein said presentation module presents a user interface to allow a user to provide instructions indicative of activating and deactivating said visual-to-auditory module.
28. The system of claim 26, wherein said monitoring module and said visual-to-auditory module are installed on a proxy that communicates with a client terminal to instruct said sounding.
29. The system of claim 26, wherein said sounding is performed by a browser installed on a client terminal.
30. A computer program product for audibly representing a plurality of social media posts, comprising:
a computer readable storage medium;
first program instructions to monitor a plurality of social media posts uploaded to a social media page of a user of a social network;
second program instructions to identify an image in at least one of said plurality of social media posts;
third program instructions to code an audibly coded signal from at least a portion of said image according to a visual-to-auditory function; and
fourth program instructions to output said audibly coded signal for presentation to said user;
wherein said first, second, third, and fourth program instructions are stored on said computer readable storage medium.
PCT/IB2013/055872 2012-07-19 2013-07-17 Social network functionalities for visually impaired users WO2014013447A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261673276P 2012-07-19 2012-07-19
US61/673,276 2012-07-19

Publications (2)

Publication Number Publication Date
WO2014013447A2 true WO2014013447A2 (en) 2014-01-23
WO2014013447A3 WO2014013447A3 (en) 2014-04-03

Family

ID=49080926

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2013/055872 WO2014013447A2 (en) 2012-07-19 2013-07-17 Social network functionalities for visually impaired users

Country Status (1)

Country Link
WO (1) WO2014013447A2 (en)


Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
AMEDI A; STERN WM; CAMPRODON JA; BERMPOHL F; MERABET L; ROTMAN S ET AL.: "Shape conveyed by visual-to-auditory sensory substitution activates the lateral occipital complex", NAT NEUROSCI, vol. 10, no. 6, 2007, pages 687 - 689
AUVRAY M; HANNETON S; O'REGAN JK: "Learning to perceive with a visuoauditory substitution system: Localisation and object recognition with 'The vOICe", PERCEPTION, vol. 36, 2007, pages 416 - 430
CAPELLE C; TRULLEMANS C; ARNO P; VERAART C: "A real-time experimental prototype for enhancement of vision rehabilitation using auditory substitution", IEEE TRANS BIOMED ENG, vol. 45, no. 10, 1998, pages 1279 - 1293, XP011006610
CRONLY-DILLON J; PERSAUD K; GREGORY RP: "The perception of visual images encoded in musical form: a study in cross-modality information transfer", PROC BIOL SCI, vol. 266, no. 1436, 1999, pages 2427 - 2433
CRONLY-DILLON J; PERSAUD KC; BLORE R: "Blind subjects construct conscious mental images of visual scenes encoded in musical form", PROC BIOL SCI, vol. 267, no. 1458, 2000, pages 2231 - 2238
MEIJER PB: "An experimental system for auditory image representations", IEEE TRANS BIOMED ENG, vol. 39, no. 2, 1992, pages 112 - 121, XP000246185, DOI: doi:10.1109/10.121642
PROULX MJ; STOERIG P; LUDOWIG E; KNOLL I: "Seeing 'where' through the ears: effects of learning-by-doing and long-term sensory deprivation on localization based on image-to-sound substitution", PLOS ONE, vol. 3, 2008, pages E1840
SONOU LEE; KWANGYUN WOHN; YOOSOO AHN; SUKKI LEE: "CFBOX(tm): Superimposing 3D Human Face on Motion Picture", PROCEEDINGS OF THE SEVENTH INTERNATIONAL CONFERENCE ON VIRTUAL SYSTEMS AND MULTIMEDIA (VSMM'01, 25 October 2001 (2001-10-25), pages 644, XP010567131, DOI: doi:10.1109/VSMM.2001.969723

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170149725A1 (en) * 2014-04-07 2017-05-25 Nec Corporation Linking system, device, method, and recording medium
US10951573B2 (en) 2014-04-07 2021-03-16 Nec Corporation Social networking service group contribution update
US11146526B2 (en) 2014-04-07 2021-10-12 Nec Corporation Social networking service collaboration
US11271887B2 (en) * 2014-04-07 2022-03-08 Nec Corporation Updating and transmitting action-related data based on user-contributed content to social networking service
US11343219B2 (en) 2014-04-07 2022-05-24 Nec Corporation Collaboration device for social networking service collaboration
US11374895B2 (en) 2014-04-07 2022-06-28 Nec Corporation Updating and transmitting action-related data based on user-contributed content to social networking service

Also Published As

Publication number Publication date
WO2014013447A3 (en) 2014-04-03

Similar Documents

Publication Publication Date Title
US10594749B2 (en) Copy and paste for web conference content
CN110730952B (en) Method and system for processing audio communication on network
US10911718B2 (en) Enhancing meeting participation by an interactive virtual assistant
US20180077095A1 (en) Augmentation of Communications with Emotional Data
US9329677B2 (en) Social system and method used for bringing virtual social network into real life
EP2042969A1 (en) Method for determining user reaction with specific content of a displayed page.
JP6467554B2 (en) Message transmission method, message processing method, and terminal
US10834456B2 (en) Intelligent masking of non-verbal cues during a video communication
JP7292782B2 (en) Teleconferencing system, method for teleconferencing, and computer program
Bavelas et al. Effect of dialogue on demonstrations: Direct quotations, facial portrayals, hand gestures, and figurative references
US20050131744A1 (en) Apparatus, system and method of automatically identifying participants at a videoconference who exhibit a particular expression
McNaney et al. Speeching: Mobile crowdsourced speech assessment to support self-monitoring and management for people with Parkinson's
CN108139988A (en) Information processing system and information processing method
JP2011039860A (en) Conversation system, conversation method, and computer program using virtual space
US11616740B2 (en) Invoking chatbot in online communication session
JP2011164681A (en) Device, method and program for inputting character and computer-readable recording medium recording the same
JP2010086356A (en) Apparatus, method and program for measuring degree of involvement
Esposito et al. Cultural specific effects on the recognition of basic emotions: A study on Italian subjects
JP6367748B2 (en) Recognition device, video content presentation system
WO2014013447A2 (en) Social network functionalities for visually impaired users
US20050131697A1 (en) Speech improving apparatus, system and method
JP2020052505A (en) Health condition determination system, health condition determination device, server, health condition determination method and program
CN113966599B (en) Dynamic modification of functionality of real-time communication sessions
CN113591515A (en) Concentration processing method, device and storage medium
JP7289169B1 (en) Information processing device, method, program, and system

Legal Events

Date Code Title Description
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19/05/2015)

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13753690

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 13753690

Country of ref document: EP

Kind code of ref document: A2