WO2016003436A1 - Character recognition in real-time video streams - Google Patents

Character recognition in real-time video streams

Info

Publication number
WO2016003436A1
WO2016003436A1 PCT/US2014/044969 US2014044969W
Authority
WO
WIPO (PCT)
Prior art keywords
character data
shape
instructions
video
mirrored
Prior art date
Application number
PCT/US2014/044969
Other languages
English (en)
Inventor
Chi So
Kent E. Biggs
Jeffrey C. Stevens
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P. filed Critical Hewlett-Packard Development Company, L.P.
Priority to PCT/US2014/044969 priority Critical patent/WO2016003436A1/fr
Priority to TW104118779A priority patent/TW201603567A/zh
Publication of WO2016003436A1 publication Critical patent/WO2016003436A1/fr

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/141Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63Scene text, e.g. street names
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M3/00Automatic or semi-automatic exchanges
    • H04M3/42Systems providing special services or facilities to subscribers
    • H04M3/56Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/567Multimedia conference systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Definitions

  • Remote collaboration systems strive to deliver an experience in which local and remote meeting participants feel as though they are in the same room.
  • a see-through screen-based collaboration system creates the illusion that the users are separated only by a sheet of glass when in fact they are at different locations.
  • Such a system provides an effective collaboration tool where the users can see each other's body language, hand gestures, eye contact, and gaze.
  • FIG. 1 is an example environment in which various examples may be implemented as a video processing system.
  • FIG. 2 is a block diagram depicting an example machine-readable storage medium comprising instructions executable by a processor for video processing.
  • FIG. 3 is a flow diagram depicting an example method for detecting character data in a video stream based on known shapes.
  • FIG. 4 is a flow diagram depicting an example method for video processing as used in a see-through screen-based collaboration system.
  • FIG. 5 is an example picture depicting how two users communicate using a see-through screen-based collaboration system.
  • FIG. 6 is a diagram depicting an example system design of a see-through screen-based collaboration system.
  • FIG. 7 is a diagram depicting an example implementation of detecting character data in a video stream.
  • FIG. 8 is a diagram depicting an example implementation of controlling a mirror image effect using a depth camera.
  • Remote collaboration systems strive to deliver an experience in which local and remote meeting participants feel as though they are in the same room.
  • a see-through screen-based collaboration system can create the illusion that the users are separated only by a sheet of glass when in fact they are physically at different locations.
  • Such a system provides an effective collaboration tool where the users can see each other's body language, hand gestures, eye contact, gaze, and how they interact with shared content displayed on the see-through screen.
  • the see-through screen-based collaboration system may include a first video capturing device (e.g., video camera) that captures a view of a first user (as well as the space around the first user) through a first see-through screen.
  • the "see-through screen" as used herein may comprise a transparent display screen through which a user can view, upload, or otherwise interact with content (e.g., image, text, video, etc.) and that the user can write on.
  • An example see-through screen is shown in FIG. 5.
  • the first video capturing device may be installed behind the first screen, shooting through the first screen.
  • the first user may be present on the other side of the screen, facing the first camera through the first screen.
  • a first projector may be installed on the same side as the first camera, projecting shared content on the first screen.
  • a similar arrangement of the system may be set up at a remote location for a second user.
  • a second video capturing device may capture a view of the second user (as well as the space around the second user) through a second see-through screen.
  • the second video capturing device may be installed behind the second screen, shooting through the second screen.
  • the second user may be present on the other side of the screen, facing the second camera through the second screen.
  • the content being shared and interacted with by the first user as well as the captured view of the first user may be projected by a second projector to the second see-through screen.
  • the content being shared and interacted with by the second user as well as the captured view of the second user may be projected by the first projector to the first see-through screen.
  • This particular arrangement of the see-through screen-based system would allow for capturing a video image of the remote user from a viewpoint that corresponds to that of the local user.
  • An example system design of the see-through screen-based collaboration system as discussed above is illustrated in FIG. 6.
  • the "shared content" as used herein may comprise any content (e.g., image, text, video, etc.) that may be shared by a user via his/her see-through screen.
  • the shared content may then be projected to and/or displayed on another user's see-through screen.
  • the local and remote users may view the same content on their respective see-through screens as the content is being created, uploaded, interacted with, and/or manipulated by the users.
  • the first user may draw a flow diagram on the first see-through screen.
  • the user may also upload and/or share content on the first screen or may choose to draw on top of an uploaded image. All of these may be projected to and/or displayed on the second see-through screen via which the second user may view the shared content.
  • the second user may further manipulate the shared content and any changes and additions made to the shared content may be simultaneously displayed on the first see-through screen.
  • any shared content between the first and second users may be displayed in the same orientation so that drawings (or any other content) on the screen appear correct to both users.
  • any shared content may be displayed on the screen in such a way that both users will be able to see the word "logistics" in its correct orientation rather than a mirror-reversed version of the word.
  • the captured video images of the remote user may be horizontally flipped (or mirror-reversed) so that the shared content and the captured video images of the remote user may be properly combined and displayed on the local user's see-through screen.
  • a video image of the first user may be mirror-reversed and displayed on the second see-through screen.
  • a video image of the second user may be mirror-reversed and displayed on the first see-through screen.
  • text shown in the mirrored video image will appear to be written backwards. For example, when the first user is wearing a t-shirt with a word, that word will be shown backwards if viewed from the second user's perspective.
  • Examples disclosed herein address this undesirable mirror image effect by recognizing and/or detecting character data (e.g., letters, numbers, symbols, etc.) in a video stream.
  • the character data may be recognized and/or detected in various different ways.
  • the character data may be detected by recognizing a shape that is known to include character data.
  • T-shirts often have some text written on them.
  • the mirror image effect may be corrected by horizontally flipping a portion of the video stream representing the detected character data around an axis (e.g., the central axis of the detected portion). Resolution and/or accuracy of the detected character data may be enhanced and the enhanced character data may be displayed on the screen. Further, any perspective distortion caused by the flipping of the detected portion may be corrected to deliver an even more realistic experience.
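As a concrete illustration of the correction described above, the following Python sketch (not taken from the patent; it assumes OpenCV and NumPy are available, and the bounding box is a hypothetical detector output) mirrors a frame around its vertical central axis and then re-flips only the detected character region so the text reads correctly:

```python
# Hedged sketch: mirror a frame around its vertical central axis, then re-flip
# only the region that holds detected character data so the text is not reversed.
# Assumes OpenCV (cv2) and NumPy; the bounding box is a hypothetical detector output.
import cv2
import numpy as np

def mirror_with_corrected_text(frame, text_box):
    """Return a horizontally mirrored copy of `frame` in which the region
    `text_box` = (x, y, w, h), given in mirrored-frame coordinates, is flipped
    back so any character data inside it reads correctly."""
    mirrored = cv2.flip(frame, 1)                      # flip around the central vertical axis
    x, y, w, h = text_box
    mirrored[y:y + h, x:x + w] = cv2.flip(mirrored[y:y + h, x:x + w], 1)
    return mirrored

if __name__ == "__main__":
    # Synthetic frame with the word "HELLO" drawn on it (purely for demonstration).
    frame = np.full((240, 320, 3), 255, dtype=np.uint8)
    cv2.putText(frame, "HELLO", (60, 120), cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 0), 2)
    x, y, w, h = 55, 90, 170, 40                       # rough box around the word (hypothetical)
    mirrored_box = (frame.shape[1] - x - w, y, w, h)   # where that box lands after mirroring
    corrected = mirror_with_corrected_text(frame, mirrored_box)
```

In a real system the bounding box would come from a character detector such as the detecting engine described below; it is hard-coded here only to keep the example self-contained.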
  • FIG. 1 is an example environment 100 in which various examples may be implemented as a video processing system 110.
  • Environment 100 may include various components including server computing device 130 and client computing devices 140 (illustrated as 140A, 140B, ..., 140N). Each client computing device 140A, 140B, ..., 140N may communicate requests to and/or receive responses from server computing device 130.
  • Server computing device 130 may receive and/or respond to requests from client computing devices 140.
  • Client computing devices 140 may include any type of computing device providing a user interface through which a user can interact with a software application.
  • client computing devices 140 may include a laptop computing device, a desktop computing device, an all-in-one computing device, a tablet computing device, a mobile phone, an electronic book reader, a network-enabled appliance such as a "Smart" television, and/or other electronic device suitable for displaying a user interface and processing user interactions with the displayed interface.
  • although server computing device 130 is depicted as a single computing device, server computing device 130 may include any number of integrated or distributed computing devices serving one or more software applications for consumption by client computing devices 140.
  • Network 50 may comprise any infrastructure or combination of infrastructures that enable electronic communication between the components.
  • network 50 may include any one or more of the Internet, an intranet, a PAN (Personal Area Network), a LAN (Local Area Network), a WAN (Wide Area Network), a SAN (Storage Area Network), a MAN (Metropolitan Area Network), a wireless network, a cellular communications network, a Public Switched Telephone Network, and/or other network.
  • video processing system 110 and the various components described herein may be implemented in hardware and/or programming that configure hardware.
  • In FIG. 1 and other Figures described herein, different numbers of components or entities than depicted may be used.
  • Video processing system 110 may process a video stream captured by a video capturing device in such a way that allows the local and remote users to have a realistic collaboration experience, as if they were facing each other through a see-through glass while interacting with each other through shared content displayed on the see-through glass.
  • Video processing system 110 may create a mirrored video image of the video stream captured by the video capturing device, recognize and/or detect character data in the mirrored video image, and/or correct the undesirable mirror image effect by flipping a portion in the mirrored video image representing the detected character data. Any perspective distortion caused by the flipping of the portion representing the detected character data may be corrected as well. Further, video processing system 110 may enhance resolution and accuracy of the detected character data.
  • video processing system 110 may comprise a video mirroring engine 121, a detecting engine 122, a correcting engine 123, a character enhancing engine 124, an output generating engine 125, and/or other engines.
  • As used herein, "engine" refers to a combination of hardware and programming that performs a designated function.
  • The hardware of each engine, for example, may include one or both of a processor and a machine-readable storage medium, while the programming is a set of instructions or code stored on the machine-readable storage medium and executable by the processor to perform the designated function.
  • Video mirroring engine 121 may create and/or generate a mirrored version of a video stream captured by a video capturing device.
  • the captured video images of the remote user may be horizontally flipped (or mirror-reversed) so that the shared content displayed on the screen appears correct to both users.
  • the video content of the video stream may be flipped horizontally around the central axis of the captured video image.
  • the video image of the first user may be mirror-reversed and projected to the second see-through screen.
  • the video image of the second user may be mirror-reversed and projected to the first see-through screen.
  • Detecting engine 122 may recognize and/or detect character data (e.g., letters, numbers, symbols, etc.) in a video stream.
  • the character data may be recognized and/or detected in various different ways.
  • the character data may be detected by graphically matching the characters to known characters stored and maintained in a database (e.g., character database).
  • Optical character recognition (OCR) techniques may be used to detect the character data.
  • the character database may be updated over time to include additional characters, fonts, or glyphs for better recognition. Both English and foreign language characters along with punctuation marks or other symbols may be included in the character database.
  • detecting engine 122 may recognize a shape that is known to contain character data.
  • a database (e.g., a shape database) may store shapes that are known to contain character data along with their shape characteristics.
  • a machine learning algorithm may be used to identify the shapes that have been previously determined to actually contain character data for inclusion in the shape database.
  • detecting engine 122 may detect character data within the shape.
  • a view captured by a video capturing device may have a designated section of the view where any character data within the designated section may be recognized by detecting engine 122.
  • detecting engine 122 may then recognize and/or detect the character data shown within the designated section of the camera view.
  • An example diagram depicting the use of the designated section of the camera view to detect character data is illustrated in FIG. 7.
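One possible realization of the designated-section approach is sketched below, assuming OpenCV, pytesseract, and a local Tesseract installation are available; the section coordinates are illustrative, not values prescribed by the patent:

```python
# Hedged sketch: recognize character data only inside a designated section of the
# camera view. Assumes OpenCV, pytesseract, and a local Tesseract install; the
# section coordinates are illustrative placeholders.
import cv2
import pytesseract

DESIGNATED_SECTION = (100, 50, 400, 200)  # x, y, width, height (hypothetical)

def detect_characters_in_section(frame):
    x, y, w, h = DESIGNATED_SECTION
    roi = frame[y:y + h, x:x + w]                       # look only inside the designated section
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary).strip()  # OCR limited to that region
```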
  • Correcting engine 123 may correct the mirror image effect by horizontally flipping a portion of the video stream representing the detected character data around an axis (e.g., the central axis of the detected portion). In some implementations, correcting engine 123 may correct the mirror image effect by flipping an object in the shape recognized by detecting engine 122. For example, when a T-shirt shape is recognized in the video stream, the shape may be flipped horizontally around the central axis of the T-shirt shape, allowing any character data within the shape to be flipped around as well.
  • the video capturing device may comprise a depth camera that can determine the distance to a human or other object in a field of view of the camera.
  • correcting engine 123 may mirror reverse (or flip horizontally) any characters, physical objects, or even the entire space that lies between a first distance and a second distance from the depth camera.
  • the viewing space of the camera may be divided into several different depth ranges. For each depth range, correcting engine 123 may specify what needs to be mirror reversed and what needs to remain unchanged (e.g., not mirrored). In one example, correcting engine 123 may mirror reverse (or flip horizontally) only the character data that is present within 1-3 feet from the depth camera.
  • Correcting engine 123 may mirror reverse (or flip horizontally) the entire space within 3-6 feet from the depth camera whereas the orientation of the space beyond 6 feet from the depth camera may remain the same.
  • correcting engine 123 may correct any perspective distortion caused by the flipped portion of the video stream representing the detected character data.
  • When a particular portion of the video stream is flipped, it may cause the video to have unnatural perspective distortion. For example, if the word "HELLO" is written on a piece of paper where one end of the paper is closer to the screen than the other end, then the letter "H" might look larger than the letter "O" in the captured video image. When the word is flipped while the orientation of the paper stays the same, the final video image may look unnatural and might severely compromise the realistic collaboration experience. Thus, any perspective distortion caused by the flipped character portion may be corrected to deliver an even more realistic collaboration experience.
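A hedged sketch of one way such distortion could be avoided follows; it assumes OpenCV/NumPy and a detector that returns the four corners of the character region (corner ordering and the approach itself are illustrative assumptions, not the patent's specific algorithm). The region is rectified, flipped, and warped back so the flipped text keeps the surface's original perspective:

```python
# Hedged sketch: flip a detected character region without introducing unnatural
# perspective. The region's four corners (hypothetical detector output, ordered
# top-left, top-right, bottom-right, bottom-left) are rectified, flipped, and
# warped back. Assumes OpenCV and NumPy; an illustration only.
import cv2
import numpy as np

def flip_region_with_perspective(frame, quad):
    quad = quad.astype(np.float32)
    w = int(max(np.linalg.norm(quad[1] - quad[0]), np.linalg.norm(quad[2] - quad[3])))
    h = int(max(np.linalg.norm(quad[3] - quad[0]), np.linalg.norm(quad[2] - quad[1])))
    rect = np.array([[0, 0], [w, 0], [w, h], [0, h]], dtype=np.float32)
    to_rect = cv2.getPerspectiveTransform(quad, rect)
    patch = cv2.flip(cv2.warpPerspective(frame, to_rect, (w, h)), 1)   # rectify, then un-mirror
    back = cv2.getPerspectiveTransform(rect, quad)
    restored = cv2.warpPerspective(patch, back, (frame.shape[1], frame.shape[0]))
    mask = cv2.warpPerspective(np.full((h, w), 255, np.uint8), back,
                               (frame.shape[1], frame.shape[0]))
    out = frame.copy()
    out[mask > 0] = restored[mask > 0]                                 # paste the corrected region back
    return out
```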
  • Character enhancing engine 124 may enhance the recognized and/or detected character data in the video stream.
  • the video images of the recognized characters may be made sharper, clearer, and more accurate.
  • character enhancing engine 124 may find closest matching fonts or glyphs and replace the detected characters with those fonts or glyphs. Fonts, glyphs, and/or other related data may be stored in a fonts database and/or other databases.
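A minimal sketch of one such enhancement step follows, assuming Pillow is installed; the font file, region box, and recognized string are illustrative placeholders rather than values from the patent:

```python
# Hedged sketch of one possible enhancement step: repaint the recognized string
# with a clean font over the (blurry) detected region. Assumes Pillow; the font
# file, box, and text are illustrative placeholders.
from PIL import Image, ImageDraw, ImageFont

def enhance_characters(image, box, recognized_text, font_path="DejaVuSans-Bold.ttf"):
    """Overwrite `box` = (x, y, w, h) in a PIL image with `recognized_text`."""
    x, y, w, h = box
    out = image.copy()
    draw = ImageDraw.Draw(out)
    font = ImageFont.truetype(font_path, size=h)          # roughly match the original text height
    draw.rectangle([x, y, x + w, y + h], fill="white")    # clear the low-resolution characters
    draw.text((x, y), recognized_text, fill="black", font=font)
    return out
```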
  • Output generating engine 125 may generate an output video stream to be projected to a see-through screen.
  • the output video stream may comprise the mirrored video stream (by video mirroring engine 121) with the flipped portion (by correcting engine 123) and the shared content.
  • the shared content may be combined with the mirrored version of the captured video stream (with the flipped portion as discussed herein) to create the output video stream.
  • the output video stream may be projected to and/or displayed on a see-through screen in front of a remote user. For example, the output video stream capturing a view of the first user may be displayed on the second see-through screen. Similarly, the output video stream capturing a view of the second user may be displayed on the first see-through screen.
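The sketch below illustrates one simple way such a composition could be done, assuming OpenCV/NumPy; treating near-white pixels of the shared-content layer as transparent, and the 0.7 blend weight, are illustrative choices rather than the patent's method:

```python
# Hedged sketch: compose the output frame from the corrected (mirrored, text-flipped)
# camera frame and the shared-content layer. Assumes OpenCV/NumPy; the transparency
# heuristic and opacity value are illustrative assumptions.
import cv2
import numpy as np

def compose_output_frame(corrected_frame, shared_content, content_opacity=0.7):
    h, w = corrected_frame.shape[:2]
    content = cv2.resize(shared_content, (w, h))
    blended = cv2.addWeighted(content, content_opacity,
                              corrected_frame, 1.0 - content_opacity, 0)
    mask = np.any(content < 240, axis=2)   # non-background pixels of the shared content
    out = corrected_frame.copy()
    out[mask] = blended[mask]
    return out
```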
  • Data storage 129 may represent any memory accessible to video processing system 110 that can be used to store and retrieve data.
  • Data storage 129 may comprise floppy disks, hard disks, optical disks, tapes, solid state drives, random access memory (RAM), read-only memory (ROM), electrically-erasable programmable read-only memory (EEPROM), and/or other storage media.
  • Video processing system 110 may access data storage 129 locally or remotely via network 50 or other networks.
  • data storage 129 may comprise the character database, the shape database, the fonts database, and/or other databases as discussed herein.
  • Data storage 129 may include a database to organize and store data.
  • Database may be, include, or interface to, for example, an Oracle™ relational database sold commercially by Oracle Corporation.
  • Other databases such as Informix™, DB2 (Database 2) or other data storage, including file-based (e.g., comma or tab separated files), or query formats, platforms, or resources such as OLAP (On Line Analytical Processing), SQL (Structured Query Language), a SAN (storage area network), Microsoft Access™, MySQL, PostgreSQL, HSpace, Apache Cassandra, MongoDB, Apache CouchDB™, or others may also be used, incorporated, or accessed.
  • the database may reside in a single or multiple physical device(s) and in a single or multiple physical location(s).
  • the database may store a plurality of types of data and/or files and associated data or file description, administrative information, or any other data.
  • FIG. 2 is a block diagram depicting an example machine-readable storage medium 210 comprising instructions executable by a processor for video processing.
  • engines 121-125 were described as combinations of hardware and programming. Engines 121-125 may be implemented in a number of fashions.
  • the programming may include processor executable instructions 221-225 stored on a machine-readable storage medium 210 and the hardware may include a processor 211 for executing those instructions.
  • machine-readable storage medium 210 can be said to store program instructions or code that, when executed by processor 211, implements video processing system 110 of FIG. 1.
  • Machine-readable storage medium 210 may be any electronic, magnetic, optical, or other physical storage device that contains or stores executable instructions.
  • machine-readable storage medium 210 may be a non-transitory storage medium, where the term "non-transitory" does not encompass transitory propagating signals.
  • Machine-readable storage medium 210 may be implemented in a single device or distributed across devices.
  • processor 211 may represent any number of processors capable of executing instructions stored by machine-readable storage medium 210.
  • Processor 211 may be integrated in a single device or distributed across devices.
  • machine-readable storage medium 210 may be fully or partially integrated in the same device as processor 211, or it may be separate but accessible to that device and processor 211.
  • the program instructions may be part of an installation package that when installed can be executed by processor 211 to implement video processing system 110.
  • machine-readable storage medium 210 may be a portable medium such as a floppy disk, CD, DVD, or flash drive or a memory maintained by a server from which the installation package can be downloaded and installed.
  • the program instructions may be part of an application or applications already installed.
  • machine-readable storage medium 210 may include a hard disk, optical disk, tapes, solid state drives, RAM, ROM, EEPROM, or the like.
  • Processor 211 may be one or more central processing units (CPUs), microprocessors, and/or other hardware devices suitable for retrieval and execution of instructions stored in machine-readable storage medium 210.
  • Processor 211 may fetch, decode, and execute program instructions 221-225, and/or other instructions.
  • processor 211 may include one or more electronic circuits comprising a number of electronic components for performing the functionality of one or more of instructions 221-225, and/or other instructions.
  • the executable program instructions in machine-readable storage medium 210 are depicted as video mirroring instructions 221 , detecting instructions 222, correcting instructions 223, character enhancing instructions 224, and output generating instructions 225.
  • Instructions 221-225 represent program instructions that, when executed, cause processor 211 to implement engines 121-125, respectively.
  • FIG. 3 is a flow diagram depicting an example method 300 for detecting character data in a video stream based on known shapes.
  • the various processing blocks and/or data flows depicted in FIG. 3 are described in greater detail herein.
  • the described processing blocks may be accomplished using some or all of the system components described in detail above and, in some implementations, various processing blocks may be performed in different sequences and various processing blocks may be omitted. Additional processing blocks may be performed along with some or all of the processing blocks shown in the depicted flow diagrams. Some processing blocks may be performed simultaneously.
  • method 300 as illustrated (and described in greater detail below) is meant to be an example and, as such, should not be viewed as limiting.
  • Method 300 may be implemented in the form of executable instructions stored on a machine-readable storage medium, such as storage medium 210, and/or in the form of electronic circuitry.
  • Method 300 may start in block 310 and proceed to block 321 where at least one shape and characteristics related to the at least one shape are stored in a database.
  • the database (e.g., a shape database) may store at least one shape that is known to contain character data along with its shape characteristics. Shapes such as a square (e.g., a piece of paper, signs, posters, etc.), a rectangle (e.g., a piece of paper, signs, posters, etc.), an octagon (e.g., a stop sign), a T-shirt shape, or a flag shape often have letters, numbers, or symbols within them. Such shapes may be scrutinized more carefully to see whether they contain character data, since they are more likely to contain character data than other types of objects or shapes.
  • a video stream may be obtained.
  • the video stream may be a mirrored version of an original video stream captured by a video capturing device.
  • method 300 may include determining whether the video stream comprises video content showing a shape having the specified shape characteristics of the at least one shape. If there is a particular shape that matches at least one of the stored shapes based on comparing their characteristics, method 300 may proceed to block 324 where character data within the shape may be recognized and/or detected. On the other hand, if method 300 determines that there is no such shape found in the video stream, the method 300 may proceed to block 330. Method 300 may then stop in block 330.
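The following sketch illustrates the shape-matching step of method 300 under stated assumptions (OpenCV 4.x and NumPy; the contour template and match threshold are hypothetical placeholders for a real shape database, not values from the patent):

```python
# Hedged sketch of the shape-matching step of method 300: contours in the frame
# are compared against a small "shape database" of reference contours before any
# character detection is attempted. Assumes OpenCV 4.x/NumPy; the template and
# threshold are hypothetical placeholders.
import cv2
import numpy as np

# Hypothetical shape database: name -> reference contour (e.g., taken from template images).
SHAPE_DB = {
    "rectangle": np.array([[[0, 0]], [[200, 0]], [[200, 100]], [[0, 100]]], dtype=np.int32),
}
MATCH_THRESHOLD = 0.1  # lower cv2.matchShapes scores mean more similar shapes

def find_character_bearing_shapes(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    candidates = []
    for contour in contours:
        for name, template in SHAPE_DB.items():
            if cv2.matchShapes(contour, template, cv2.CONTOURS_MATCH_I1, 0.0) < MATCH_THRESHOLD:
                candidates.append((name, cv2.boundingRect(contour)))  # region to scan for text
    return candidates
```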
  • detecting engine 122 may be responsible for implementing method 300.
  • FIG. 4 is a flow diagram depicting an example method 400 for video processing as used in a see-through screen-based collaboration system.
  • Method 400 as illustrated is meant to be an example and, as such, should not be viewed as limiting.
  • Method 400 may be implemented in the form of executable instructions stored on a machine-readable storage medium, such as storage medium 210, and/or in the form of electronic circuitry.
  • Method 400 may start in block 410 and proceed to block 421 where any content being shared between the local and remote users may be projected to a first see-through screen for a first user.
  • the first user may modify, add, or otherwise interact with the shared content on the screen.
  • a video image of the first user (and the space around the first user) may be received (block 422) and used to generate a mirrored video image (block 423).
  • method 400 may include recognizing character data in the mirrored video image. Various different character recognition techniques, as discussed herein, may be used.
  • a portion representing the character data may be flipped around an axis (e.g., the central axis of the portion) in the mirrored video image.
  • an output video image comprising the mirrored video image with the flipped portion may be generated.
  • the output video image may be combined with the shared content (as modified, added, or otherwise interacted with by the first user) and projected to a second see-through screen for a second user (block 427).
  • Method 400 may then stop in block 430.
  • output generating engine 125 may be responsible for implementing blocks 421, 426, and 427.
  • Video mirroring engine 121 may be responsible for implementing blocks 422 and 423.
  • Detecting engine 122 may be responsible for implementing block 424.
  • Correcting engine 123 may be responsible for implementing block 425.
  • FIG. 5 is an example picture 500 depicting how two users communicate using a see-through screen-based collaboration system. Note the video image of the remote user is horizontally flipped so that drawings (e.g., shared content) made on the screen appear correct to both users. In this example, the words written on the remote user's T-shirt may be detected and flipped around according to various implementations as discussed herein.
  • FIG. 6 is a diagram depicting an example system design 600 of a see-through screen-based collaboration system.
  • the see-through screen-based collaboration system may include a first video capturing device 635A that captures a view of a first user 610A through a first see-through screen 620A.
  • the first video capturing device 635A may be installed behind the first screen 620A, shooting through the first screen 620A.
  • the first user 610A may be present on the other side of the screen 620A, facing the first camera 635A through the first screen 620A.
  • a first projector 630A may be installed on the same side as the first camera 635A, projecting shared content 650 on the first screen 620A.
  • a similar arrangement of the system may be set up at a remote location for a second user 610B.
  • a second video capturing device 635B may capture a view of the second user 610B through a second see-through screen 620B.
  • the second video capturing device 635B may be installed behind the second screen 620B, shooting through the second screen 620B.
  • the second user 610B may be present on the other side of the screen 620B, facing the second camera 635B through the second screen 620B.
  • the content being shared and interacted with by the first user 610A as well as the captured view of the first user 610A may be projected by a second projector 630B to the second see-through screen 620B.
  • the content being shared and interacted with by the second user 610B as well as the captured view of the second user 610B may be projected by the first projector 630A to the first see-through screen 620A.
  • This particular arrangement of the see-through screen-based system would allow for capturing a video image of the remote user from a viewpoint that corresponds to that of the local user.
  • the first video capturing device 635A may capture a video image 640 of the first user 610A.
  • system 110 may create a mirrored version 641 of the video image 640 so that the shared content displayed on the screen appears correct to both users.
  • One problem with this is that any characters shown in the mirrored video image 641 will appear to be written backwards.
  • system 110 may recognize and/or detect character data (e.g., letters, numbers, symbols, etc.) in the mirrored video image 641 and flip a portion of the mirrored video image 641 representing the detected character data (e.g., a corrected video image 642).
  • the word "HELLO" written on the first user 610A's T-shirt may be detected and flipped around according to various implementations as discussed herein. Further, resolution and accuracy of the word "HELLO" may be enhanced according to various implementations as discussed herein.
  • System 110 may then generate an output video stream 643 to be projected to the second see-through screen 620B.
  • the output video stream 643 may comprise the corrected video image 642 and the shared content 650.
  • the shared content 650 may be combined with the corrected video stream 642 to create the output video stream 643.
  • the output video stream 643 may then be projected to and/or displayed on the second see-through screen 620B. Not shown in FIG. 6 is the reverse path where the second user 610B's video image is captured by the second camera 635B, mirrored, corrected, and displayed with the shared content 650 on the first see-through screen 620A.
  • FIG. 7 is a diagram depicting an example implementation of detecting character data in a video stream.
  • a view captured by a video capturing device may have a designated section 760 (shown shaded in FIG. 7) of the view where any character data within the designated section 760 may be detected.
  • a user 710 can hold a sheet of paper 750 up so that any character data (e.g., the word "HELLO") written on the paper 750 is positioned within the designated section 760 of the view.
  • System 110 may recognize and/or detect the word "HELLO" shown within the designated section 760 of the camera view.
  • System 110 may create a mirrored version 741 of the captured video 740 and further correct the mirrored version 741 by horizontally flipping the word "HELLO" (e.g., a corrected video image 742).
  • FIG. 8 is a diagram depicting an example implementation 800 of controlling a mirror image effect using a depth camera.
  • a depth camera 830 may determine the distance to a human 810 or other object in a field of view of the camera 830. Using the depth camera 830, system 110 may mirror reverse (or flip horizontally) any characters, physical objects, or even the entire space that lies between a first distance and a second distance from the depth camera 830. In some instances, the viewing space of the camera may be divided into several different depth ranges. For each depth range, system 110 may specify what needs to be mirror reversed and what needs to remain unchanged (e.g., not mirrored). For example, system 110 may mirror reverse (or flip horizontally) only the character data that is present within a first depth range 850.
  • any other physical objects or the space itself within that depth range may remain unchanged. Further, the entire space within a second depth range 851 may be mirror reversed (or horizontally flipped) whereas the orientation of the space within a third depth range 852 may remain the same (e.g., not mirrored).
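A minimal NumPy sketch of depth-range-based selective mirroring follows, assuming a per-pixel depth map registered to the color frame; the depth range values (in meters) are illustrative assumptions, not taken from the patent:

```python
# Hedged sketch: only pixels whose depth falls inside a chosen range are taken
# from the mirrored frame; everything else keeps its original orientation.
# Assumes a per-pixel depth map (meters) aligned with the color frame.
import numpy as np

def selective_mirror(frame, depth_m, mirror_range=(0.9, 1.8)):
    mirrored = frame[:, ::-1]                  # whole frame flipped around its central axis
    mirrored_depth = depth_m[:, ::-1]
    near, far = mirror_range
    mask = (mirrored_depth >= near) & (mirrored_depth < far)
    out = frame.copy()
    out[mask] = mirrored[mask]                 # keep mirrored content only in the depth range
    return out
```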

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Examples of the invention relate to detecting character data. At least one shape and characteristics related to the at least one shape are stored in a database. When a video stream is obtained, it may be determined whether the video stream comprises video content showing a shape having the characteristics of the at least one shape. If the video stream comprises video content showing a shape having the characteristics of the at least one shape, character data within that shape may be detected.
PCT/US2014/044969 2014-06-30 2014-06-30 Character recognition in real-time video streams WO2016003436A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/US2014/044969 WO2016003436A1 (fr) 2014-06-30 2014-06-30 Character recognition in real-time video streams
TW104118779A TW201603567A (zh) 2014-06-30 2015-06-10 即時視訊串流中字元辨識技術

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2014/044969 WO2016003436A1 (fr) 2014-06-30 2014-06-30 Character recognition in real-time video streams

Publications (1)

Publication Number Publication Date
WO2016003436A1 true WO2016003436A1 (fr) 2016-01-07

Family

ID=55019784

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/044969 WO2016003436A1 (fr) 2014-06-30 2014-06-30 Character recognition in real-time video streams

Country Status (2)

Country Link
TW (1) TW201603567A (fr)
WO (1) WO2016003436A1 (fr)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040080616A1 (en) * 1994-03-15 2004-04-29 Canon Kabushiki Kaisha Video information display system and display apparatus applicable to the same
US20050162511A1 (en) * 2004-01-28 2005-07-28 Jackson Warren B. Method and system for display of facial features on nonplanar surfaces
US20080292215A1 (en) * 2007-05-23 2008-11-27 Xerox Corporation Selective text flipping and image mirroring system and method
US20120284646A1 (en) * 2011-05-06 2012-11-08 David H. Sitrick Systems And Methodologies Providing Collaboration And Display Among A Plurality Of Users
US20130278629A1 (en) * 2012-04-24 2013-10-24 Kar-Han Tan Visual feedback during remote collaboration


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170060817A1 (en) * 2015-08-27 2017-03-02 Microsoft Technology Licensing, Llc Smart flip operation for grouped objects
US10176148B2 (en) * 2015-08-27 2019-01-08 Microsoft Technology Licensing, Llc Smart flip operation for grouped objects
US10762375B2 (en) 2018-01-27 2020-09-01 Microsoft Technology Licensing, Llc Media management system for video data processing and adaptation data generation
US11501546B2 (en) 2018-01-27 2022-11-15 Microsoft Technology Licensing, Llc Media management system for video data processing and adaptation data generation
US11972623B2 (en) 2021-07-23 2024-04-30 International Business Machines Corporation Selective mirror enhanced video stream

Also Published As

Publication number Publication date
TW201603567A (zh) 2016-01-16

Similar Documents

Publication Publication Date Title
US11710279B2 (en) Contextual local image recognition dataset
US11551377B2 (en) Eye gaze tracking using neural networks
US10832086B2 (en) Target object presentation method and apparatus
US10891671B2 (en) Image recognition result culling
EP3063730B1 (fr) Recadrage et partage d'images automatisés
CA3083486C (fr) Procede, support et systeme de previsualisation en direct par l`intermediaire de modeles d`apprentissage automatique
US10284817B2 (en) Device for and method of corneal imaging
US11700417B2 (en) Method and apparatus for processing video
US20160306505A1 (en) Computer-implemented methods and systems for automatically creating and displaying instant presentations from selected visual content items
US9892648B2 (en) Directing field of vision based on personal interests
US11556605B2 (en) Search method, device and storage medium
US20150269133A1 (en) Electronic book reading incorporating added environmental feel factors
US11914836B2 (en) Hand presence over keyboard inclusiveness
US11226785B2 (en) Scale determination service
US11010980B2 (en) Augmented interface distraction reduction
WO2016003436A1 (fr) Character recognition in real-time video streams
US10432572B2 (en) Content posting method and apparatus
US8791947B1 (en) Level of detail blurring and 3D model data selection
CN111105440B (zh) 视频中目标物体的跟踪方法、装置、设备及存储介质
US11107285B2 (en) Augmented reality-based image editing
WO2023142400A1 (fr) Procédé et appareil de traitement de données, et dispositif informatique, support de stockage lisible par ordinateur et produit programme informatique
US20230251809A1 (en) Information Orientation and Display in Extended Reality Environments
WO2023060207A1 (fr) Détection et virtualisation d'objets manuscrits

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14896661

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14896661

Country of ref document: EP

Kind code of ref document: A1