WO2021190264A1 - Cooperative document editing with augmented reality - Google Patents
- Publication number: WO2021190264A1 (PCT/CN2021/078898)
- Authority: WIPO (PCT)
- Prior art keywords: file, document, scene, data, user device
Classifications
- G06F3/011 — Arrangements for interaction with the human body, e.g. for user immersion in virtual reality (electric digital data processing; input arrangements for interaction between user and computer)
- G06F40/166 — Editing, e.g. inserting or deleting (handling natural language data; text processing)
- G06Q10/101 — Collaborative creation, e.g. joint development of products or services (administration; office automation)
- G06F40/58 — Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation (processing or translation of natural language)
Description
- This disclosure relates to the technical field of augmented reality (AR), and particularly to a method for cooperative document editing in an AR environment, a system, and a non-transitory computer readable medium.
- AR superimposes virtual content over a user's view of the real world. With the development of AR software development kits (SDKs), the mobile industry has brought mobile-device AR platforms to the mainstream. An AR SDK typically provides six degrees-of-freedom (6DoF) tracking capability: a user can scan the environment using a camera included in an electronic device (e.g., a smartphone or an AR system), and the electronic device performs visual inertial odometry (VIO) in real time. Once the camera pose is tracked continuously, virtual objects can be placed into the AR scene to create the illusion that real objects and virtual objects are merged together.
- Despite the progress made in the field of AR, there is a need in the art for improved methods of collaborative document editing implementing AR features in an intuitive way.
SUMMARY
- The present disclosure relates generally to methods and systems related to augmented reality (AR) applications. More particularly, embodiments of the present disclosure provide methods and systems for collaborative document editing via augmented reality content using consumer technology. Embodiments of the present disclosure are applicable to a variety of applications in augmented reality and computer-based display systems.
- One embodiment of the disclosure is directed to a method, implemented by a computer system, for cooperative document editing in an augmented reality (AR) environment.
- The method comprises obtaining a document file, generating an AR file using the document file, and providing AR scene data and a user interface comprising one or more interactive elements.
- An AR scene generated from the AR scene data is configured to present the AR file disposed on a surface of a real-world object.
- The one or more interactive elements are configured to receive user interaction.
- The method further comprises receiving document modification data, converting the document modification data into digital textual information, providing an updated AR file using the digital textual information, updating the AR scene data using the updated AR file, generating an updated document file using the updated AR file, and storing the updated document file in a data store.
- Another embodiment of the disclosure is directed to a system comprising a processor and a memory including instructions that, when executed by the processor, cause the system to at least: obtain a document file, generate an AR file using the document file, and provide AR scene data (an AR scene generated from the AR scene data being configured to present the AR file disposed on a surface of a real-world object) and a user interface comprising one or more interactive elements (the one or more interactive elements being configured to receive user interaction).
- The instructions further cause the system to receive document modification data, convert the document modification data into digital textual information, provide an updated AR file using the digital textual information, update the AR scene data using the updated AR file, generate an updated document file using the updated AR file, and store the updated document file in a data store.
- Yet another embodiment of the disclosure is directed to a non-transitory computer readable medium storing specific computer-executable instructions that, when executed by a processor, cause a computer system to at least: obtain a document file, generate an AR file using the document file, and provide AR scene data (an AR scene generated from the AR scene data being configured to present the AR file disposed on a surface of a real-world object) and a user interface comprising one or more interactive elements (the one or more interactive elements being configured to receive user interaction).
- The instructions further cause the computer system to receive document modification data, convert the document modification data into digital textual information, provide an updated AR file using the digital textual information, update the AR scene data using the updated AR file, generate an updated document file using the updated AR file, and store the updated document file in a data store.
- Embodiments of the present disclosure involve methods and systems that automatically provide editable digital documents for incorporation into AR content on a mobile device.
- FIG. 1 illustrates an example of a technique for collaborative document editing using augmented reality (AR) applications according to an embodiment of the present disclosure.
- FIG. 2 is a diagram illustrating an example system architecture for a system for collaborative document editing in an AR environment according to an embodiment of the present disclosure.
- FIG. 3A is a diagram showing an example technique for collaborative document editing using AR applications according to an embodiment of the present disclosure.
- FIG. 3B is another diagram showing another example technique for collaborative document editing using AR applications according to an embodiment of the present disclosure.
- FIG. 3C is another diagram showing another example technique for collaborative document editing using AR applications according to an embodiment of the present disclosure.
- FIG. 4 depicts an illustrative example of a technique for collaborative editing of a document including AR features according to an embodiment of the present disclosure.
- FIG. 5 is a simplified flowchart illustrating a method of collaborative document editing using AR applications according to an embodiment of the present disclosure.
- FIG. 6 illustrates an example computer system according to an embodiment of the present disclosure.
- The present disclosure relates generally to methods and systems related to augmented reality (AR) applications. More particularly, embodiments of the present disclosure are directed to, among other things, a system and method for collaborative document editing incorporating AR features and a communication platform to facilitate real-time collaborative document editing via a single AR user interface (UI).
- FIG. 1 illustrates an example of a technique 100 for collaborative document editing using augmented reality (AR) applications according to an embodiment of the present disclosure.
- one or more user devices 102 receive a digital document file 104 for collaborative editing in an AR application.
- A user device 110 (e.g., an AR headset, a tablet, a smartphone, or a VR headset) may capture and/or generate an image of a first page of the physical document 120 such that the first page is depicted in a perspective relative to one or more vanishing points in the image.
- The user device 110 may implement one or more software modules for plane-rectification of an image of one or more pages of the physical document 120 extracted from the captured image.
- For example, the user device 110 may implement an edge-finder module to define one or more edges of a page of the physical document 120 appearing in the image. From these edges, the user device may isolate the image of the page, adjust and/or scale the extracted image to form a rectangular image conforming to one or more standard document page sizes, and compensate for distortion effects on the resolution of images and/or text appearing in the extracted image.
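- As a rough illustration of the plane-rectification step, the sketch below warps a skewed page back to an upright rectangle once its four corners are known. It assumes OpenCV and NumPy; the helper name, the corner-detection step (elided), and the target page size are illustrative choices, not details taken from the disclosure.

```python
import cv2
import numpy as np

def rectify_page(image, corners, page_size=(2100, 2970)):
    """Warp a skewed page, given its four corners in the camera image,
    onto an upright rectangle approximating a standard page size."""
    width, height = page_size
    # Corners are assumed ordered: top-left, top-right, bottom-right, bottom-left.
    src = np.array(corners, dtype=np.float32)
    dst = np.array([[0, 0], [width - 1, 0],
                    [width - 1, height - 1], [0, height - 1]],
                   dtype=np.float32)
    # Homography that removes the perspective distortion.
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, matrix, (width, height))
```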
- the one or more software modules may be executed by a mobile application server in communication with the user devices 102 via a network.
- the physical document 120 may include both print text and handwritten script.
- the print text may further include print text originating from multiple authors, for example, as indicated in multiple different visual formats (e.g., text color, text font, text style, etc. ) .
- text from each respective author may be associated with a different identifier in the digital document file 104, which may, in turn, be linked to one or more visual attributes and/or text properties.
- print text may be associated with one or more authors (e.g., through account data as described in reference to FIG. 2) and may be converted to editable text via one or more optical character recognition (OCR) modules implemented by the user device 110 or remotely by the mobile application server.
- handwritten script may be recognized, and may be converted to print text and/or rendered as a digital script element (e.g., a vector graphics element presented as a document object) .
- Text originating from different authors may be assigned to different layers of the digital document file 104, such that the user device may display and/or hide text from one or more authors, for example, in response to a user selection (e.g., via one or more UI elements in an AR scene, discussed in more detail in subsequent paragraphs) .
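- A minimal sketch of one way this per-author layering could be modeled, assuming a simple in-memory representation; the class and field names are hypothetical and not part of the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class TextRun:
    author_id: str
    content: str
    style: dict                 # e.g., {"color": "blue", "font": "serif"}

@dataclass
class DocumentLayer:
    author_id: str
    visible: bool = True
    runs: list = field(default_factory=list)

@dataclass
class LayeredDocument:
    layers: dict = field(default_factory=dict)   # author_id -> DocumentLayer

    def add_run(self, run: TextRun):
        layer = self.layers.setdefault(run.author_id,
                                       DocumentLayer(author_id=run.author_id))
        layer.runs.append(run)

    def set_visibility(self, author_id: str, visible: bool):
        # Called when the user toggles an author via a UI element in the AR scene.
        if author_id in self.layers:
            self.layers[author_id].visible = visible

    def visible_runs(self):
        return [run for layer in self.layers.values() if layer.visible
                for run in layer.runs]
```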
- A translation module may provide language support for the OCR module to generate and/or identify translations of print and/or script in the digital document file 104, for example, in accordance with one or more preferences of a user of a user device of the user devices 102.
- The translation module may be hosted by the mobile application server, and may generate multiple translations where the user devices 102 are associated with users having different language preferences. Additionally and/or alternatively, each user device of the user devices 102 may include an instance of the translation module for translating the digital document file 104 without reference to a mobile application server.
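- The sketch below illustrates how a server-hosted translation module might reuse one translation per (document version, language) pair across users sharing a language preference; `translate()` is a placeholder for whatever machine-translation backend is actually used, and the caching scheme is an assumption rather than something specified in the disclosure.

```python
def translate(text: str, target_lang: str) -> str:
    # Placeholder for a real machine-translation call.
    raise NotImplementedError

class TranslationModule:
    def __init__(self):
        self._cache = {}   # (doc_version, target_lang) -> translated text

    def for_user(self, document_text: str, doc_version: int, user_lang: str) -> str:
        key = (doc_version, user_lang)
        if key not in self._cache:
            # Translate once per (version, language) and reuse the result for
            # every user device that shares the same language preference.
            self._cache[key] = translate(document_text, user_lang)
        return self._cache[key]
```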
- the user devices and/or the mobile application server generate an AR scene 140 to present the digital document file 104 as an AR projection.
- the user devices 102 and/or the mobile application server may generate and/or present the AR scene 140 wherein one or more AR elements are overlaid on a camera field of view of the environment around each respective user device of the user devices 102.
- user device 110 may prepare the digital document file 104 from the physical document 120 as described above, from which an AR file 152 may be generated.
- the AR file may include an editable text document encoded for visualization as a projection in a three dimensional (3D) environment.
- the user device 110 may generate and/or present the AR scene 140 including a virtual plane 150 on which a holographic projection of the AR file 152 may be presented.
- the AR file 152 may be presented as a projection onto a surface of an object in the environment of the user device.
- the virtual plane 150 may be parallel to a planar surface of an object (e.g., a tabletop, a wall, a door, etc. ) , such that the AR file 152 may be presented as a 2D projection on the virtual plane 150 with perspective adjusted to align with one or more vanishing points determined in the AR scene 140.
- identifying the object, determining the one or more vanishing points, presenting the AR file 152, adjusting the AR file 152, and other processes involved in generating and presenting the AR scene 140 and the elements therein may be accomplished by an AR generation module, whereby the user device (e.g., user devices 102) may generate a camera pose, a coordinate mapping, and may determine one or more features of objects in the environment of the user device.
- a room within which the AR scene 140 may be provided may include multiple objects (e.g., a desk, a chair, a wall, etc. ) .
- An AR generation module in a user device 102 may detect and track one or more features of the multiple objects, for example, using edge detection to discern outlines of the multiple objects.
- The AR generation module may determine a coordinate map defining and/or delineating one or more planar surfaces in the AR scene 140, such that the edges are processed by the AR generation module to determine the one or more vanishing points. Using the vanishing-point data and the feature data, the AR generation module may determine adjustment factors for projecting images in the AR scene 140 so that the images appear as though they were real objects in the 3D environment of the AR scene 140.
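- As a simplified sketch of the projection adjustment, once the image-space corners of a planar region are known (derived from the tracked edges and vanishing points), a homography can warp the flat document image so that it appears to lie in that plane. OpenCV is assumed; corner finding and full vanishing-point estimation are elided, and the function below is illustrative rather than the module's actual implementation.

```python
import cv2
import numpy as np

def project_onto_plane(scene_frame, document_img, plane_corners):
    """Composite a rectified document image onto a planar region of the
    camera frame, given the region's four corners in image coordinates."""
    h, w = document_img.shape[:2]
    src = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]],
                   dtype=np.float32)
    dst = np.array(plane_corners, dtype=np.float32)
    matrix = cv2.getPerspectiveTransform(src, dst)
    frame_size = (scene_frame.shape[1], scene_frame.shape[0])
    warped = cv2.warpPerspective(document_img, matrix, frame_size)
    # Paste the warped document over the camera frame wherever it lands.
    mask = cv2.warpPerspective(np.full((h, w), 255, dtype=np.uint8),
                               matrix, frame_size)
    out = scene_frame.copy()
    out[mask > 0] = warped[mask > 0]
    return out
```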
- The AR file 152, as projected in the AR scene 140, includes text-editing and/or annotating functions, whereby a user of the user device presenting the AR scene 140 may introduce annotations 156 and/or modifications 154 into the text presented in the AR scene 140.
- the modifications 154 and annotations 156 may be received by the user device as data generated from image recognition of hand gestures by the user of the user device 102.
- The user device may recognize the motion of a fingertip of the user of the user device, and may introduce a modification 154 and/or annotation 156 to the AR file 152 as projected in the AR scene 140.
- The hand gesture may be recognized when made within a given distance of the boundary of the virtual plane 150, in a direction normal to the surface of the object on which the virtual plane 150 is projected.
- the user of the user device may be required to make modification gestures in contact with the surface of the object on which the virtual plane 150 is projected.
- hand gestures and/or tools may be employed for multiple different types of modifications 154, as described in more detail in reference to FIG. 4. For example, a fingertip gesture may be employed to add an annotation 156, while an open-handed waving motion may be employed to erase the annotation 156.
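- A sketch of how recognized gestures might be dispatched to different edit operations; the gesture labels, handler names, and the `annotations`/`overlaps` attributes are illustrative assumptions, not terms defined by the disclosure.

```python
from typing import Callable, Dict

def add_annotation(ar_file, stroke):
    # The stroke is assumed to be a vector object recognized from the gesture.
    ar_file.annotations.append(stroke)

def erase_annotation(ar_file, stroke):
    # Remove any annotation the waving motion swept over.
    ar_file.annotations = [a for a in ar_file.annotations
                           if not stroke.overlaps(a)]

GESTURE_HANDLERS: Dict[str, Callable] = {
    "fingertip_stroke": add_annotation,     # drawing within the virtual plane
    "open_hand_wave":   erase_annotation,   # wiping gesture over the plane
}

def handle_gesture(ar_file, gesture_label: str, stroke):
    handler = GESTURE_HANDLERS.get(gesture_label)
    if handler is not None:
        handler(ar_file, stroke)
```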
- the AR scene 140 may include an audiovisual communication element 160 presented as UI elements overlaid on the 3D environment of the user device.
- the user device may generate and/or present the audiovisual communication element 160 using an audiovisual communication application implemented by the AR generation module.
- the audiovisual communication module is separate from the AR generation module, and is implemented by the user device and/or the mobile application server.
- the user device may implement multi-party communication via a voice chat 162 platform and/or a video chat 164 platform. Communication sent and/or received via the voice chat 162 and the video chat 164 may be communicated via the mobile application server and/or directly from one user device to another (e.g., via Bluetooth connection) .
- the digital document file 104 may be editable by multiple user devices in real time, via implementation of multiple AR scenes.
- the AR file 152 may be updated to include the modifications 154 and/or annotations as digital markup. For example, if a user of a user device implementing the AR scene 140 makes a gesture that is recognized by the user device to add a modification 154 to the AR file 152 (e.g., moving a fingertip or a tool in the virtual plane 150) , the AR file 152 may be modified by the AR generation module to include a visualization of that modification 154.
- the visualization may include, but is not limited to, a scalable vector image object depicting the modification 154 as recognized from the gesture, a bitmap image depicting the modification 154 embedded in the AR file 152 as an image object, or the like.
- the AR generation module may generate and/or present an updated AR file 161 by converting the modification 154 into an editable text insertion 164, for example, by converting the script to print including language recognition, translation, and rendering, to produce the editable text insertion 164.
- annotations 156 may be converted to editable text or objects 166 inserted at the position of the annotation 156 in the page of the updated AR file 161. These changes may be reflected in the AR scene 140 by updating the presentation of the AR file 152 with the updated AR file 161.
- the modifications 154 and annotations 156 are incorporated into an updated digital document file, which is stored in a data store 170, for example on device storage 172 or via a content network 174, as described in more detail in reference to FIG. 2.
- the digital document file may be edited by a user of a user device via the AR scene 140.
- portions of the digital document file may be access-restricted based on an identifier of a user of a user device and/or on an identifier of a user device of the user devices 102.
- a user of a user device may be assigned a specific role in a team using a collaborative editing application to edit a document covering multiple assignments (e.g., a team report for which a user is assigned a section or subsection) .
- the AR file 152 may permit the user to view all sections of the AR file 152, but may not recognize modification or annotation actions made to the AR file 152 in sections for which the user has no editing rights.
- Similarly, the AR file 152 may be view-restricted, such that a user must be authorized to view sections of the AR file 152 in the AR scene 140.
- Locking and unlocking of sections of the AR file 152 may be performed in real time by a user of the user devices 102 (e.g., the AR scene 140 may include a user interface element for locking and/or unlocking sections of the AR file 152).
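- The section-level access control described above might be modeled roughly as follows; the per-section editor/viewer sets and lock flag are one possible scheme, assumed here for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class Section:
    section_id: str
    editors: set = field(default_factory=set)   # user ids allowed to edit
    viewers: set = field(default_factory=set)   # user ids allowed to view
    locked: bool = False

def can_edit(section: Section, user_id: str) -> bool:
    return (not section.locked) and user_id in section.editors

def can_view(section: Section, user_id: str) -> bool:
    return user_id in section.viewers or user_id in section.editors

def apply_modification(section: Section, user_id: str, modification) -> bool:
    """Ignore modification gestures made in sections the user cannot edit."""
    if not can_edit(section, user_id):
        return False
    # ... apply the modification to the underlying document section ...
    return True
```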
- FIG. 2 is a diagram illustrating an example system architecture for a system for collaborative document editing in an AR environment according to an embodiment of the present disclosure.
- a user device 202 may be in communication with a number of other components, including at least a mobile application server 204.
- the mobile application server 204 may perform at least a portion of the processing functions required by a mobile application installed upon the user device.
- the user device 202 may be an example of the user devices 102 described with respect to FIG. 1.
- a user device 202 may be any suitable electronic device that is capable of providing at least a portion of the capabilities described herein.
- the user device 202 may be any electronic device capable of presenting an augmented reality (AR) scene including one or more AR features overlaid on a field of view of a real-world environment of the user device 202.
- A user device may be capable of establishing a communication session with another electronic device (e.g., mobile application server 204) and transmitting/receiving data from that electronic device.
- a user device may include the ability to download and/or execute mobile applications.
- User devices may include mobile communication devices as well as personal computers and thin-client devices.
- a user device may comprise any portable electronic device that has a primary function related to communication.
- a user device may be a smart phone, a personal data assistant (PDA) , or any other suitable handheld device.
- the user device can be implemented as a self-contained unit with various components (e.g., input sensors, one or more processors, memory, etc. ) integrated into the user device.
- Reference in this disclosure to an “output” of a component or an “output” of a sensor does not necessarily imply that the output is transmitted outside of the user device. Outputs of various components might remain inside a self-contained unit that defines a user device.
- the user device 202 may include at least one memory 206 and one or more processing units (or processor (s) ) 208.
- the processor (s) 208 may be implemented as appropriate in hardware, computer-executable instructions, firmware or combinations thereof. Computer-executable instruction or firmware implementations of the processor (s) 208 may include computer-executable or machine executable instructions written in any suitable programming language to perform the various functions described.
- the user device 202 may also include one or more input sensors 210 for receiving user and/or environmental input. There may be a variety of input sensors 210 capable of detecting user or environmental input, such as an accelerometer, a camera device, a depth sensor, a microphone, a global positioning system (e.g., GPS) receiver, etc.
- the one or more input sensors 210 may include at least a range camera device (e.g., a depth sensor) capable of generating a range image, as well as a camera device configured to capture image information.
- a range camera may be any device configured to identify a distance, or range, of an object or objects from the range camera.
- the range camera may generate a range image (or range map) , in which pixel values correspond to the detected distance for that pixel.
- the pixel values can be obtained directly in physical units (e.g., meters) .
- the 3D imaging system may employ a range camera that operates using structured light.
- A projector projects light onto an object or objects in a structured pattern. The light may be in a range that is outside of the visible spectrum (e.g., infrared or ultraviolet).
- the range camera may be equipped with one or more camera devices configured to obtain an image of the object with the reflected pattern. Distance information may then be generated based on distortions in the detected pattern. It should be noted that any suitable type of range camera, including those that operate using stereo triangulation, sheet of light triangulation, time-of-flight, interferometry, coded aperture, or any other suitable technique for range detection, would be useable by the described system.
- the memory 206 may store program instructions that are loadable and executable on the processor (s) 208, as well as data generated during the execution of these programs.
- the memory 206 may be volatile (such as random access memory (RAM) ) and/or non-volatile (such as read-only memory (ROM) , flash memory, etc. ) .
- the user device 202 may also include additional storage 212, such as either removable storage or non-removable storage including, but not limited to, magnetic storage, optical disks, and/or tape storage.
- the disk drives and their associated computer-readable media may provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for the computing devices.
- the memory 206 may include multiple different types of memory, such as static random access memory (SRAM) , dynamic random access memory (DRAM) or ROM.
- the memory 206 may include an operating system 214 and one or more application programs or services for implementing the features disclosed herein including at least a mobile application 216.
- the memory 206 may also include application data 218, which provides information to be generated by and/or consumed by the mobile application 216.
- the application data 218 may be stored in a database.
- a mobile application may be any set of computer executable instructions installed upon, and executed from, a user device 202.
- Mobile applications may be installed on a user device by a manufacturer of the user device or by another entity.
- the mobile application may cause a user device to establish a communication session with a mobile application server 204 that provides backend support for the mobile application.
- a mobile application server 204 may maintain account information associated with a particular user device and/or user.
- a user may be required to log into a mobile application in order to access functionality provided by the mobile application 216.
- the mobile application 216 may be configured to identify objects within an environment surrounding the user device 202.
- the mobile application 216 may receive output from the input sensors 210 and identify objects or potential objects within that output.
- the mobile application 216 may receive depth information (e.g., a range image) from a depth sensor (e.g., a range camera) , such as the depth sensors previously described with respect to input sensors 210. Based on this information, the mobile application 216 may determine the bounds of an object to be identified. For example, a sudden variance in depth within the depth information may indicate a border or outline of an object.
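- As a rough sketch of the depth-variance idea, a large jump between neighboring range-image pixels can be flagged as a candidate object border. NumPy is assumed, and the threshold value is arbitrary.

```python
import numpy as np

def border_mask(range_image: np.ndarray, jump_threshold: float = 0.05) -> np.ndarray:
    """Mark pixels where depth (in meters) changes abruptly between
    neighbors, a simple proxy for object outlines in a range image."""
    dy = np.abs(np.diff(range_image, axis=0, prepend=range_image[:1, :]))
    dx = np.abs(np.diff(range_image, axis=1, prepend=range_image[:, :1]))
    return (dx > jump_threshold) | (dy > jump_threshold)
```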
- the mobile application 216 may utilize one or more machine vision techniques to identify the bounds of an object.
- the mobile application 216 may receive image information from a camera input sensor 210 and may identify potential objects within the image information based on variances in color or texture data detected within the image.
- the mobile application 216 may cause the user device 202 to transmit the output obtained from the input sensors 210 to the mobile application server 204, which may then perform one or more object recognition techniques upon that output. Additionally, the mobile application 216 may cause the user device 202 to transmit a current location of the user device 202 as well as an orientation (e.g., facing) of the user device 202 to the mobile application server 204.
- the user device 202 may also contain communications interface (s) 220 that enable the user device 202 to communicate with any other suitable electronic devices.
- the communication interface 220 may enable the user device 202 to communicate with other electronic devices on a network (e.g., on a private network) .
- the user device 202 may include a Bluetooth wireless communication module, which allows it to communicate with another electronic device (e.g., a different user device implementing the mobile application, etc. ) .
- the user device 202 may also include input/output (I/O) device (s) and/or ports 222, such as for enabling connection with a keyboard, a mouse, a pen, a voice input device, a touch input device, a display, speakers, a printer, etc.
- the user device 202 may communicate with the mobile application server 204 via a communication network.
- the communication network may include any one or a combination of many different types of networks, such as cable networks, the Internet, wireless networks, cellular networks, and other private and/or public networks.
- the communication network may comprise multiple different networks.
- the user device 202 may utilize a wireless local area network (WLAN) to communicate with a wireless router, which may then route the communication over a public network (e.g., the Internet) to the mobile application server 204.
- the mobile application server 204 may be any computing device or plurality of computing devices configured to perform one or more calculations on behalf of the mobile application 216 on the user device 202.
- the mobile application 216 may be in periodic communication with the mobile application server 204.
- the mobile application 216 may receive updates, push notifications, or other instructions from the mobile application server 204.
- the mobile application 216 and mobile application server 204 may utilize a proprietary encryption and/or decryption scheme to secure communications between the two.
- the mobile application server 204 may be executed by one or more virtual machines implemented in a hosted computing environment.
- the hosted computing environment may include one or more rapidly provisioned and released computing resources, which computing resources may include computing, networking, and/or storage devices.
- a hosted computing environment may also be referred to as a cloud-computing environment.
- the mobile application server 204 may include at least one memory 224 and one or more processing units (or processor (s) ) 226.
- the processor (s) 226 may be implemented as appropriate in hardware, computer-executable instructions, firmware or combinations thereof.
- Computer-executable instruction or firmware implementations of the processor (s) 226 may include computer-executable or machine executable instructions written in any suitable programming language to perform the various functions described.
- the memory 224 may store program instructions that are loadable and executable on the processor (s) 226, as well as data generated during the execution of these programs.
- the memory 224 may be volatile (such as random access memory (RAM) ) and/or non-volatile (such as read-only memory (ROM) , flash memory, etc. ) .
- the mobile application server 204 may also include additional storage 228, such as either removable storage or non-removable storage including, but not limited to, magnetic storage, optical disks, and/or tape storage.
- the disk drives and their associated computer-readable media may provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for the computing devices.
- the memory 224 may include multiple different types of memory, such as static random access memory (SRAM) , dynamic random access memory (DRAM) or ROM.
- the memory 224 may include an operating system 230 and one or more application programs or services for implementing the features disclosed herein including at least a module for mapping surfaces of the objects identified by the mobile application 216 for generating AR scene data (AR generation module 232) and/or a module for managing document data (document management module 234) .
- The memory 224 may also include account data 236, which provides information associated with user accounts maintained by the described system (e.g., access rights, permissions, etc.); AR data 238, which provides information related to object/user locations as well as AR interface information; and a documents database 240, which provides networked data storage of document files (e.g., cloud storage).
- one or more of the account data 236, the AR data 238, or the document database 240 may be stored in a database.
- the memory 224 and the additional storage 228, both removable and non-removable, are examples of computer-readable storage media.
- computer-readable storage media may include volatile or non-volatile, removable or non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data.
- modules may refer to programming modules executed by computing systems (e.g., processors) that are installed on and/or executed from the mobile application server 204.
- the mobile application server 204 may also contain communications connection (s) 242 that allow the mobile application server 204 to communicate with a stored database, another computing device or server, user terminals, and/or other components of the described system.
- the mobile application server 204 may also include input/output (I/O) device (s) and/or ports 244, such as for enabling connection with a keyboard, a mouse, a pen, a voice input device, a touch input device, a display, speakers, a printer, etc.
- the memory 224 may include the augmented reality (AR) module 232, the document management module 234, the database containing account data 236, the database containing AR data 238, and/or the database containing document data 240.
- the memory 224 and/or the AR generation module 232 may further include a communication application module (not shown) to implement a multi-party communication application, as described in more detail in reference to FIG. 1.
- the AR generation module 232 may be configured to, in conjunction with the processors 226, receive input sensor data from the user device 202 and identify one or more objects within the input sensor data. As described above, the system may use any suitable object recognition technique to identify the objects within the input sensor data. In some embodiments, the AR generation module 232 may use one or more techniques to map at least a portion of the received input sensor data to one or more surfaces of objects in the environment of the user device 202. Although the use of depth sensor output is described in identifying a distance between the user device 202 and the object, it should be noted that other techniques may be used to determine such a distance.
- the AR generation module 232 may implement visual Simultaneous Localization and Mapping (vSLAM) to detect and track features of the objects, and calculate from those features a coordinate mapping and an output pose of the input sensor.
- AR generation module 232 may map an AR file (e.g., AR file 152 of FIG. 1) to a virtual plane (e.g., virtual plane 150 of FIG. 1) on a surface of one or more objects, using the received sensor input data.
- the AR generation module 232 may generate AR data 238 describing the vSLAM data, which may be stored and accessed by other user devices for use in a shared AR scene (e.g., shared AR scene 340 of FIG. 3) .
- the orientation, position, and perspective mapping of a virtual plane in the shared AR scene may be generated based on the field of view and vSLAM data generated by a primary user device.
- a second user device may receive AR data 238 for projecting the AR file in the virtual plane.
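- One possible shape for the AR data shared between a primary and a secondary device is sketched below; the field names and the 4x4 pose convention are assumptions for illustration, not the actual format of AR data 238.

```python
import json
import numpy as np

def pack_plane_anchor(plane_pose: np.ndarray, plane_extent, ar_file_id: str) -> str:
    """Serialize the virtual plane's pose (a 4x4 transform in the shared
    coordinate map) and the id of the AR file projected onto it."""
    return json.dumps({
        "ar_file_id": ar_file_id,
        "plane_pose": plane_pose.tolist(),      # primary device's estimate
        "plane_extent": list(plane_extent),     # width, height in meters
    })

def unpack_plane_anchor(payload: str):
    data = json.loads(payload)
    return (np.array(data["plane_pose"]),
            tuple(data["plane_extent"]),
            data["ar_file_id"])
```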
- the AR generation module 232 is further configured to provide user interface data for generating a user interface to receive user input and to present in the field of view of the user device 202 one or more interface elements including, but not limited to an audiovisual communication application, as described in more detail in reference to FIG. 1.
- the user interface is further configured to recognize gestures and target object motion as a form of input when the gestures and target object motion occur in the virtual plane.
- the document management module 234 may be configured to, in conjunction with the processors 226, manage information for the document files and AR files implemented in AR scenes by the AR generation module 232. In some embodiments, document information may be obtained with respect to a particular account or user.
- the document management module 234 may provide an AR file associated with a digital document file for presentation by the AR generation module 232. To do this, the document management module 234 may use an imaging sensor of the user device 202 to generate a digital document file from a physical document, as described in more detail in reference to FIG. 1.
- the document management module 234 may use plane-rectification techniques to correct the image of the physical document from a skewed perspective view to a standard rectangular document size.
- the document management module 234 may further recognize text on the physical document using optical character recognition, translation, and script-to-text conversion techniques (for example, implemented as a form of natural language processing) .
- the document management module 234 may then store the digital document file, and generate an AR file from the digital document file for presentation in an AR scene.
- the document management module 234 may also generate, receive, and store document data 240 including but not limited to modifications and/or annotations made during collaborative editing.
- document data 240 may include editable text insertions made via one or more user input devices.
- Document data 240 may also include recognized modifications received by the user device 202 via the AR interface as gestures or target object motion.
- the document management module 234 may update the digital document file in real time, generating an updated digital document file incorporating the latest modifications and edits, which may be provided to user devices for presentation in updated AR scenes.
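- A simplified sketch of this real-time update cycle is shown below; every helper is a placeholder for the corresponding module described above, the event/callback structure is an assumption, and `convert_to_text` stands in for the OCR/script-to-text conversion step.

```python
def convert_to_text(modification):
    # Placeholder for the OCR / script-to-text conversion described above.
    raise NotImplementedError

class DocumentManager:
    def __init__(self, data_store, subscribers):
        self.data_store = data_store          # persists updated document files
        self.subscribers = subscribers        # user devices presenting AR scenes
        self.version = 0

    def on_modification(self, doc_id: str, modification) -> None:
        document = self.data_store.load(doc_id)
        # Convert the recognized gesture/stroke into editable text or markup.
        edit = convert_to_text(modification)
        document.apply(edit)
        self.version += 1
        self.data_store.save(doc_id, document, self.version)
        # Push the updated AR file so every participating device refreshes
        # its AR scene with the latest state.
        for device in self.subscribers:
            device.send_updated_ar_file(doc_id, document, self.version)
```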
- Document data 240 may include metadata such as chat transcripts and/or comments transcribed during collaborative editing.
- the document management module 234 may receive account data 236 for implementing access restrictions and for limiting editing permissions for one or more user devices participating in collaborative editing.
- FIG. 3A is a diagram showing an example technique for collaborative document editing using AR applications according to an embodiment of the present disclosure.
- an AR scene 140 may include a projection of an AR file onto a surface of an object in the environment of the user device implementing the AR scene 140.
- the AR scene 140 may include audiovisual communication with other users of other user devices also implementing AR scenes. Modifications made to the AR file, for example, using gesture recognition or an editing tool (e.g., a stylus or other target object) , may be shared across the user devices and implemented in updates made to AR scenes in real time.
- the digital document file may be updated using the modifications made in the AR scene 140.
- FIG. 3B is another diagram showing another example technique for collaborative document editing using AR applications according to an embodiment of the present disclosure.
- a second AR scene 310 may be generated and/or presented by a second user device (e.g., user devices 102 of FIG. 1) in a different location from that of the AR scene presented in FIG. 3A (e.g., AR scene 140 of FIG. 3A) .
- the user of the user device generating and/or presenting the second AR scene 310 may be in another city or may be in another office in the same building as the user of the user device generating the AR scene presented in FIG. 3A.
- a second virtual plane 320 may be determined and/or defined on a surface of an object in the environment of the user device (e.g., on a wall or a tabletop) .
- The virtual plane 320 may define and/or delineate the boundaries of a projection of the AR file 322, adjusted to appear as a real-world object overlaid in the environment of the second AR scene 310.
- the second AR scene 310 may include a collaborative communication interface 330 including voice and/or video communication between users participating in collaborative document editing.
- FIG. 3C is another diagram showing another example technique for collaborative document editing using AR applications according to an embodiment of the present disclosure.
- collaborative document editing may include multiple users of user devices viewing AR scenes in a single location.
- a shared AR scene 340 may include a primary user device 346 from whose perspective a shared virtual plane 350 is defined on a surface of an object in the environment of the shared AR scene 340.
- a shared AR file 352 may be projected as an overlay to appear as a real object on the surface.
- the shared AR file 352 may appear in an orientation that is aligned with the primary user device 346, for example, by being positioned according to a location selected by the user of the primary user device 346.
- the primary user device 346 generates and/or presents the shared virtual plane 350 and the shared AR file 352 automatically (e.g., without user input) .
- Information describing the position of the shared virtual plane 350 and the orientation and/or presentation of the shared AR file 352 may be shared between the primary user device 346 and additional user devices participating in the shared AR scene 340.
- a secondary user device 344 in the same location as the primary user device 346 may receive the information and may reproduce the position and orientation of the shared virtual plane 350 and the shared AR file 352 according to the perspective of the primary user device 346.
- This includes the shared AR file 352 being viewable via the secondary user device 344 as a document oriented toward the primary user device 346, as if it were a physical document placed on a surface before the user of the primary user device 346.
- the shared AR file 352 may be projected on a vertical plane (e.g., virtual plane 320 of FIG. 3B) such that multiple user devices may view the shared AR file 352 in the same orientation and from multiple different perspectives.
- each user device participating in the shared AR scene 340 may adjust the presentation of the shared AR file 352 to match a different coordinate map based on the same set of features detected and tracked in a shared environment.
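- Assuming the two devices' coordinate maps are related by a known rigid transform (for example, recovered from the shared tracked features), re-expressing the primary device's document pose in a secondary device's map reduces to a single matrix product, sketched below with an assumed 4x4 homogeneous-transform convention.

```python
import numpy as np

def to_secondary_frame(pose_in_primary: np.ndarray,
                       primary_to_secondary: np.ndarray) -> np.ndarray:
    """Re-express a 4x4 document pose, defined in the primary device's
    coordinate map, in the secondary device's coordinate map."""
    return primary_to_secondary @ pose_in_primary

# Usage (with identity as a trivial stand-in for the recovered transform):
# pose_b = to_secondary_frame(pose_a, np.eye(4))
```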
- One or more users 342 may provide and/or receive updates and document modifications while located in separate and/or remote environments, each generating an individual AR scene (for example, an individual vertical AR scene 354 in one environment and/or an individual horizontal AR scene 356 in a different environment), whereby each AR scene presents the updated AR file according to its own orientation, position, and perspective.
- FIG. 4 depicts an illustrative example of a technique 400 for collaborative editing of a document including AR features according to an embodiment of the present disclosure.
- An AR file 410 may be edited by multiple users participating in collaborative editing in AR scenes (e.g., AR scene 140 of FIG. 1, shared AR scene 340 of FIG. 3C).
- Collaborative editing may include modifications 412 and annotations 414 made using hand gesture and/or target object recognition on the part of a user device implementing an AR generation module.
- the AR file 410 may be updated to include edits in real time, for example, by converting modifications 412 into editable text 422 presented in an updated AR file 420.
- the updated AR file 420 may be modified with further edits, for example, a hand gesture indicating that a line is to be removed 424 from the digital document file (e.g., digital document file 104 of FIG. 1) that is represented in the updated AR file 420.
- the removal edit 424 may be shown, for example, by markup 426 in the AR file.
- AR features 432 may be added to the AR file and/or the updated AR file 420 to provide a dynamic AR file 430.
- AR features 432 may include animated AR elements and/or AR elements presented such that they appear in the AR scene as physical objects leaving the virtual plane within which the dynamic AR file 430 is presented (e.g., virtual plane 150 of FIG. 1).
- AR features 432 may include a user icon or badge as a comment flag that exits the plane of the dynamic AR file 430 and that casts a shadow on the dynamic AR file 430.
- the AR features 432 may be converted to graphical elements 442 in a planar AR file 440.
- the planar AR file 440 may include a flattened perspective form of the AR features 432, for example, including dynamic elements (e.g., animations) but without appearing in the AR scene as a real object leaving the virtual plane.
- FIG. 5 is a simplified flowchart illustrating a method 500 of collaborative document editing using AR applications according to an embodiment of the present disclosure.
- the flow is described in connection with a computer system that is an example of the computer systems described herein.
- Some or all of the operations of the flows can be implemented via specific hardware on the computer system and/or can be implemented as computer-readable instructions stored on a non-transitory computer-readable medium of the computer system.
- the computer-readable instructions represent programmable modules that include code executable by a processor of the computer system. The execution of such instructions configures the computer system to perform the respective operations.
- Each programmable module in combination with the processor represents a means for performing a respective operation (s) . While the operations are illustrated in a particular order, it should be understood that no particular order is necessary and that one or more operations may be omitted, skipped, and/or reordered.
- the method 500 includes obtaining a document file (502) .
- the document file may include a digital document file (e.g., digital document file 104 of FIG. 1) including but not limited to a text document or portable document format encoded for presentation and/or editing on a computer system.
- obtaining a document includes obtaining an image of a physical document (e.g., physical document 120 of FIG. 1) , for example, using a camera in communication with the computer system (e.g., user devices 102 of FIG. 1 and/or mobile application server 204 of FIG. 2) .
- Obtaining a document may also include generating a document file in an editable document format using the image of the physical document. As described in more detail in reference to FIG. 1, this may include implementing image processing applications to adjust the shape of text and writing depicted in the image of the physical document, and may also include optical character recognition to generate an editable document.
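- A minimal sketch of the optical character recognition step, assuming the pytesseract wrapper around the Tesseract engine; any OCR engine could fill this role, and handwritten script would typically need a dedicated recognizer.

```python
import cv2
import pytesseract

def image_to_editable_text(page_image_path: str, lang: str = "eng") -> str:
    """Run OCR over a rectified page image and return editable text."""
    image = cv2.imread(page_image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Light binarization tends to help OCR on camera-captured pages.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary, lang=lang)
```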
- obtaining a document also includes generating a new (empty) document file.
- the method further includes converting printed text into editable text data and the handwritten script into annotation data.
- annotation data may include image objects embedded in the digital document file.
- The method further includes converting the digital document file to an AR format file (504).
- The AR format file (e.g., AR file 152 of FIG. 1) may include the digital document file encoded in a format for presentation as an overlay in an environment of the computer system; for example, it may be encoded for visualization as a virtual object occupying space in the field of view of the user of the computer system and occluding the three-dimensional environment of the user.
- The method further includes presenting the AR format file in an AR scene with a user interface (506).
- Presenting the AR scene (e.g., AR scene 140 of FIG. 1) may include identifying a virtual plane (e.g., virtual plane 150 of FIG. 1) on a surface of a real object in the environment of the user upon which to project the AR file.
- the AR scene may be presented by a user device using AR scene data generated by a mobile application server.
- the mobile application server provides the AR scene data to a user device for presenting the AR scene.
- the user device may generate the AR scene data.
- the AR scene data includes computer-readable instructions for adjusting the AR file for presentation in the AR scene using feature tracking data, a coordinate map, and pose data associated with a real-world environment of a first user device.
- a primary user device may determine the position and orientation of the virtual plane and the visualization of the AR file in a shared AR scene wherein multiple user devices view the AR file from multiple different perspectives, according to a single orientation.
- a primary user device may determine a position and orientation of an object using position and location data of the user device relative to other user devices in the environment.
- feature tracking data may be shared between the user devices to define a common coordinate map, based on which the visualization of the AR file may be adjusted to correct for different fields of view or perspectives.
- different virtual planes, visualizations of the AR file, and AR features may be introduced by different AR datasets for user devices in different physical locations. For example, when two user devices are located in different physical locations, a common coordinate system based on features in the environment may not be possible, and as such different virtual planes may be defined. In such cases, feature tracking, coordinate mapping, and pose data for each user device will be used in determining the position and orientation of the AR file in an individual AR scene.
- The user interface includes a two-way communication application (e.g., group chat) including audio and visual communication. Additionally and/or alternatively, as described in more detail in reference to FIG. 1, the user interface may include gestural input and/or target object recognition.
- the method further includes receiving document modification data from one or more sources (508) .
- The document modification data may include, but is not limited to, modifications, annotations, text edits, or the like received by the computer system via the user interface elements. For example, a hand gesture in the virtual plane using a fingertip or a target object (e.g., a stylus, pen, or pencil) to annotate the AR file may be received as a modification of the AR file.
- The document modification data may be received from a data store (e.g., data store 170 of FIG. 1) and/or from different user devices via direct communication methods (e.g., Bluetooth) or via a network.
- the method further includes converting the document modification data into digital text (510) .
- the method 500 may include collaborative editing of the digital document file by multiple users of multiple user devices, for example, via multiple instances of AR scenes in one or more physical locations.
- the method 500 includes real time updating of the digital document file to incorporate modifications, annotations, edits, etc., of the AR file presented in the AR scene by document modification data received via a user interface of a user device of the user devices.
- Real time updating may include providing an updated AR file for each user device presenting the AR file in an AR scene.
- Real time updating may include updating the digital document file with edits as they are made (512) .
- the method further includes presenting the updated file in the AR scene (514) .
- collaborative editing may include editing of modifications, annotations, and edits by subsequent edits.
- the method further includes storing the updated document file (516) .
- the computer system is in communication with a data store and/or a distributed storage network (e.g., cloud storage system) , wherein the updated digital document file may be stored. In this way, the updated digital document file may be available for provision to additional computer systems for use in collaborative editing or for other purposes (e.g., publication) .
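- Pulling the numbered steps together, a high-level sketch of method 500 might look like the following; every object and helper named here is a placeholder for the corresponding module described above, not an API defined by the disclosure.

```python
def run_collaborative_editing_session(document_source, ar_module, doc_module,
                                      user_interface, data_store):
    # (502) Obtain a document file (from storage, a camera capture, or a new file).
    document_file = doc_module.obtain_document(document_source)

    # (504) Convert it into an AR format file for presentation as an overlay.
    ar_file = ar_module.generate_ar_file(document_file)

    # (506) Present the AR file in an AR scene together with the user interface.
    user_interface.present(ar_module.generate_scene_data(ar_file))

    # (508)-(514) Apply modifications as they arrive until the session ends.
    for modification in user_interface.modification_events():
        text = doc_module.convert_to_text(modification)                      # (510)
        ar_file = ar_module.update_ar_file(ar_file, text)
        document_file = doc_module.update_document(document_file, ar_file)   # (512)
        user_interface.present(ar_module.generate_scene_data(ar_file))       # (514)

    # (516) Store the updated document file for later sessions or publication.
    data_store.save(document_file)
```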
- FIG. 5 provides a particular method of collaborative document editing via augmented reality environments according to an embodiment of the present disclosure.
- other sequences of steps may also be performed according to alternative embodiments.
- alternative embodiments of the present disclosure may perform the steps outlined above in a different order.
- the individual steps illustrated in FIG. 5 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step.
- additional steps may be added or removed depending on the particular applications.
- One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
- FIG. 6 illustrates examples of components of a computer system 600 according to certain embodiments.
- the computer system 600 is an example of the computer system described herein above. Although these components are illustrated as belonging to a same computer system 600, the computer system 600 can also be distributed.
- the computer system 600 includes at least a processor 602, a memory 604, a storage device 606, input/output peripherals (I/O) 608, communication peripherals 610, and an interface bus 612.
- the interface bus 612 is configured to communicate, transmit, and transfer data, controls, and commands among the various components of the computer system 600.
- the memory 604 and the storage device 606 include computer-readable storage media, such as RAM, ROM, electrically erasable programmable read-only memory (EEPROM) , hard drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage, for example memory, and other tangible storage media. Any of such computer readable storage media can be configured to store instructions or program codes embodying aspects of the disclosure.
- the memory 604 and the storage device 606 also include computer readable signal media.
- a computer readable signal medium includes a propagated data signal with computer readable program code embodied therein. Such a propagated signal takes any of a variety of forms including, but not limited to, electromagnetic, optical, or any combination thereof.
- a computer readable signal medium includes any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use in connection with the computer system 600.
- the memory 604 includes an operating system, programs, and applications.
- the processor 602 is configured to execute the stored instructions and includes, for example, a logical processing unit, a microprocessor, a digital signal processor, and other processors.
- the memory 604 and/or the processor 602 can be virtualized and can be hosted within another computer system of, for example, a cloud network or a data center.
- the I/O peripherals 608 include user interfaces, such as a keyboard, screen (e.g., a touch screen) , microphone, speaker, other input/output devices, and computing components, such as graphical processing units, serial ports, parallel ports, universal serial buses, and other input/output peripherals.
- the I/O peripherals 608 are connected to the processor 602 through any of the ports coupled to the interface bus 612.
- the communication peripherals 610 are configured to facilitate communication between the computer system 600 and other computing devices over a communications network and include, for example, a network interface controller, modem, wireless and wired interface cards, antenna, and other communication peripherals.
- a computing device can include any suitable arrangement of components that provide a result conditioned on one or more inputs.
- Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computer system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
- Embodiments of the methods disclosed herein may be performed in the operation of such computing devices.
- The order of the blocks presented in the examples above can be varied; for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
- The use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited.
- Similarly, the use of “based at least in part on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based at least in part on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Entrepreneurship & Innovation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- General Health & Medical Sciences (AREA)
- Economics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Described herein are methods and systems for collaborative document editing via augmented reality content using consumer technology. Embodiments of the present disclosure are directed to collaborative document editing incorporating AR features and a communication platform to facilitate real-time collaborative document editing via a single AR user interface. An AR file may be presented on a surface within an AR scene. Embodiments further receive document modification data, convert the document modification data into digital information, provide an updated AR file using the digital information, update the AR scene data using the updated AR file, generate an updated document file using the updated AR file, and store the updated document file in a data store.
Description
This disclosure relates to the technical field of augmented reality (AR) , and particularly to a method for cooperative document editing in an AR environment, a system, and a non-transitory computer readable medium.
AR superimposes virtual content over a user's view of the real world. With the development of AR software development kits (SDK) , the mobile industry has brought mobile device AR platforms to the mainstream. An AR SDK typically provides six degrees-of-freedom (6DoF) tracking capability. A user can scan the environment using a camera included in an electronic device (e.g., a smartphone or an AR system) and the electronic device performs visual inertial odometry (VIO) in real time. Once the camera pose is tracked continuously, virtual objects can be placed into the AR scene to create an illusion that real objects and virtual objects are merged together.
Despite the progress made in the field of AR, there is a need in the art for improved methods of collaborative document editing implementing AR features in an intuitive way.
SUMMARY
The present disclosure relates generally to methods and systems related to augmented reality (AR) applications. More particularly, embodiments of the present disclosure provide methods and systems for collaborative document editing via augmented reality content using consumer technology. Embodiments of the present disclosure are applicable to a variety of applications in augmented reality and computer-based display systems.
One embodiment of the disclosure is directed to a method, implemented by a computer system, for cooperative document editing in an augmented reality (AR) environment. The method comprises obtaining a document file, generating an AR file using the document file, and providing AR scene data and a user interface comprising one or more interactive elements. An AR scene generated from the AR scene data is configured to present the AR file in the AR scene disposed on a surface of a real-world object. The one or more interactive elements are configured to receive user interaction. The method further comprises receiving document modification data, converting the document modification data into digital textual information, providing an updated AR file using the digital textual information, updating the AR scene data using the updated AR file, generating an updated document file using the updated AR file, and storing the updated document file in a data store.
Another embodiment of the disclosure is directed to a system comprising a processor; and a memory including instructions that, when executed with the processor, cause the system to, at least: obtain a document file, generate an AR file using the document file, and provide AR scene data (an AR scene generated from the AR scene data being configured to present the AR file in the AR scene disposed on a surface of a real-world object) and a user interface comprising one or more interactive elements (the one or more interactive elements configured to receive user interaction). The instructions further cause the system to receive document modification data, convert the document modification data into digital textual information, provide an updated AR file using the digital textual information, update the AR scene data using the updated AR file, generate an updated document file using the updated AR file, and store the updated document file in a data store.
Yet another embodiment of the disclosure is directed to a non-transitory computer readable medium storing specific computer-executable instructions that, when executed by a processor, cause a computer system to at least: obtain a document file, generate an AR file using the document file, and provide AR scene data (an AR scene generated from the AR scene data being configured to present the AR file in the AR scene disposed on a surface of a real-world object) and a user interface comprising one or more interactive elements (the one or more interactive elements configured to receive user interaction). The instructions further cause the computer system to receive document modification data, convert the document modification data into digital textual information, provide an updated AR file using the digital textual information, update the AR scene data using the updated AR file, generate an updated document file using the updated AR file, and store the updated document file in a data store.
Numerous benefits are achieved by way of the present disclosure over conventional techniques. For example, embodiments of the present disclosure involve methods and systems that provide editable digital documents for incorporation into AR content automatically on a mobile device. These and other embodiments of the disclosure along with many of its advantages and features are described in more detail in conjunction with the text below and attached figures.
FIG. 1 illustrates an example of a technique for collaborative document editing using augmented reality (AR) applications according to an embodiment of the present disclosure.
FIG. 2 is a diagram illustrating an example system architecture for a system for collaborative document editing in an AR environment according to an embodiment of the present disclosure.
FIG. 3A is a diagram showing an example technique for collaborative document editing using AR applications according to an embodiment of the present disclosure.
FIG. 3B is another diagram showing another example technique for collaborative document editing using AR applications according to an embodiment of the present disclosure.
FIG. 3C is another diagram showing another example technique for collaborative document editing using AR applications according to an embodiment of the present disclosure.
FIG. 4 depicts an illustrative example of a technique for collaborative editing of a document including AR features according to an embodiment of the present disclosure.
FIG. 5 is a simplified flowchart illustrating a method of collaborative document editing using AR applications according to an embodiment of the present disclosure.
FIG. 6 illustrates an example computer system according to an embodiment of the present disclosure.
In the following description, various embodiments will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the embodiments. However, it will also be apparent to one skilled in the art that the embodiments may be practiced without the specific details. Furthermore, well-known features may be omitted or simplified in order not to obscure the embodiment being described.
The present disclosure relates generally to methods and systems related to virtual reality applications. More particularly, embodiments of the present disclosure are directed to, among other things, a system and method for collaborative document editing incorporating AR features and a communication platform to facilitate real-time collaborative document editing via a single AR user interface (UI) .
FIG. 1 illustrates an example of a technique 100 for collaborative document editing using augmented reality (AR) applications according to an embodiment of the present disclosure. In some embodiments, one or more user devices 102, implementing a mobile application as described in more detail in reference to FIG. 2, receive a digital document file 104 for collaborative editing in an AR application. In some cases, a user device 110 (e.g., an AR headset, a tablet, a smartphone, a VR headset, etc. ) , which may be different from the user devices 102, may generate the digital document file 104 from a physical document 120 using one or more images of the physical document 120 captured and/or generated using a camera in communication with the user device 110. For example, the user device 110 may capture and/or generate an image of a first page of the physical document 120 such that the first page is depicted in a perspective relative to one or more vanishing points in the image. In some cases, the user device 110 may implement one or more software modules for implementing plane-rectification of an extracted image of one or more pages of the physical document 120 from the image. For example, the user device 110 may implement an edge-finder module to define one or more edges of a page of the physical document 120 appearing in the image, from which the user device may then isolate the image of the page, adjust and/or scale the extracted image to form a rectangular image conforming to one or more standard document page sizes, and compensate for distortion effects on resolution of images and/or text appearing in the extracted image. In some embodiments, as described in more detail in reference to FIG. 2, additionally and/or alternatively the one or more software modules may be executed by a mobile application server in communication with the user devices 102 via a network.
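By way of illustration only, the plane-rectification step described above can be sketched in a few lines of Python. The example below is a minimal sketch, assuming OpenCV and NumPy are available and that the four page corners have already been located by an edge-finder step; the image filename, corner coordinates, and output size are hypothetical and not part of the disclosed embodiments.

```python
import cv2
import numpy as np

def rectify_page(image, corners, out_w=1240, out_h=1754):
    """Warp a perspective view of a page into an upright rectangle.

    image   -- BGR photograph containing the page
    corners -- four (x, y) page corners ordered top-left, top-right,
               bottom-right, bottom-left (e.g., from an edge finder)
    out_w/out_h -- target size in pixels; 1240x1754 roughly matches A4 at 150 dpi
    """
    src = np.array(corners, dtype=np.float32)
    dst = np.array([[0, 0], [out_w - 1, 0],
                    [out_w - 1, out_h - 1], [0, out_h - 1]], dtype=np.float32)
    # Homography mapping the skewed page onto the rectangular target.
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, H, (out_w, out_h))

if __name__ == "__main__":
    photo = cv2.imread("page_photo.jpg")          # hypothetical input image
    page_corners = [(412, 230), (1610, 310), (1550, 1890), (380, 1790)]
    flat_page = rectify_page(photo, page_corners)
    cv2.imwrite("page_rectified.jpg", flat_page)
```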
In some embodiments, the physical document 120 may include both print text and handwritten script. The print text may further include text originating from multiple authors, for example, as indicated in multiple different visual formats (e.g., text color, text font, text style, etc.). In some embodiments, text from each respective author may be associated with a different identifier in the digital document file 104, which may, in turn, be linked to one or more visual attributes and/or text properties. For example, print text may be associated with one or more authors (e.g., through account data as described in reference to FIG. 2) and may be converted to editable text via one or more optical character recognition (OCR) modules implemented by the user device 110 or remotely by the mobile application server. Additionally, handwritten script may be recognized, and may be converted to print text and/or rendered as a digital script element (e.g., a vector graphics element presented as a document object). Text originating from different authors may be assigned to different layers of the digital document file 104, such that the user device may display and/or hide text from one or more authors, for example, in response to a user selection (e.g., via one or more UI elements in an AR scene, discussed in more detail in subsequent paragraphs). Similarly, a translation module may provide language support for the OCR module to generate and/or identify translations of print and/or script in the digital document file 104, for example, in accordance with one or more preferences of a user of a user device of the user devices 102. For example, the translation module may be hosted by the mobile application server, and may generate multiple translations for the user devices 102 where the user devices 102 are associated with users with multiple different language preferences. Additionally and/or alternatively, each user device of the user devices 102 may include an instance of the translation module for translating the digital document file 104 without reference to a mobile application server.
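One possible way to keep text from different authors in separate, toggleable layers of the digital document file is sketched below. The class names, author identifiers, and style fields are illustrative assumptions rather than the disclosed data format; the sketch only shows how a per-author visibility toggle could drive what is rendered in the AR scene.

```python
from dataclasses import dataclass, field

@dataclass
class TextLayer:
    author_id: str                 # account identifier of the contributor
    style: dict                    # e.g., {"color": "#1a73e8", "font": "serif"}
    visible: bool = True
    blocks: list = field(default_factory=list)   # OCR'd or typed text blocks

@dataclass
class DigitalDocument:
    layers: dict = field(default_factory=dict)   # author_id -> TextLayer

    def add_text(self, author_id, text, style=None):
        layer = self.layers.setdefault(
            author_id, TextLayer(author_id=author_id, style=style or {}))
        layer.blocks.append(text)

    def set_layer_visibility(self, author_id, visible):
        # Called in response to a UI selection in the AR scene.
        if author_id in self.layers:
            self.layers[author_id].visible = visible

    def visible_text(self):
        # Only layers whose authors are currently shown are rendered.
        return [b for layer in self.layers.values() if layer.visible
                for b in layer.blocks]

doc = DigitalDocument()
doc.add_text("alice", "Printed paragraph recognized by OCR.")
doc.add_text("bob", "Handwritten note converted to text.")
doc.set_layer_visibility("bob", False)      # hide Bob's edits
print(doc.visible_text())
```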
In some embodiments, the user devices and/or the mobile application server generate an AR scene 140 to present the digital document file 104 as an AR projection. In particular, the user devices 102 and/or the mobile application server may generate and/or present the AR scene 140 wherein one or more AR elements are overlaid on a camera field of view of the environment around each respective user device of the user devices 102. For example, user device 110 may prepare the digital document file 104 from the physical document 120 as described above, from which an AR file 152 may be generated. In some embodiments, the AR file may include an editable text document encoded for visualization as a projection in a three dimensional (3D) environment. Using the AR file 152, the user device 110 may generate and/or present the AR scene 140 including a virtual plane 150 on which a holographic projection of the AR file 152 may be presented. In some embodiments, the AR file 152 may be presented as a projection onto a surface of an object in the environment of the user device. For example, the virtual plane 150 may be parallel to a planar surface of an object (e.g., a tabletop, a wall, a door, etc. ) , such that the AR file 152 may be presented as a 2D projection on the virtual plane 150 with perspective adjusted to align with one or more vanishing points determined in the AR scene 140. As described in more detail in reference to FIG. 2, identifying the object, determining the one or more vanishing points, presenting the AR file 152, adjusting the AR file 152, and other processes involved in generating and presenting the AR scene 140 and the elements therein may be accomplished by an AR generation module, whereby the user device (e.g., user devices 102) may generate a camera pose, a coordinate mapping, and may determine one or more features of objects in the environment of the user device. For example, a room within which the AR scene 140 may be provided may include multiple objects (e.g., a desk, a chair, a wall, etc. ) . An AR generation module in a user device 102 may detect and track one or more features of the multiple objects, for example, using edge detection to discern outlines of the multiple objects. From the feature data, the AR generation module may determine a coordinate map defining and/or delineating one or more planar surfaces in the AR scene 140, such that the edges are processed by the AR generation module to determine the one or more vanishing points. Using the vanishing point data and the feature data the AR generation module may determine adjustment factors for projecting images in the AR scene 140 to make the images appear as though they were real objects in the 3D environment of the AR scene 140.
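A minimal sketch of the perspective adjustment described above follows: given a camera pose and intrinsics such as an AR generation module might produce, the corners of the document-sized virtual plane can be projected into the camera image so the document appears attached to the detected surface. The intrinsics, pose, and plane dimensions below are made-up values used only for illustration.

```python
import numpy as np

def project_plane_corners(K, R_wc, t_wc, plane_origin, u_axis, v_axis, w, h):
    """Project the four corners of a document-sized virtual plane into image space.

    K              -- 3x3 camera intrinsics
    R_wc, t_wc     -- rotation/translation mapping world points into the camera frame
    plane_origin   -- world position of the plane's top-left corner
    u_axis, v_axis -- unit vectors spanning the plane (e.g., along a tabletop)
    w, h           -- physical width/height of the projected document in meters
    """
    corners_world = np.array([
        plane_origin,
        plane_origin + w * u_axis,
        plane_origin + w * u_axis + h * v_axis,
        plane_origin + h * v_axis,
    ])
    pixels = []
    for p in corners_world:
        p_cam = R_wc @ p + t_wc          # world -> camera coordinates
        uvw = K @ p_cam                  # pinhole projection
        pixels.append(uvw[:2] / uvw[2])  # perspective divide
    return np.array(pixels)

# Illustrative numbers only: a 640x480 camera looking at a tabletop plane 1 m away.
K = np.array([[500.0, 0, 320.0], [0, 500.0, 240.0], [0, 0, 1.0]])
R_wc = np.eye(3)
t_wc = np.array([0.0, 0.0, 1.0])
corners_px = project_plane_corners(
    K, R_wc, t_wc,
    plane_origin=np.array([-0.15, -0.10, 0.0]),
    u_axis=np.array([1.0, 0.0, 0.0]),
    v_axis=np.array([0.0, 1.0, 0.0]),
    w=0.30, h=0.21)                      # roughly A4 landscape
print(corners_px)
```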
In some embodiments, the AR file 152, as projected in the AR scene 140, includes text editing and/or annotating functions, whereby a user of the user device presenting the AR scene 140 may introduce annotations 156 and/or modifications 154 into the text presented in the AR scene 140. In some embodiments, the modifications 154 and annotations 156 may be received by the user device as data generated from image recognition of hand gestures by the user of the user device 102. For example, the user device may recognize the motion of a fingertip of the user of the user device, and may introduce a modification 154 and/or annotation 156 to the AR file 152 as projected in the AR scene 140. In some embodiments, the hand gesture may be recognized when made within a given distance of the boundary of the virtual plane 150 in a direction normal to the surface of the object on which the virtual plane 150 is projected. For example, to improve the quality and accuracy of gesture recognition, the user of the user device may be required to make modification gestures in contact with the surface of the object on which the virtual plane 150 is projected. In some embodiments, hand gestures and/or tools may be employed for multiple different types of modifications 154, as described in more detail in reference to FIG. 4. For example, a fingertip gesture may be employed to add an annotation 156, while an open-handed waving motion may be employed to erase the annotation 156.
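The distance gate on gesture recognition can be expressed as a simple point-to-plane test: a fingertip sample is only treated as a document edit if it lies within a small distance of the virtual plane along the plane normal and inside the plane's bounds. The following is a sketch under assumed data types; the threshold and coordinates are illustrative and the plane axes are assumed to be orthogonal unit vectors.

```python
import numpy as np

def is_editing_gesture(fingertip, plane_origin, u_axis, v_axis, w, h,
                       max_normal_dist=0.02):
    """Return True if a 3D fingertip position should count as an edit.

    The fingertip must be within max_normal_dist (meters) of the plane along
    its normal, and its in-plane projection must fall inside the w x h bounds.
    """
    normal = np.cross(u_axis, v_axis)
    normal = normal / np.linalg.norm(normal)
    offset = fingertip - plane_origin
    # Distance from the plane surface along the normal.
    dist = abs(np.dot(offset, normal))
    # Coordinates of the fingertip within the plane.
    u = np.dot(offset, u_axis)
    v = np.dot(offset, v_axis)
    return dist <= max_normal_dist and 0.0 <= u <= w and 0.0 <= v <= h

origin = np.array([0.0, 0.0, 0.0])
u_axis = np.array([1.0, 0.0, 0.0])
v_axis = np.array([0.0, 1.0, 0.0])
print(is_editing_gesture(np.array([0.10, 0.05, 0.01]), origin, u_axis, v_axis, 0.30, 0.21))  # True: near the plane
print(is_editing_gesture(np.array([0.10, 0.05, 0.30]), origin, u_axis, v_axis, 0.30, 0.21))  # False: too far above
```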
To facilitate user interaction, the AR scene 140 may include an audiovisual communication element 160 presented as UI elements overlaid on the 3D environment of the user device. The user device may generate and/or present the audiovisual communication element 160 using an audiovisual communication application implemented by the AR generation module. In some embodiments, the audiovisual communication module is separate from the AR generation module, and is implemented by the user device and/or the mobile application server. For example, the user device may implement multi-party communication via a voice chat 162 platform and/or a video chat 164 platform. Communication sent and/or received via the voice chat 162 and the video chat 164 may be communicated via the mobile application server and/or directly from one user device to another (e.g., via Bluetooth connection) . As described in more detail in reference to FIG. 3, the digital document file 104 may be editable by multiple user devices in real time, via implementation of multiple AR scenes.
The AR file 152 may be updated to include the modifications 154 and/or annotations as digital markup. For example, if a user of a user device implementing the AR scene 140 makes a gesture that is recognized by the user device to add a modification 154 to the AR file 152 (e.g., moving a fingertip or a tool in the virtual plane 150) , the AR file 152 may be modified by the AR generation module to include a visualization of that modification 154. The visualization may include, but is not limited to, a scalable vector image object depicting the modification 154 as recognized from the gesture, a bitmap image depicting the modification 154 embedded in the AR file 152 as an image object, or the like. In some embodiments, the AR generation module may generate and/or present an updated AR file 161 by converting the modification 154 into an editable text insertion 164, for example, by converting the script to print including language recognition, translation, and rendering, to produce the editable text insertion 164. Similarly, annotations 156 may be converted to editable text or objects 166 inserted at the position of the annotation 156 in the page of the updated AR file 161. These changes may be reflected in the AR scene 140 by updating the presentation of the AR file 152 with the updated AR file 161. In some embodiments, the modifications 154 and annotations 156, either as script or as editable text, are incorporated into an updated digital document file, which is stored in a data store 170, for example on device storage 172 or via a content network 174, as described in more detail in reference to FIG. 2. In this way, the digital document file may be edited by a user of a user device via the AR scene 140.
In some embodiments, portions of the digital document file may be access-restricted based on an identifier of a user of a user device and/or on an identifier of a user device of the user devices 102. For example, a user of a user device may be assigned a specific role in a team using a collaborative editing application to edit a document covering multiple assignments (e.g., a team report for which a user is assigned a section or subsection). In some embodiments, the AR file 152 may permit the user to view all sections of the AR file 152, but may not recognize modification or annotation actions made to the AR file 152 in sections for which the user has no editing rights. Similarly, the AR file 152 may be view-restricted, such that a user must be authorized to view sections of the AR file 152 in the AR scene 140. In some embodiments, locking and unlocking of sections of the AR file 152 may be made in real time by a user of the user devices 102 (e.g., the AR scene 140 may include a user interface element for locking and/or unlocking sections of the AR file 152).
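A possible shape for the per-section access check described above is sketched below. The role names, section identifiers, and permission fields are hypothetical; the point is only that a recognized modification gesture is ignored unless the acting user holds edit rights for the targeted section and the section is not locked.

```python
from dataclasses import dataclass, field

@dataclass
class SectionPolicy:
    viewers: set = field(default_factory=set)   # user ids allowed to view
    editors: set = field(default_factory=set)   # user ids allowed to edit
    locked: bool = False                         # real-time lock toggle

def can_edit(policy: SectionPolicy, user_id: str) -> bool:
    return (not policy.locked) and user_id in policy.editors

def apply_modification(policies, section_id, user_id, modification):
    """Apply a recognized gesture modification only if permitted."""
    policy = policies.get(section_id)
    if policy is None or not can_edit(policy, user_id):
        return False          # ignore edits to restricted or locked sections
    modification["section"] = section_id
    modification["author"] = user_id
    return True

policies = {
    "introduction": SectionPolicy(viewers={"alice", "bob"}, editors={"alice"}),
    "results": SectionPolicy(viewers={"alice", "bob"}, editors={"bob"}),
}
print(apply_modification(policies, "results", "alice", {"type": "annotation"}))  # False
print(apply_modification(policies, "results", "bob", {"type": "annotation"}))    # True
```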
FIG. 2 is a diagram illustrating an example system architecture for a system for collaborative document editing in an AR environment according to an embodiment of the present disclosure. In FIG. 2, a user device 202 may be in communication with a number of other components, including at least a mobile application server 204. The mobile application server 204 may perform at least a portion of the processing functions required by a mobile application installed upon the user device. The user device 202 may be an example of the user devices 102 described with respect to FIG. 1.
A user device 202 may be any suitable electronic device that is capable of providing at least a portion of the capabilities described herein. In particular, the user device 202 may be any electronic device capable of presenting an augmented reality (AR) scene including one or more AR features overlaid on a field of view of a real-world environment of the user device 202. In some embodiments, a user device may be capable of establishing a communication session with another electronic device (e.g., mobile application server 204) and transmitting /receiving data from that electronic device. A user device may include the ability to download and/or execute mobile applications. User devices may include mobile communication devices as well as personal computers and thin-client devices. In some embodiments, a user device may comprise any portable electronic device that has a primary function related to communication. For example, a user device may be a smart phone, a personal data assistant (PDA) , or any other suitable handheld device. The user device can be implemented as a self-contained unit with various components (e.g., input sensors, one or more processors, memory, etc. ) integrated into the user device. Reference in this disclosure to an “output” of a component or an “output” of a sensor does not necessarily imply that the output is transmitted outside of the user device. Outputs of various components might remain inside a self-contained unit that defines a user device.
In one illustrative configuration, the user device 202 may include at least one memory 206 and one or more processing units (or processor (s) ) 208. The processor (s) 208 may be implemented as appropriate in hardware, computer-executable instructions, firmware or combinations thereof. Computer-executable instruction or firmware implementations of the processor (s) 208 may include computer-executable or machine executable instructions written in any suitable programming language to perform the various functions described. The user device 202 may also include one or more input sensors 210 for receiving user and/or environmental input. There may be a variety of input sensors 210 capable of detecting user or environmental input, such as an accelerometer, a camera device, a depth sensor, a microphone, a global positioning system (e.g., GPS) receiver, etc. The one or more input sensors 210 may include at least a range camera device (e.g., a depth sensor) capable of generating a range image, as well as a camera device configured to capture image information.
For the purposes of this disclosure, a range camera (e.g., a depth sensor) may be any device configured to identify a distance, or range, of an object or objects from the range camera. In some embodiments, the range camera may generate a range image (or range map) , in which pixel values correspond to the detected distance for that pixel. The pixel values can be obtained directly in physical units (e.g., meters) . In at least some embodiments of the disclosure, the 3D imaging system may employ a range camera that operates using structured light. In a range camera that operates using structured light, a projector projects light onto an object or objects in a structured pattern. The light may be of a range that is outside of the visible range (e.g., infrared or ultraviolet) . The range camera may be equipped with one or more camera devices configured to obtain an image of the object with the reflected pattern. Distance information may then be generated based on distortions in the detected pattern. It should be noted that any suitable type of range camera, including those that operate using stereo triangulation, sheet of light triangulation, time-of-flight, interferometry, coded aperture, or any other suitable technique for range detection, would be useable by the described system.
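As one concrete illustration of how range-camera output is typically consumed (not part of the disclosed embodiments), the sketch below back-projects a pixel of a range image, whose value is a distance in meters, into a 3D point using assumed pinhole intrinsics; the intrinsic values and the synthetic depth map are placeholders.

```python
import numpy as np

def depth_pixel_to_point(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth 'depth' (meters) into camera space."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Synthetic 480x640 range image of a flat surface 1.2 m from the camera.
range_image = np.full((480, 640), 1.2, dtype=np.float32)
fx = fy = 525.0
cx, cy = 319.5, 239.5

point = depth_pixel_to_point(400, 300, range_image[300, 400], fx, fy, cx, cy)
print(point)   # 3D position of that pixel relative to the range camera
```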
The memory 206 may store program instructions that are loadable and executable on the processor (s) 208, as well as data generated during the execution of these programs. Depending on the configuration and type of user device 202, the memory 206 may be volatile (such as random access memory (RAM) ) and/or non-volatile (such as read-only memory (ROM) , flash memory, etc. ) . The user device 202 may also include additional storage 212, such as either removable storage or non-removable storage including, but not limited to, magnetic storage, optical disks, and/or tape storage. The disk drives and their associated computer-readable media may provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for the computing devices. In some implementations, the memory 206 may include multiple different types of memory, such as static random access memory (SRAM) , dynamic random access memory (DRAM) or ROM. Turning to the contents of the memory 206 in more detail, the memory 206 may include an operating system 214 and one or more application programs or services for implementing the features disclosed herein including at least a mobile application 216. The memory 206 may also include application data 218, which provides information to be generated by and/or consumed by the mobile application 216. In some embodiments, the application data 218 may be stored in a database.
For the purposes of this disclosure, a mobile application may be any set of computer executable instructions installed upon, and executed from, a user device 202. Mobile applications may be installed on a user device by a manufacturer of the user device or by another entity. In some embodiments, the mobile application may cause a user device to establish a communication session with a mobile application server 204 that provides backend support for the mobile application. A mobile application server 204 may maintain account information associated with a particular user device and/or user. In some embodiments, a user may be required to log into a mobile application in order to access functionality provided by the mobile application 216.
In accordance with at least some embodiments, the mobile application 216 may be configured to identify objects within an environment surrounding the user device 202. In accordance with at least some embodiments, the mobile application 216 may receive output from the input sensors 210 and identify objects or potential objects within that output. For example, the mobile application 216 may receive depth information (e.g., a range image) from a depth sensor (e.g., a range camera) , such as the depth sensors previously described with respect to input sensors 210. Based on this information, the mobile application 216 may determine the bounds of an object to be identified. For example, a sudden variance in depth within the depth information may indicate a border or outline of an object. In another example, the mobile application 216 may utilize one or more machine vision techniques to identify the bounds of an object. In this example, the mobile application 216 may receive image information from a camera input sensor 210 and may identify potential objects within the image information based on variances in color or texture data detected within the image. In some embodiments, the mobile application 216 may cause the user device 202 to transmit the output obtained from the input sensors 210 to the mobile application server 204, which may then perform one or more object recognition techniques upon that output. Additionally, the mobile application 216 may cause the user device 202 to transmit a current location of the user device 202 as well as an orientation (e.g., facing) of the user device 202 to the mobile application server 204.
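The "sudden variance in depth" heuristic mentioned above can be prototyped with a gradient threshold on the range image: a pixel whose depth differs sharply from its neighbors is marked as a candidate object border. This is only a sketch of that idea; the synthetic scene and threshold value are assumptions, and a production implementation would combine this with the color/texture cues also described above.

```python
import numpy as np

def depth_border_mask(range_image, threshold=0.05):
    """Mark pixels where depth changes by more than 'threshold' meters.

    Such discontinuities are treated as candidate object borders/outlines.
    """
    dz_y, dz_x = np.gradient(range_image.astype(np.float32))
    grad_mag = np.sqrt(dz_x ** 2 + dz_y ** 2)
    return grad_mag > threshold

# Synthetic scene: a box 0.3 m closer to the camera than the background plane.
depth = np.full((120, 160), 1.5, dtype=np.float32)
depth[40:80, 60:110] = 1.2

mask = depth_border_mask(depth)
print("border pixels:", int(mask.sum()))   # non-zero only around the box outline
```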
The user device 202 may also contain communications interface (s) 220 that enable the user device 202 to communicate with any other suitable electronic devices. In some embodiments, the communication interface 220 may enable the user device 202 to communicate with other electronic devices on a network (e.g., on a private network) . For example, the user device 202 may include a Bluetooth wireless communication module, which allows it to communicate with another electronic device (e.g., a different user device implementing the mobile application, etc. ) . The user device 202 may also include input/output (I/O) device (s) and/or ports 222, such as for enabling connection with a keyboard, a mouse, a pen, a voice input device, a touch input device, a display, speakers, a printer, etc.
In some embodiments, the user device 202 may communicate with the mobile application server 204 via a communication network. The communication network may include any one or a combination of many different types of networks, such as cable networks, the Internet, wireless networks, cellular networks, and other private and/or public networks. In addition, the communication network may comprise multiple different networks. For example, the user device 202 may utilize a wireless local area network (WLAN) to communicate with a wireless router, which may then route the communication over a public network (e.g., the Internet) to the mobile application server 204.
The mobile application server 204 may be any computing device or plurality of computing devices configured to perform one or more calculations on behalf of the mobile application 216 on the user device 202. In some embodiments, the mobile application 216 may be in periodic communication with the mobile application server 204. For example, the mobile application 216 may receive updates, push notifications, or other instructions from the mobile application server 204. In some embodiments, the mobile application 216 and mobile application server 204 may utilize a proprietary encryption and/or decryption scheme to secure communications between the two. In some embodiments, the mobile application server 204 may be executed by one or more virtual machines implemented in a hosted computing environment. The hosted computing environment may include one or more rapidly provisioned and released computing resources, which computing resources may include computing, networking, and/or storage devices. A hosted computing environment may also be referred to as a cloud-computing environment.
In one illustrative configuration, the mobile application server 204 may include at least one memory 224 and one or more processing units (or processor (s) ) 226. The processor (s) 226 may be implemented as appropriate in hardware, computer-executable instructions, firmware or combinations thereof. Computer-executable instruction or firmware implementations of the processor (s) 226 may include computer-executable or machine executable instructions written in any suitable programming language to perform the various functions described.
The memory 224 may store program instructions that are loadable and executable on the processor (s) 226, as well as data generated during the execution of these programs. Depending on the configuration and type of mobile application server 204, the memory 224 may be volatile (such as random access memory (RAM) ) and/or non-volatile (such as read-only memory (ROM) , flash memory, etc. ) . The mobile application server 204 may also include additional storage 228, such as either removable storage or non-removable storage including, but not limited to, magnetic storage, optical disks, and/or tape storage. The disk drives and their associated computer-readable media may provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for the computing devices. In some implementations, the memory 224 may include multiple different types of memory, such as static random access memory (SRAM) , dynamic random access memory (DRAM) or ROM. Turning to the contents of the memory 224 in more detail, the memory 224 may include an operating system 230 and one or more application programs or services for implementing the features disclosed herein including at least a module for mapping surfaces of the objects identified by the mobile application 216 for generating AR scene data (AR generation module 232) and/or a module for managing document data (document management module 234) . The memory 224 may also include account data 236, which provides information associated with user accounts maintained by the described system (e.g., access rights, permissions, etc. ) , AR data 238, which provides information related to object/user locations as well as AR interface information, and/or documents database 240, which provides networked data storage of document files (e.g., cloud storage) . In some embodiments, one or more of the account data 236, the AR data 238, or the document database 240 may be stored in a database.
The memory 224 and the additional storage 228, both removable and non-removable, are examples of computer-readable storage media. For example, computer-readable storage media may include volatile or non-volatile, removable or non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. As used herein, the term “modules” may refer to programming modules executed by computing systems (e.g., processors) that are installed on and/or executed from the mobile application server 204. The mobile application server 204 may also contain communications connection (s) 242 that allow the mobile application server 204 to communicate with a stored database, another computing device or server, user terminals, and/or other components of the described system. The mobile application server 204 may also include input/output (I/O) device (s) and/or ports 244, such as for enabling connection with a keyboard, a mouse, a pen, a voice input device, a touch input device, a display, speakers, a printer, etc.
Turning to the contents of the memory 224 in more detail, the memory 224 may include the augmented reality (AR) module 232, the document management module 234, the database containing account data 236, the database containing AR data 238, and/or the database containing document data 240. In some embodiments, the memory 224 and/or the AR generation module 232 may further include a communication application module (not shown) to implement a multi-party communication application, as described in more detail in reference to FIG. 1.
In some embodiments, the AR generation module 232 may be configured to, in conjunction with the processors 226, receive input sensor data from the user device 202 and identify one or more objects within the input sensor data. As described above, the system may use any suitable object recognition technique to identify the objects within the input sensor data. In some embodiments, the AR generation module 232 may use one or more techniques to map at least a portion of the received input sensor data to one or more surfaces of objects in the environment of the user device 202. Although the use of depth sensor output is described in identifying a distance between the user device 202 and the object, it should be noted that other techniques may be used to determine such a distance. For example, the AR generation module 232 may implement visual Simultaneous Localization and Mapping (vSLAM) to detect and track features of the objects, and calculate from those features a coordinate mapping and an output pose of the input sensor. In this way, AR generation module 232 may map an AR file (e.g., AR file 152 of FIG. 1) to a virtual plane (e.g., virtual plane 150 of FIG. 1) on a surface of one or more objects, using the received sensor input data. In some embodiments, the AR generation module 232 may generate AR data 238 describing the vSLAM data, which may be stored and accessed by other user devices for use in a shared AR scene (e.g., shared AR scene 340 of FIG. 3) . For example, the orientation, position, and perspective mapping of a virtual plane in the shared AR scene may be generated based on the field of view and vSLAM data generated by a primary user device. To preserve the orientation of the virtual plane in a second field of view, a second user device may receive AR data 238 for projecting the AR file in the virtual plane. In some embodiments, the AR generation module 232 is further configured to provide user interface data for generating a user interface to receive user input and to present in the field of view of the user device 202 one or more interface elements including, but not limited to an audiovisual communication application, as described in more detail in reference to FIG. 1. In some embodiments, the user interface is further configured to recognize gestures and target object motion as a form of input when the gestures and target object motion occur in the virtual plane.
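One way to realize the pose-sharing behavior described above is to express the virtual plane's pose relative to a common anchor that both devices can resolve (for example, a tracked tabletop feature), so that each device can place the plane in its own world frame. The 4x4 transforms below are synthetic and only illustrate the matrix bookkeeping; they are not the format of the AR data 238.

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 rigid transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Device A: plane pose and anchor pose, both expressed in A's world frame.
T_worldA_plane = make_transform(np.eye(3), np.array([0.4, 0.0, -1.0]))
T_worldA_anchor = make_transform(np.eye(3), np.array([0.1, 0.0, -1.0]))

# Pose of the plane relative to the shared anchor (what gets transmitted).
T_anchor_plane = np.linalg.inv(T_worldA_anchor) @ T_worldA_plane

# Device B resolves the same anchor in its own world frame...
T_worldB_anchor = make_transform(np.eye(3), np.array([-0.2, 0.0, -0.8]))
# ...and recovers the plane pose in its own coordinates.
T_worldB_plane = T_worldB_anchor @ T_anchor_plane
print(T_worldB_plane[:3, 3])   # plane position as seen by device B
```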
In some embodiments, the document management module 234 may be configured to, in conjunction with the processors 226, manage information for the document files and AR files implemented in AR scenes by the AR generation module 232. In some embodiments, document information may be obtained with respect to a particular account or user. The document management module 234 may provide an AR file associated with a digital document file for presentation by the AR generation module 232. To do this, the document management module 234 may use an imaging sensor of the user device 202 to generate a digital document file from a physical document, as described in more detail in reference to FIG. 1. The document management module 234 may use plane-rectification techniques to correct the image of the physical document from a skewed perspective view to a standard rectangular document size. The document management module 234 may further recognize text on the physical document using optical character recognition, translation, and script-to-text conversion techniques (for example, implemented as a form of natural language processing) . The document management module 234 may then store the digital document file, and generate an AR file from the digital document file for presentation in an AR scene.
The document management module 234 may also generate, receive, and store document data 240 including but not limited to modifications and/or annotations made during collaborative editing. For example, document data 240 may include editable text insertions made via one or more user input devices. Document data 240 may also include recognized modifications received by the user device 202 via the AR interface as gestures or target object motion. The document management module 234 may update the digital document file in real time, generating an updated digital document file incorporating the latest modifications and edits, which may be provided to user devices for presentation in updated AR scenes. In some embodiments, document data 240 may include metadata including chat transcripts and or comments transcribed during collaborative editing. In some embodiments, the document management module 234 may receive account data 236 for implementing access restrictions and for limiting editing permissions for one or more user devices participating in collaborative editing.
FIG. 3A is a diagram showing an example technique for collaborative document editing using AR applications according to an embodiment of the present disclosure. As described in more detail in reference to FIG. 1, an AR scene 140 may include a projection of an AR file onto a surface of an object in the environment of the user device implementing the AR scene 140. To implement collaborative editing, the AR scene 140 may include audiovisual communication with other users of other user devices also implementing AR scenes. Modifications made to the AR file, for example, using gesture recognition or an editing tool (e.g., a stylus or other target object) , may be shared across the user devices and implemented in updates made to AR scenes in real time. As described in more detail in reference to FIG. 1, the digital document file may be updated using the modifications made in the AR scene 140.
FIG. 3B is another diagram showing another example technique for collaborative document editing using AR applications according to an embodiment of the present disclosure. In a real time collaborative editing system, a second AR scene 310 may be generated and/or presented by a second user device (e.g., user devices 102 of FIG. 1) in a different location from that of the AR scene presented in FIG. 3A (e.g., AR scene 140 of FIG. 3A). For example, the user of the user device generating and/or presenting the second AR scene 310 may be in another city or may be in another office in the same building as the user of the user device generating the AR scene presented in FIG. 3A. In the second AR scene 310, a second virtual plane 320 may be determined and/or defined on a surface of an object in the environment of the user device (e.g., on a wall or a tabletop). The virtual plane 320 may define and/or delineate the boundaries of a projection of the AR file 322, adjusted to appear as a real-world object overlaid in the environment of the second AR scene 310. Similarly to the AR scene presented in FIG. 3A, the second AR scene 310 may include a collaborative communication interface 330 including voice and/or video communication between users participating in collaborative document editing.
FIG. 3C is another diagram showing another example technique for collaborative document editing using AR applications according to an embodiment of the present disclosure. In some embodiments, collaborative document editing may include multiple users of user devices viewing AR scenes in a single location. In such cases, a shared AR scene 340 may include a primary user device 346 from whose perspective a shared virtual plane 350 is defined on a surface of an object in the environment of the shared AR scene 340. In the virtual plane, a shared AR file 352 may be projected as an overlay to appear as a real object on the surface. In the shared AR scene 340, the shared AR file 352 may appear in an orientation that is aligned with the primary user device 346, for example, by being positioned according to a location selected by the user of the primary user device 346. In some embodiments, the primary user device 346 generates and/or presents the shared virtual plane 350 and the shared AR file 352 automatically (e.g., without user input) .
Information describing the position of the shared virtual plane 350 and the orientation and/or presentation of the shared AR file 352 may be shared between the primary user device 346 and additional user devices participating in the shared AR scene 340. For example, a secondary user device 344 in the same location as the primary user device 346 may receive the information and may reproduce the position and orientation of the shared virtual plane 350 and the shared AR file 352 according to the perspective of the primary user device 346. In some embodiments, this includes the shared AR file 352 being viewable via the secondary user device 344 as a document oriented toward the primary user device 346, as if it were a physical document placed on a surface before the user of the primary user device 346.
In some embodiments the shared AR file 352 may be projected on a vertical plane (e.g., virtual plane 320 of FIG. 3B) such that multiple user devices may view the shared AR file 352 in the same orientation and from multiple different perspectives. To that end, each user device participating in the shared AR scene 340 may adjust the presentation of the shared AR file 352 to match a different coordinate map based on the same set of features detected and tracked in a shared environment. As part of real time collaborative editing, one or more users 342 may provide and/or receive updates and document modifications while located in separate and/or remote environments, generating individual AR scenes, for example, an individual vertical AR scene 354 in one environment and/or an individual horizontal AR scene 356 in a different environment, whereby each AR scene presents the updated AR file according to a respective orientation, position, and perspective.
FIG. 4 depicts an illustrative example of a technique 400 for collaborative editing of a document including AR features according to an embodiment of the present disclosure. As described in more detail in reference to FIG. 1, an AR file 410 may be edited by multiple users participating in collaborative editing in AR scenes (e.g., AR scene 140 of FIG. 1, shared AR scene 340 of FIG. 3). Collaborative editing may include modifications 412 and annotations 414 made using hand gesture and/or target object recognition on the part of a user device implementing an AR generation module. In some embodiments, the AR file 410 may be updated to include edits in real time, for example, by converting modifications 412 into editable text 422 presented in an updated AR file 420. The updated AR file 420 may be modified with further edits, for example, a hand gesture indicating that a line is to be removed 424 from the digital document file (e.g., digital document file 104 of FIG. 1) that is represented in the updated AR file 420. The removal edit 424 may be shown, for example, by markup 426 in the AR file.
In some embodiments, AR features 432 may be added to the AR file and/or the updated AR file 420 to provide a dynamic AR file 430. AR features 432 may include animated AR elements and AR elements presented such that they appear in the AR scene as physical objects leaving the virtual plane within which the dynamic AR file 430 is presented (e.g., virtual plane 150 of FIG. 1). For example, AR features 432 may include a user icon or badge as a comment flag that exits the plane of the dynamic AR file 430 and that casts a shadow on the dynamic AR file 430. In some embodiments, the AR features 432 may be converted to graphical elements 442 in a planar AR file 440. The planar AR file 440 may include a flattened perspective form of the AR features 432, for example, including dynamic elements (e.g., animations) but without appearing in the AR scene as a real object leaving the virtual plane.
FIG. 5 is a simplified flowchart illustrating a method 500 of collaborative document editing using AR applications according to an embodiment of the present disclosure. The flow is described in connection with a computer system that is an example of the computer systems described herein. Some or all of the operations of the flows can be implemented via specific hardware on the computer system and/or can be implemented as computer-readable instructions stored on a non-transitory computer-readable medium of the computer system. As stored, the computer-readable instructions represent programmable modules that include code executable by a processor of the computer system. The execution of such instructions configures the computer system to perform the respective operations. Each programmable module in combination with the processor represents a means for performing a respective operation (s) . While the operations are illustrated in a particular order, it should be understood that no particular order is necessary and that one or more operations may be omitted, skipped, and/or reordered.
The method 500 includes obtaining a document file (502) . As described in more detail in reference to FIG. 1, the document file may include a digital document file (e.g., digital document file 104 of FIG. 1) including but not limited to a text document or portable document format encoded for presentation and/or editing on a computer system. Optionally, obtaining a document includes obtaining an image of a physical document (e.g., physical document 120 of FIG. 1) , for example, using a camera in communication with the computer system (e.g., user devices 102 of FIG. 1 and/or mobile application server 204 of FIG. 2) . Optionally, obtaining a document also includes generating a document file in an editable document format using the image of the physical document. As described in more detail in reference to FIG. 1, this may include implementing image processing applications to adjust the shape of text and writing depicted in the image of the physical document, and may also include optical character recognition to generate an editable document. Optionally, obtaining a document also includes generating a new (empty) document file. In some embodiments, as when the physical document includes handwritten script or multiple font types or text styles, the method further includes converting printed text into editable text data and the handwritten script into annotation data. As described in more detail in reference to FIG. 1, annotation data may include image objects embedded in the digital document file.
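The split between editable text data and annotation data described for block 502 might be represented as follows; the class and field names are illustrative assumptions rather than the disclosed file format, and the pen-stroke payload is shown as hypothetical SVG path data.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TextBlock:
    page: int
    text: str              # OCR result for printed text, editable downstream
    author_id: str = ""

@dataclass
class Annotation:
    page: int
    position: tuple        # (x, y) location in page coordinates
    payload: str           # e.g., vector path data for handwritten script

@dataclass
class DocumentFile:
    title: str
    text_blocks: List[TextBlock] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

doc = DocumentFile(title="Team report")
doc.text_blocks.append(TextBlock(page=1, text="1. Introduction", author_id="alice"))
doc.annotations.append(Annotation(page=1, position=(120, 340),
                                  payload="M 0 0 L 14 3 L 27 1"))  # a recognized pen stroke
print(len(doc.text_blocks), len(doc.annotations))
```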
The method further includes converting the digital document file to an AR format file (504). As described in more detail in reference to FIG. 1, the AR format file (e.g., AR file 152 of FIG. 1) may include the digital document file encoded in a format for presentation as an overlay in an environment of the computer system. For example, it may be encoded for visualization as a virtual object that occupies space in the field of view of the user of the computer system and occludes the three-dimensional environment of the user.
The method further includes presenting the AR format file in an AR scene with a user interface (506). As described in more detail in reference to FIG. 1, presenting the AR scene (e.g., AR scene 140 of FIG. 1) may include identifying a virtual plane (e.g., virtual plane 150 of FIG. 1) on a surface of a real object in the environment of the user upon which to project the AR file. Optionally, the AR scene may be presented by a user device using AR scene data generated by a mobile application server. Optionally, the mobile application server provides the AR scene data to a user device for presenting the AR scene. Alternatively, the user device may generate the AR scene data. Optionally, the AR scene data includes computer-readable instructions for adjusting the AR file for presentation in the AR scene using feature tracking data, a coordinate map, and pose data associated with a real-world environment of a first user device. For example, as described in more detail in reference to FIG. 3, a primary user device may determine the position and orientation of the virtual plane and the visualization of the AR file in a shared AR scene wherein multiple user devices view the AR file from multiple different perspectives, according to a single orientation. For example, a primary user device may determine a position and orientation of an object using position and location data of the user device relative to other user devices in the environment. Similarly, feature tracking data may be shared between the user devices to define a common coordinate map, based on which the visualization of the AR file may be adjusted to correct for different fields of view or perspectives. Optionally, different virtual planes, visualizations of the AR file, and AR features may be introduced by different AR datasets for user devices in different physical locations. For example, when two user devices are located in different physical locations, a common coordinate system based on features in the environment may not be possible, and as such different virtual planes may be defined. In such cases, feature tracking, coordinate mapping, and pose data for each user device will be used in determining the position and orientation of the AR file in an individual AR scene. Optionally, the user interface includes a two-way communication application (e.g., group chat) including audio and visual communication. Additionally and/or alternatively, as described in more detail in reference to FIG. 1, the user interface may include gestural input and/or target object recognition.
The method further includes receiving document modification data from one or more sources (508). As described in more detail in reference to FIG. 1, the document modification data may include, but is not limited to, modifications, annotations, text edits, or the like received by the computer system via the user interface elements. For example, a hand gesture in the virtual plane using a fingertip to annotate the AR file may be received as a modification of the AR file. Similarly, the use of a target object (e.g., a stylus, pen, pencil, etc.) may be recognized as an editing tool. Optionally, the document modification data may be received from a data store (e.g., data store 170 of FIG. 1) and/or from different user devices via direct communication methods (e.g., Bluetooth) or via a network.
The method further includes converting the document modification data into digital text (510) . This involves converting handwritten text included in the document modification data into digital text as well as converting a sketch, a diagram (e.g., made with another application) , or other illustrative representation into a digital representation of that data. As described in more detail in reference to FIG. 1 and FIG. 4, the method 500 may include collaborative editing of the digital document file by multiple users of multiple user devices, for example, via multiple instances of AR scenes in one or more physical locations. In some embodiments, the method 500 includes real time updating of the digital document file to incorporate modifications, annotations, edits, etc., of the AR file presented in the AR scene by document modification data received via a user interface of a user device of the user devices. Real time updating may include providing an updated AR file for each user device presenting the AR file in an AR scene. Real time updating may include updating the digital document file with edits as they are made (512) . As part of an updated AR scene, the method further includes presenting the updated file in the AR scene (514) . By providing updated visualization data for the AR file in real time, collaborative editing may include editing of modifications, annotations, and edits by subsequent edits. The method further includes storing the updated document file (516) . In some embodiments, the computer system is in communication with a data store and/or a distributed storage network (e.g., cloud storage system) , wherein the updated digital document file may be stored. In this way, the updated digital document file may be available for provision to additional computer systems for use in collaborative editing or for other purposes (e.g., publication) .
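Blocks 508 through 516 amount to an update loop: take modification data received through the AR interface, convert it into digital text where possible, fold it into the document, and persist the result. The sketch below stubs out handwriting recognition and uses a local JSON file as the data store, since those pieces depend on services and formats not specified in this disclosure; the field names are assumptions.

```python
import json

def recognize_handwriting(stroke_points):
    # Placeholder for a real handwriting-recognition service (block 510).
    return "recognized text"

def apply_modification(document, modification):
    """Convert one piece of modification data and merge it into the document (blocks 510-512)."""
    if modification["kind"] == "handwriting":
        text = recognize_handwriting(modification["strokes"])
        document["paragraphs"].append({"text": text, "author": modification["author"]})
    elif modification["kind"] == "text_edit":
        idx = modification["paragraph_index"]
        document["paragraphs"][idx]["text"] = modification["new_text"]
    return document

def store_document(document, path="updated_document.json"):
    # Block 516: persist the updated document file to a data store.
    with open(path, "w") as fh:
        json.dump(document, fh, indent=2)

doc = {"paragraphs": [{"text": "Original sentence.", "author": "alice"}]}
doc = apply_modification(doc, {"kind": "handwriting", "author": "bob",
                               "strokes": [(0, 0), (5, 2), (9, 1)]})
doc = apply_modification(doc, {"kind": "text_edit", "paragraph_index": 0,
                               "new_text": "Revised sentence."})
store_document(doc)   # an updated AR scene would then be generated from this file (block 514)
```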
It should be appreciated that the specific steps illustrated in FIG. 5 provide a particular method of collaborative document editing via augmented reality environments according to an embodiment of the present disclosure. As noted above, other sequences of steps may also be performed according to alternative embodiments. For example, alternative embodiments of the present disclosure may perform the steps outlined above in a different order. Moreover, the individual steps illustrated in FIG. 5 may include multiple sub-steps that may be performed in various sequences as appropriate to the individual step. Furthermore, additional steps may be added or removed depending on the particular applications. One of ordinary skill in the art would recognize many variations, modifications, and alternatives.
FIG. 6 illustrates examples of components of a computer system 600 according to certain embodiments. The computer system 600 is an example of the computer system described herein above. Although these components are illustrated as belonging to a same computer system 600, the computer system 600 can also be distributed.
The computer system 600 includes at least a processor 602, a memory 604, a storage device 606, input/output peripherals (I/O) 608, communication peripherals 610, and an interface bus 612. The interface bus 612 is configured to communicate, transmit, and transfer data, controls, and commands among the various components of the computer system 600. The memory 604 and the storage device 606 include computer-readable storage media, such as RAM, ROM, electrically erasable programmable read-only memory (EEPROM), hard drives, CD-ROMs, optical storage devices, magnetic storage devices, electronic non-volatile computer storage (for example, memory), and other tangible storage media. Any of such computer-readable storage media can be configured to store instructions or program code embodying aspects of the disclosure. The memory 604 and the storage device 606 also include computer-readable signal media. A computer-readable signal medium includes a propagated data signal with computer-readable program code embodied therein. Such a propagated signal takes any of a variety of forms including, but not limited to, electromagnetic, optical, or any combination thereof. A computer-readable signal medium includes any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use in connection with the computer system 600.
Further, the memory 604 includes an operating system, programs, and applications. The processor 602 is configured to execute the stored instructions and includes, for example, a logical processing unit, a microprocessor, a digital signal processor, and other processors. The memory 604 and/or the processor 602 can be virtualized and can be hosted within another computer system of, for example, a cloud network or a data center. The I/O peripherals 608 include user interfaces, such as a keyboard, screen (e.g., a touch screen) , microphone, speaker, other input/output devices, and computing components, such as graphical processing units, serial ports, parallel ports, universal serial buses, and other input/output peripherals. The I/O peripherals 608 are connected to the processor 602 through any of the ports coupled to the interface bus 612. The communication peripherals 610 are configured to facilitate communication between the computer system 600 and other computing devices over a communications network and include, for example, a network interface controller, modem, wireless and wired interface cards, antenna, and other communication peripherals.
While the present subject matter has been described in detail with respect to specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily produce alterations to, variations of, and equivalents to such embodiments. Accordingly, it should be understood that the present disclosure has been presented for purposes of example rather than limitation, and does not preclude the inclusion of such modifications, variations, and/or additions to the present subject matter as would be readily apparent to one of ordinary skill in the art. Indeed, the methods and systems described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions, and changes in the form of the methods and systems described herein may be made without departing from the spirit of the present disclosure. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the present disclosure.
Unless specifically stated otherwise, it is appreciated that throughout this specification discussions utilizing terms such as “processing, ” “computing, ” “calculating, ” “determining, ” and “identifying” or the like refer to actions or processes of a computing device, such as one or more computers or a similar electronic computing device or devices, that manipulate or transform data represented as physical electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the computing platform.
The system or systems discussed herein are not limited to any particular hardware architecture or configuration. A computing device can include any suitable arrangement of components that provide a result conditioned on one or more inputs. Suitable computing devices include multipurpose microprocessor-based computer systems accessing stored software that programs or configures the computer system from a general-purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter. Any suitable programming, scripting, or other type of language or combinations of languages may be used to implement the teachings contained herein in software to be used in programming or configuring a computing device.
Embodiments of the methods disclosed herein may be performed in the operation of such computing devices. The order of the blocks presented in the examples above can be varied; for example, blocks can be re-ordered, combined, and/or broken into sub-blocks. Certain blocks or processes can be performed in parallel.
Conditional language used herein, such as, among others, “can, ” “could, ” “might, ” “may, ” “e.g., ” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain examples include, while other examples do not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more examples or that one or more examples necessarily include logic for deciding, with or without author input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular example.
The terms “including,” “having,” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list. The use of “adapted to” or “configured to” herein is meant as open and inclusive language that does not foreclose devices adapted to or configured to perform additional tasks or steps. Additionally, the use of “based on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Similarly, the use of “based at least in part on” is meant to be open and inclusive, in that a process, step, calculation, or other action “based at least in part on” one or more recited conditions or values may, in practice, be based on additional conditions or values beyond those recited. Headings, lists, and numbering included herein are for ease of explanation only and are not meant to be limiting.
The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of the present disclosure. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed examples. Similarly, the example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed examples.
Claims (20)
- A method, implemented by a computer system, for cooperative document editing in an augmented reality (AR) environment, the method comprising:
  obtaining, by the computer system, a document file;
  generating, by the computer system, an AR file using the document file;
  providing, by the computer system:
    AR scene data, an AR scene generated from the AR scene data being configured to present the AR file in the AR scene disposed on a surface of a real-world object; and
    a user interface comprising one or more interactive elements, the one or more interactive elements configured to receive user interaction;
  receiving, by the computer system, document modification data;
  converting, by the computer system, the document modification data into digital information;
  providing, by the computer system, an updated AR file using the digital information;
  updating, by the computer system, the AR scene data using the updated AR file;
  generating, by the computer system, an updated document file using the updated AR file; and
  storing, by the computer system, the updated document file in a data store.
- The method of claim 1 wherein obtaining a document file comprises:
  obtaining, by the computer system, an image of a physical document comprising printed text; and
  generating, by the computer system, a document file in an editable document format using the image of the physical document.
- The method of claim 2 wherein the physical document further comprises script and wherein generating a document file in an editable document format further comprises:
  converting printed text of the image of the physical document into editable text data; and
  converting script of the image of the physical document into annotation data.
- The method of claim 1 wherein the AR file comprises an editable text document encoded for visualization in a three dimensional (3D) environment of the AR scene.
- The method of claim 1 wherein the AR scene data comprises computer-readable instructions for adjusting the AR file for presentation in the AR scene using feature tracking data, a coordinate map, and pose data associated with a real-world environment of a first user device.
- The method of claim 5 wherein providing AR scene data comprises providing a plurality of AR scene data sets to a plurality of user devices, and wherein each user device of the plurality of user devices adjusts the AR file for presentation in a respective AR scene of each user device of the plurality of user devices.
- The method of claim 6 wherein:
  a second user device of the plurality of the user devices is located in a different real-world environment than the real-world environment of the first user device; and
  the AR scene data comprises computer-readable instructions for adjusting the AR file for presentation in the AR scene using feature tracking data, a coordinate map, and pose data associated with the different real-world environment.
- The method of claim 6 wherein one or more user devices of the plurality of user devices are located in the real-world environment of the first user device, such that the one or more user devices of the plurality of user devices adjust the AR file for presentation in the AR scene using an orientation determined by the first user device.
- The method of claim 1 wherein the one or more interactive elements comprise a two-way audiovisual communication application.
- The method of claim 1 wherein the user interaction comprises at least one of a hand gesture or a target object interaction associated with the AR file presented in the AR scene.
- The method of claim 10 wherein receiving document modification data comprises:
  receiving, by the computer system, the user interaction via the user interface; and
  receiving, by the computer system, document-edit data associated with the document file from at least one of a content network or a data store, wherein the document-edit data is generated by a plurality of user devices.
- A system comprising a processor; and
  a memory including instructions that, when executed with the processor, cause the system to, at least:
    obtain a document file;
    generate an AR file using the document file;
    provide:
      AR scene data, an AR scene generated from the AR scene data being configured to present the AR file in the AR scene disposed on a surface of a real-world object; and
      a user interface comprising one or more interactive elements, the one or more interactive elements configured to receive user interaction;
    receive document modification data;
    convert the document modification data into digital information;
    provide an updated AR file using the digital information;
    update the AR scene data using the updated AR file;
    generate an updated document file using the updated AR file; and
    store the updated document file in a data store.
- The system of claim 12 wherein obtaining a document file comprises:
  obtaining an image of a physical document comprising printed text; and
  generating a document file in an editable document format using the image of the physical document.
- The system of claim 12 wherein the AR scene data comprises computer-readable instructions for adjusting the AR file for presentation in the AR scene using feature tracking data, a coordinate map, and pose data associated with a real-world environment of a first user device.
- The system of claim 14 wherein providing AR scene data comprises providing a plurality of AR scene data sets to a plurality of user devices, and wherein each user device of the plurality of user devices adjusts the AR file for presentation in a respective AR scene of each user device of the plurality of user devices.
- The system of claim 15 wherein:
  a second user device of the plurality of the user devices is located in a different real-world environment than the real-world environment of the first user device; and
  the AR scene data comprises computer-readable instructions for adjusting the AR file for presentation in the AR scene using feature tracking data, a coordinate map, and pose data associated with the different real-world environment.
- The system of claim 15 wherein one or more user devices of the plurality of user devices are located in the real-world environment of the first user device, such that the one or more user devices of the plurality of user devices adjust the AR file for presentation in the AR scene using an orientation determined by the first user device.
- A non-transitory computer readable medium storing specific computer-executable instructions that, when executed by a processor, cause a computer system to at least:
  obtain a document file;
  generate an AR file using the document file;
  provide:
    AR scene data, an AR scene generated from the AR scene data being configured to present the AR file in the AR scene disposed on a surface of a real-world object; and
    a user interface comprising one or more interactive elements, the one or more interactive elements configured to receive user interaction;
  receive document modification data;
  convert the document modification data into digital information;
  provide an updated AR file using the digital information;
  update the AR scene data using the updated AR file;
  generate an updated document file using the updated AR file; and
  store the updated document file in a data store.
- The non-transitory computer readable medium of claim 18 wherein obtaining a document file comprises:
  obtaining an image of a physical document comprising printed text; and
  generating a document file in an editable document format using the image of the physical document.
- The non-transitory computer readable medium of claim 18 wherein the AR scene data comprises computer-readable instructions for adjusting the AR file for presentation in the AR scene using feature tracking data, a coordinate map, and pose data associated with a real-world environment of a first user device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202180017521.6A CN115190996A (en) | 2020-03-25 | 2021-03-03 | Collaborative document editing using augmented reality |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202062994595P | 2020-03-25 | 2020-03-25 | |
US62/994,595 | 2020-03-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021190264A1 (en) | 2021-09-30 |
Family ID: 77890925
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/078898 WO2021190264A1 (en) | 2020-03-25 | 2021-03-03 | Cooperative document editing with augmented reality |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115190996A (en) |
WO (1) | WO2021190264A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117590929A (en) * | 2023-06-05 | 2024-02-23 | 北京虹宇科技有限公司 | Environment management method, device, equipment and storage medium for three-dimensional scene |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110066928A1 (en) * | 2009-09-11 | 2011-03-17 | Xerox Corporation | Document presentation in virtual worlds |
US20120249416A1 (en) * | 2011-03-29 | 2012-10-04 | Giuliano Maciocci | Modular mobile connected pico projectors for a local multi-user collaboration |
US20140053086A1 (en) * | 2012-08-20 | 2014-02-20 | Samsung Electronics Co., Ltd. | Collaborative data editing and processing system |
US20170308348A1 (en) * | 2016-04-20 | 2017-10-26 | John SanGiovanni | System and method for very large-scale communication and asynchronous documentation in virtual reality and augmented reality environments |
US20180349367A1 (en) * | 2017-06-06 | 2018-12-06 | Tsunami VR, Inc. | Systems and methods for associating virtual objects with electronic documents, and searching for a virtual object or an electronic document based on the association |
Also Published As
Publication number | Publication date |
---|---|
CN115190996A (en) | 2022-10-14 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21774766; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21774766; Country of ref document: EP; Kind code of ref document: A1