NO20111183A1 - Video conferencing system, method and computer program storage device

Info

Publication number: NO20111183A1
Application number: NO20111183A
Authority: NO
Inventors: Jason Catchpole
Original assignee: Cisco Tech Inc
Priority date: 2011-08-31
Filing date: 2011-08-31
Publication date: 2013-03-01
Also published as: NO333234B1

Abstract

A video conferencing device for presenting augmented images that includes at least one interface, a network, and a computer processor programmed to receive first information identifying a scene via the at least one interface. The computer processor also detects whether the scene contains at least one marker, and identifies a location of each detected marker within the scene. In response to determining that the scene contains a first marker, and based on the location of the first marker, the computer processor then augments the portion of the scene containing the first marker with second information. The computer processor then transmits the augmented scene to at least one external device via the network.

Description

Video conferencing system, method, and computer program storage device

BACKGROUND

Technical field

[0001] The description relates to a video conferencing device, associated methodology and a non-transitory computer program storage device that use augmented reality to implement augmented images during a video conference.

DESCRIPTION OF RELATED ART

[0002] When performing video communication between a sending endpoint and a receiving endpoint, video conferencing systems use two video channels to convey information between them. A "main" video channel carries video information of a scene recorded by an imaging device, for example of remote audience members in a conference, or of a presenter, while the second video channel is connected to a number of external devices to receive additional video information such as presentation material. However, the use of these two communication channels in video conferencing systems leads to a number of problems.

[0003] One problem with sending scene video information over a different video channel than the presentation material is that the presenter is no longer in the same physical location as the presentation, which makes natural interaction with the presentation difficult. In other words, the remote audience members at the receiving endpoint will see either the presenter or the presentation material, so that the only way the presenter can point out information in the presentation material is with a PC mouse or a remote control. However, presentations are most effective when remote audience members are able to see the actual presenter along with any interactions or hand gestures the presenter makes while presenting the material.

[0004] Another problem with typical video conferencing equipment is that the presenter does not always know when remote audience members at the receiving endpoint can see him, since remote users may be using a small device that is only large enough to display either the presenter's main video or the secondary video of the presentation material. This places an additional burden on the presenter, as he cannot establish any form of eye contact with the remote audience when he does not even know whether the remote audience members can see him. Therefore, since the configuration of the video conferencing device at the receiving endpoint is unknown, any actual eye contact made by the presenter may be futile, as remote audience members may switch to viewing only the presentation material via the second video channel, or switch to viewing the presenter via the main video channel, and the presenter will be unaware of the changes.

US-2011/157179 A1 relates to a method, a system and a computer program product for providing augmented reality based on marker tracking. In the method, an image is captured by an image capture device and it is determined whether a quadrilateral is present. If a quadrilateral is found in the image, it is determined whether the quadrilateral is a marker that matches a marker definition. If the quadrilateral is the marker, an identity of the marker and four corner coordinates of the marker image are identified. A rotation state of the marker is determined according to the corner coordinates of the marker image, and a relative displacement between the marker and the image capture device is calculated. A three-dimensional object is combined into the image in accordance with the relative displacement, the rotation state and the identity of the marker, thereby providing an augmented reality image.

US-5,444,476 A relates to systems and methods for performing teleinteractive video conferencing between two or more teleconferencing sites, and for enabling any number of teleconferencing sites to overlay a pointer for pointing within the video image coming from any teleconferencing site. The systems and methods use at least one video imaging device to capture images at a local teleconferencing site and N display units to display images captured at the local and remote teleconferencing sites, where N is the total number of teleconferencing sites included in the system. The teleinteractivity capability is implemented when a local teleconferencing site selectively inputs the video image signal from a remote teleconferencing site to a pointer overlay device instead of the local video image. The pointer overlay device then overlays the pointer, and the overlaid video signal is transmitted to the remote teleconferencing sites. Finally, by providing remote control means for actively positioning the video imaging device at a remote site, the position of the overlaid pointer within a displayed image can be used to reposition a video imaging device at a remote teleconferencing site.

SUMMARY

[0005] The present disclosure describes a video conferencing system and associated methodology for using augmented reality to present augmented images of presentation material in a manner that solves the problems stated above. As such, the video conferencing system merges computer-generated graphics of the presentation material into real-world surroundings by inserting 2D or 3D objects into the live main-channel video feed from the imaging device, thereby requiring only one video channel and providing the presentation as it would be performed locally, so that remote audience members can see both the presenter and the presentation material. Furthermore, the presenter can make use of eye contact and hand gestures to actively engage the remote audience members. In addition, the computer-generated images of the presentation material can be attached to physical objects, which allows the images to be physically interacted with and/or moved as if the presentation were being performed locally in a conference room.

[0006] To solve at least the problems stated above, the present disclosure relates to a video conferencing device, associated methodology and non-transitory computer program for presenting augmented images, the device including at least one interface, a network and a computer processor programmed to receive first information identifying a scene via the at least one interface. The computer processor also detects whether the scene contains at least one marker and identifies a location of each detected marker within the scene. In response to determining that the scene contains a first marker, and based on the location of the first marker, the computer processor then augments the portion of the scene containing the first marker with second information. The computer processor then transmits the augmented scene to at least one external device via the network.

[0007] The foregoing description is for the purpose of generally presenting the context of the disclosure. Work of the inventor, to the extent it is described in this background section, as well as aspects of the description which may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present invention. The foregoing paragraphs have been provided by way of general introduction, and are not intended to limit the scope of the following claims. The described embodiments, together with further advantages, will be best understood by reference to the following detailed description taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0008] A more complete understanding of the present development, and of many of its attendant advantages, will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings. However, the accompanying drawings and their exemplary depictions do not in any way limit the scope of the development covered by the description. The scope of the development covered by the description and drawings is defined by the words of the accompanying claims.

[0009] Figure 1 is a schematic diagram of a video conferencing system for presenting augmented images according to an exemplary embodiment;

[0010] Figure 2 is a flow diagram of a video conferencing system for presenting augmented images according to an exemplary embodiment;

[0011] Figure 3 is an algorithmic flow chart for presenting augmented images via a video conferencing device according to an exemplary embodiment;

[0012] Figure 4 is an illustrative example of a video conferencing environment for presenting the augmented images according to an exemplary embodiment; and

[0013] Figure 5 is a schematic diagram of a video conferencing device for presenting the augmented images according to an exemplary embodiment.

DETAILED DESCRIPTION OF EMBODIMENTS

[0014] Reference is now made to the drawings, in which like reference numerals designate identical or corresponding parts throughout the several figures. The following description relates to a device and associated methodology for a video conferencing device for presenting augmented images. The video conferencing device comprises at least one interface, a network and a computer processor programmed to receive first information identifying a scene via the at least one interface. The computer processor also detects whether the scene contains at least one marker and identifies a location of each detected marker within the scene. In response to determining that the scene contains a first marker, and based on the location of the first marker, the computer processor then augments the portion of the scene containing the first marker with second information. The computer processor then transmits the augmented scene to at least one external device via the network.

[0015] Figure 1 is a schematic diagram of a video conferencing system for presenting augmented images according to an exemplary embodiment. In Figure 1, a video conferencing device 2 is connected to a server 4, a database 6, a mobile device 8 and an external device 14 via a network 10. The video conferencing device 2 is also connected to an imaging device 12 and a PC 16. The server 4 represents one or more servers connected to the video conferencing device 2, the database 6, the mobile device 8 and the external device 14 via the network 10. The database 6 represents one or more databases connected to the video conferencing device 2, the server 4, the mobile device 8 and the external device 14 via the network 10. The mobile device 8 represents one or more mobile devices connected to the video conferencing device 2, the server 4, the database 6 and the external device 14 via the network 10. The external device 14 represents one or more external devices connected to the video conferencing device 2, the server 4, the database 6 and the mobile device 8 via the network 10. The network 10 represents one or more networks, such as the Internet, connecting the video conferencing device 2, the server 4, the database 6, the mobile device 8 and the external device 14.

[0016] The video conferencing device 2 receives images of a surrounding scene from the imaging device 12 connected to the video conferencing device 2. Scene images can be any type of information captured by the imaging device 12, for example streaming video, but in the context of the present disclosure they relate to the environment in which the presenter gives the presentation. The video conferencing device 2 then determines from which device, for example the PC 16 and/or the mobile device 8, the presentation material is to be obtained. Once the video conferencing device 2 has obtained the presentation material, it identifies whether the scene images obtained from the imaging device 12 contain at least one marker. As soon as a marker is detected, the video conferencing device 2 identifies a location of the marker within the scene images and augments a portion of the scene images containing the first marker with video information of presentation material received from local devices such as the computer 16, or remotely from the server 4 or the mobile device 8 via the network 10. The size and orientation of the augmentation depend on the location of the marker, so that the video conferencing device augments the scene images with presentation material while still leaving room in the scene images to show the presenter. At this point, the video information on the main video channel comprises video of the presenter within the scene images together with an augmented image containing the presentation material. The augmented video information is then transmitted to the external device 14 for display to remote audience members.
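The augmentation step described in paragraph [0016] can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a frame is a list of rows of pixel values and that the marker location has already been reduced to a top-left coordinate; the function name `augment_frame` and its parameters are hypothetical and do not come from the disclosure. Scaling and orientation of the material are left out.

```python
def augment_frame(frame, material, top, left):
    """Return a copy of `frame` with `material` pasted at (top, left).

    Pixels of `material` that would fall outside the frame are dropped,
    so the rest of the scene (e.g. the presenter) stays untouched.
    """
    out = [row[:] for row in frame]          # do not modify the captured frame
    height, width = len(frame), len(frame[0])
    for r, material_row in enumerate(material):
        for c, pixel in enumerate(material_row):
            y, x = top + r, left + c
            if 0 <= y < height and 0 <= x < width:
                out[y][x] = pixel
    return out


# A 3x4 scene of zeros and a 2x2 block of presentation material
scene = [[0, 0, 0, 0] for _ in range(3)]
slide = [[1, 2], [3, 4]]
augmented = augment_frame(scene, slide, top=1, left=2)
```

Because the result is a single frame, it can be sent on the main video channel alone, which is the point made in the paragraph above.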

[0017] The augmented scene, including both the presentation material and the video information of the presenter, is carried in a single (main) video channel without the need for the second video channel. This has the advantage that the video conferencing system at the receiving endpoint does not have to handle multiple video feeds, which allows remote audiences with either simple or complex video conferencing systems to receive the same presentations. Using one main video channel also requires less bandwidth than using two video channels, and thus allows for better connections between the endpoints and higher-quality video.

[0018] Figure 2 is a flow diagram of a video conferencing system for presenting augmented images according to an exemplary embodiment. The computer 16, server 4, database 6 and video conferencing device 2 of Figure 1 are illustrated in Figure 2, and like designations are therefore repeated. In Figure 2, a plurality of devices 200 are connected to the sending-endpoint video conferencing device 2, which is in turn connected to a receiving endpoint 212. The devices 200 transmit presentation material to the video conferencing device 2 to be included as a virtually rendered augmented image over a marker identified in scene images recorded on the main video channel by the imaging device 12. The video conferencing device 2 can receive presentation material from a document camera 202, the PC 16, a VCR/DVD/Blu-ray player 214, and/or the server 4. For example, the document camera 202 can record image information that is transmitted directly to the video conferencing device 2 as presentation material. The document camera 202 can also be the mobile device 8, so that information stored in the mobile device 8 or images taken by the mobile device 8 can be transmitted to the video conferencing device 2 as presentation material. The PC 16, the VCR/DVD/Blu-ray player 214 and the server 4 can also provide any type of presentation material, such as Microsoft™ PowerPoint™ presentations, Microsoft™ Word™ documents or any other presentation material, as would be recognized by one of ordinary skill in the art.

[0019] The video conferencing device 2 includes the imaging device 12, a marker detection unit 206, a virtual object rendering unit 209, a gesture identification unit 208 and a video transmission unit 210. The imaging device 12 captures live streaming video of scene images as a series of frames, which are then sent to the marker detection unit 206. The marker detection unit 206 then analyzes the scene images frame by frame to determine whether the scene contains any markers.
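The frame-by-frame analysis in paragraph [0019] can be sketched as follows. This is an illustrative Python sketch, not the disclosed detector: it assumes a grayscale frame stored as a list of rows and a marker that shows up as pixels of one known intensity (`marker_value`, a hypothetical parameter). The names `find_marker` and `detect_in_stream` are likewise invented for the example.

```python
def find_marker(frame, marker_value=255):
    """Return (top, left, bottom, right) of the marker pixels, or None."""
    rows = [r for r, row in enumerate(frame) if marker_value in row]
    if not rows:
        return None                          # this frame contains no marker
    cols = [c for row in frame for c, p in enumerate(row) if p == marker_value]
    return (min(rows), min(cols), max(rows), max(cols))


def detect_in_stream(frames, marker_value=255):
    """Yield (frame_index, bounding_box) for each frame containing a marker."""
    for index, frame in enumerate(frames):
        box = find_marker(frame, marker_value)
        if box is not None:
            yield index, box


# Two 2x2 frames: the first is empty, the second has a marker in column 1
frames = [
    [[0, 0], [0, 0]],
    [[0, 255], [0, 255]],
]
detections = list(detect_in_stream(frames))
```

Writing the stream handler as a generator keeps the sketch close to live video, where frames arrive one at a time and are processed as they come.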

[0020] To ensure better recognition by the video conferencing device 2, the markers can be designed with specific patterns so that they are more easily detected and extracted by the video conferencing device 2. Markers can be designed to be clearly identifiable within a view of the scene so that they can be easily extracted by the marker detection unit 206. Types of markers include retro-reflective markers, printed markers with unique patterns that allow the marker detection unit 206 to determine which marker it is seeing, active markers that emit light to stand out from the rest of the scene images, or any other pattern distinguishable from the scene images, as would be understood by one skilled in the art.

[0021] When any potential markers are detected by the marker detection unit 206, the marker detection unit 206 calculates reference points for the markers in the scene image frames so that virtual augmentations can then be geometrically calibrated by defining the location of the marker in the real world with respect to the marker reference points. The marker detection unit 206 then calculates the pose of the imaging unit 12 relative to the reference points and, together with knowledge of the augmentation's calibration, calculates the relationship between the imaging unit's pose and that of a virtual object to be rendered, such as the presentation material. This relationship allows the virtual object to be rendered so that it appears attached to the marker.
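
One way to picture the relationship between marker reference points and the rendered object is a planar homography: four marker corner points found in the frame determine a mapping from the flat presentation material onto the image. The sketch below is an illustration only, since the source does not state which pose-estimation method is used; the function names and the corner coordinates are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate marker corners")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Return the 3x3 matrix mapping four src points onto four dst points.

    The bottom-right entry is fixed to 1, which leaves eight unknowns.
    """
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(h, x, y):
    """Apply the homography to one point of the presentation material."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical values: unit-square slide corners and detected marker corners (pixels).
slide_corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
marker_corners = [(100, 50), (300, 60), (290, 220), (110, 200)]
h = homography(slide_corners, marker_corners)
slide_centre_in_frame = project(h, 0.5, 0.5)
```

Recomputing the matrix for every frame is what would make the rendered material follow the marker as it moves.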

[0022] Once the markers have been identified by the marker detection unit 206, the frame is input to the gesture identification unit 208 to determine whether the presenter has made at least one gesture that can affect the presentation and/or the content of the presentation material. The gesture identification unit 208 analyzes the presenter's pose in each frame to determine whether the presenter has made a particular gesture recognized by the gesture identification unit 208. If the specific gesture is recognized by the gesture identification unit 208, the frame is passed to the virtual object rendering unit 209 so that the presentation material can be rendered based on the gesture. Otherwise, the frame is input to the virtual object rendering unit 209 and the original, unmodified presentation material is rendered. More information on the specific gestures recognized by the gesture identification unit 208 is provided below with respect to Figure 4.

[0023] In an exemplary embodiment of the present invention, the virtual object is rendered based on which devices 200 are connected to the video conferencing device 2 and which of the devices 200 the presenter has chosen to activate. For example, the presentation material can be obtained from the server 4 through the use of Quick Response (QR) codes embedded in markers located within the scene images. QR codes are encoded images containing binary data or text data, such as a Uniform Resource Locator (URL), which once decoded can be used by the video conferencing device 2 to retrieve presentation material. For example, when a marker is detected by the marker detection unit 206 and contains a QR code, the video conferencing device 2 decodes the QR code to obtain the URL. The URL can then be used by the video conferencing device 2 to obtain presentation material from the server 4 via the network 10. When the video conferencing device 2 receives the presentation material, it can be rendered by the virtual object rendering unit 209. For example, the URL could represent the presenter's personal web page, where a current presentation is maintained in the database 6 and obtained from the server 4 via the network 10. The presenter could then simply update the current presentation at any time, and the QR code pointing to the presenter's URL would not change, thereby removing the need for a new marker for each presentation. However, since the use of QR codes does not require an external device 200, such as the PC 16, to be connected to the video conferencing device 2, control and/or navigation of the presentation material can be achieved by other means in addition to, or separately from, using a remote control and/or the PC 16. These means are described later in more detail with respect to Figure 4.
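
The step between decoding a QR code and fetching material can be sketched as a small check on the decoded payload. This is an illustration under stated assumptions: the function name `presentation_url` is hypothetical, the payload is assumed to be already decoded to text, and restricting the scheme to HTTP(S) is a choice made here, not something the source specifies.

```python
from urllib.parse import urlparse


def presentation_url(qr_payload):
    """Return the payload if it is an http(s) URL with a host, otherwise None."""
    candidate = qr_payload.strip()
    parts = urlparse(candidate)
    if parts.scheme in ("http", "https") and parts.netloc:
        return candidate
    return None


# Hypothetical payloads as they might come out of a QR decoder.
url = presentation_url(" https://example.com/slides/current ")
not_a_url = presentation_url("plain text on a marker")
```

Because the URL stays fixed while the material behind it changes, the same printed marker could be reused across presentations, as the paragraph above describes.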

[0024] Using QR codes to obtain the presentation material provides a number of advantages. First, only one piece of hardware, the video conferencing device 2, is needed when the presentation material is to be presented to remote audience members. This reduces the setup time required when preparing a presentation, and also eliminates the risk of hardware problems or glitches when trying to connect the PC 16 to the video conferencing device 2 to provide presentation material. Furthermore, the presenter no longer has the physical burden of having to carry the PC 16 that stores the presentation material, and can in fact avoid the financial burden of having to purchase a PC 16 for the video conferencing device 2, since the presentation material can be stored entirely remotely. The use of QR codes also avoids problems commonly encountered in ensuring that the resolution of the video output from the PC 16 is not too high.

[0025] Presentation material, for example a Power Point™ presentation, can also be obtained by having the PC 16 or any of the other devices 200 store the Power Point™ presentation. If the presenter selects the PC 16, the virtual object rendering unit 209 receives the video frame from the video input of the PC 16 and then renders that frame within the main video frame, so that the virtual object uses the main video channel. This rendering is performed using a 3D graphics rendering library, such as OpenGL or DirectX, since the video frame must be rendered as a surface on a flat polygon in a three-dimensional environment to allow physical manipulation of the virtual object as if it were real. This enables the presenter to actually pick up the markers and move them around in the scene images just as she would if she were giving a presentation to a local audience.

[0026] The ability to move the marker carrying the presentation material as if it were a real presentation also gives the presenter the ability to include information on the "back side" of the virtually rendered presentation material. In other words, the presentation material produced by the virtual object rendering unit 209 as a virtually rendered augmented image on the marker can include different information on the "back side" of the augmented image, which can be seen by remote audience members when the marker is turned over by the presenter. For example, the information on the back side may include biographical information about the presenter or additional information related to the presentation material, such as a web page or an image that the speaker is currently discussing in the presentation. This information can also be provided to the video conferencing device 2 via another external device 200. Thus, the presentation material on the front side of the augmented image can be received by the video conferencing device 2 from the PC 16, while the information on the back side of the presentation material can be provided by the VCR / DVD / BLU-RAY 204.
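
Deciding whether to draw the front or the back content can be reduced to a sign test on the marker's orientation. The sketch below assumes a camera at the origin of its own coordinate system and a known outward normal for the marker's front face; both the function name and this convention are assumptions made for illustration, not details given in the source.

```python
def visible_side(front_normal, marker_position):
    """Return "front" or "back" for a marker seen by a camera at the origin.

    front_normal: outward normal of the marker's front face (camera coordinates).
    marker_position: centre of the marker (camera coordinates).
    """
    # The view ray runs from the camera to the marker. The front face is
    # visible when its normal points back against that ray (negative dot product).
    facing = sum(n * p for n, p in zip(front_normal, marker_position))
    return "front" if facing < 0 else "back"


# Hypothetical marker two units in front of the camera along +z.
side_when_facing_camera = visible_side((0.0, 0.0, -1.0), (0.0, 0.0, 2.0))
side_when_turned_over = visible_side((0.0, 0.0, 1.0), (0.0, 0.0, 2.0))
```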

[0027] Returning to the discussion of Figure 2, once the presentation material has been rendered by the virtual object rendering unit 209 as a virtual object within the main video frame at the marker detected by the marker detection unit 206, the frame is input to the video transmission unit 210, which transmits the frame, including both the presenter and the presentation material, to the receiving endpoint 212 via the video receiving unit 214. The video receiving unit 214 then passes on the video to be displayed to remote audience members on the screen 216.

[0028] As can be seen from Figure 2, another advantage provided by the invention is that the receiving endpoint video conferencing system does not have to be modified in any way to display video information containing the presenter and the presentation material received from the video conferencing device 2. Therefore, entities or individual users do not need to purchase additional video conferencing equipment in order to communicate with the video conferencing device 2.

[0029] Figure 3 is an algorithmic system flow diagram for presenting augmented images via the video conferencing device 2 according to an exemplary embodiment. At step S300, the video conferencing device 2 receives video information of scene images from the imaging unit 12 and presentation material from one of the external devices 200. At step S302, the video conferencing device 2 then processes the video information, via the marker detection unit 206, to detect whether there is at least one marker within the scene images received from the imaging unit 12. At step S304, if no markers are detected within the scene images received from the imaging unit 12, the video conferencing device 2 transmits only the scene images to the external device 14 at step S305, so that the presenter is all that the remote audience members will see. Processing then returns to step S300 to receive further scene images.

[0030] If at least one marker is identified within the scene images, the video conferencing device 2 determines, via the marker detection unit 206 at step S303, the location of the marker within the scene images so that the size and orientation of the presentation material to be virtually rendered still leave room in the scene images to show the presenter. It should be noted that the presenter can start a presentation without using a marker, thereby sending video only of the presenter to the remote audience members in order to have their full attention during opening remarks. After completing the opening remarks, the presenter can then bring a marker into the scene images to enable the remote audience members to see the presentation material.

[0031] After the location of each detected marker has been identified at step S303, the gesture identification unit 208 determines at step S306 whether at least one gesture by the presenter has been identified. If no gestures are recognized by the gesture identification unit 208, the virtual object rendering unit 209 renders the originally received presentation material into the scene images at step S308. The video conferencing device 2 then transmits the augmented scene images to the external device 14 at step S312. Processing then proceeds to step S314 to determine whether the presentation is over. If the presentation is not over, processing returns to step S300 to receive more frames; if the presentation is over, transmission of video information from the video conferencing device 2 to the external device 14 is terminated at step S316. Returning to step S306, if the video conferencing device 2 recognizes gestures by the presenter, the presentation material is modified at step S310 based on the gestures identified at step S306, and the modified presentation material is rendered into the scene images. The video information containing the augmented scene images is then transmitted to the external device 14 at step S312. Processing then proceeds to step S314 to determine whether the presentation is over. If the presentation is not over, processing returns to step S300 to receive further frames; if the presentation is over, transmission of video information from the video conferencing device 2 to the external device 14 is terminated at step S316. A presentation can therefore be changed based on the actions of the presenter, so that the presenter can introduce a marker into the scene images when specifically discussing the presentation material, and remove the marker from the scene images when the presenter wants the full attention of the remote audience members.
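
The per-frame decisions of Figure 3 can be summarized as one function that is called for each captured frame. The sketch below is a simplified illustration: detection, gesture identification and rendering are replaced by hypothetical placeholders (`detect_markers`, `identify_gesture`, `apply_gesture`, `augment`), and the presentation material is reduced to a slide index.

```python
def apply_gesture(material, gesture):
    """S310 placeholder: here a gesture only changes the slide index."""
    step = {"next": 1, "previous": -1}.get(gesture, 0)
    return {**material, "slide": max(0, material["slide"] + step)}


def augment(frame, location, material):
    """S308/S310 placeholder for rendering: records what would be drawn where."""
    return {"scene": frame, "overlay": material["slide"], "at": location}


def process_frame(frame, material, detect_markers, identify_gesture):
    """Return (frame_to_transmit, material) for one pass through S300 to S312."""
    markers = detect_markers(frame)              # S302
    if not markers:                              # S304: no marker in the scene
        return frame, material                   # S305: send scene images only
    location = markers[0]                        # S303: location of the marker
    gesture = identify_gesture(frame)            # S306
    if gesture is not None:
        material = apply_gesture(material, gesture)       # S310
    return augment(frame, location, material), material   # transmitted at S312


# Hypothetical stand-ins for the detection and gesture units.
start = {"slide": 0}
plain, _ = process_frame("frame-1", start, lambda f: [], lambda f: None)
shown, _ = process_frame("frame-2", start, lambda f: [(10, 20)], lambda f: None)
advanced, state = process_frame("frame-3", start, lambda f: [(10, 20)], lambda f: "next")
```

A caller would repeat this for every frame until the end-of-presentation check at S314 succeeds, at which point transmission stops (S316).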

[0032] It should be noted that if more than one marker is detected, the video conferencing device 2 selects a "presentation" marker in the scene images on which to display video information containing the presentation material obtained from one of the devices 200. In selecting a marker to be used for the virtual rendering of the presentation material, the video conferencing device 2 makes its choice based on the pattern identified on the marker itself.
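
Selecting the presentation marker by its pattern can be as simple as a lookup over the detected markers. In this sketch the dictionary layout and the pattern identifiers are hypothetical; the source only says that the choice is based on the pattern found on the marker.

```python
def select_presentation_marker(detected, presentation_pattern):
    """Return the first detected marker whose pattern matches, else None."""
    for marker in detected:
        if marker["pattern"] == presentation_pattern:
            return marker
    return None


# Hypothetical detections for a frame containing two markers.
detected = [
    {"pattern": "additional", "centre": (520, 310)},
    {"pattern": "presentation", "centre": (180, 300)},
]
chosen = select_presentation_marker(detected, "presentation")
```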

[0033] Figure 4 is an illustrative example of a video conferencing environment for presenting augmented images according to an exemplary embodiment. In Figure 4, a presenter 46 is shown standing in the scene images 49 captured by the imaging unit 12, together with the virtually rendered presentation material 40, which is augmented into the scene images by the virtual object rendering unit 209 at the presentation marker 42 placed on the table 48. An additional marker 44 placed on the table 48 is also shown in the scene images 49.

[0034] As mentioned earlier, and according to an exemplary embodiment, the presenter 46 is able, via the gesture identification unit 208, to change both the appearance of the presentation material 40 and the content of the presentation material 40 without requiring the use of a connected device 200 or a remote control. For example, by using two markers, the presenter can switch the video information sent to the receiving endpoint 212 so that the presentation material 40 is the only information transmitted to the screen 216 via the video receiving unit 214. This full-screen transition effect can be achieved by moving the presentation marker 42 into close proximity to, or within, the portion of the scene images 49 occupied by the additional marker 44. If the presenter 46 then moves the presentation marker 42 back away from the additional marker 44 so that they are no longer in close proximity, the video information of the scene images 49 returns to showing the virtually rendered presentation material 40 together with the presenter 46 and any other information captured in the frame by the imaging unit 12. The virtually rendered presentation material 40 can also be virtually increased in size as the presenter 46 moves the presentation marker 42 closer to the additional marker 44, and likewise the presentation material 40 can be virtually reduced in size as the presenter 46 moves the presentation marker 42 further away from the additional marker 44.
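
The two-marker behaviour can be sketched as a function of the distance between the marker centres. The thresholds, the linear scaling rule and the function name below are assumptions chosen for illustration; the source states only that proximity triggers full screen and that size grows as the markers approach each other.

```python
import math


def overlay_mode(presentation_xy, additional_xy, full_screen_distance=50.0,
                 reference_distance=400.0, min_scale=0.25):
    """Return ("fullscreen", 1.0) or ("overlay", scale) from marker distance in pixels."""
    distance = math.dist(presentation_xy, additional_xy)
    if distance <= full_screen_distance:
        return ("fullscreen", 1.0)
    # The scale grows linearly as the markers approach, clamped to [min_scale, 1.0].
    span = reference_distance - full_screen_distance
    closeness = 1.0 - (distance - full_screen_distance) / span
    return ("overlay", max(min_scale, min(1.0, closeness)))


# Hypothetical marker centres in pixel coordinates.
touching = overlay_mode((0, 0), (30, 0))
halfway = overlay_mode((0, 0), (225, 0))
far_apart = overlay_mode((0, 0), (1000, 0))
```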

[0035] In addition to changing the size of the presentation material 40 within the scene images 49, the presenter 46 can also change the content of the presentation material 40 through a variety of hand gestures. As discussed earlier, QR codes can be decoded by the video conferencing system to download the presentation material 40 from a remote location via the network 10. When QR codes are used, no external device 200 needs to be connected to the video conferencing system 2, and the presenter 46 therefore controls or changes the presentation material 40 using hand gestures. The presenter can thus use hand gestures to control the front side of the presentation or the back side of the presentation, depending on the type of information the presenter wants to discuss. For example, and as discussed earlier, the presenter can move the marker to transition from the front side of the presentation material to the back side of the presentation material, which may include biographical information about the presenter or additional information related to the presentation material, such as a web page or an image that the speaker is currently discussing in the presentation. Furthermore, the presenter 46 can perform a swiping motion to the right over the virtually displayed presentation material 40 to move on to another set of information, such as the next presentation slide. Conversely, the presenter 46 can perform a swiping motion to the left over the virtually displayed presentation material 40 to return to a previous presentation slide. The presentation material 40 can also be animated so that it appears to slide off the screen as the new material slides on, in step with the swiping motion of the presenter 46.
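
Swipe-based slide navigation can be sketched in two parts: classifying a tracked hand trajectory as a left or right swipe, and mapping the swipe to a slide index. The travel threshold, the assumption that image x grows to the right, and the function names are all hypothetical; the source does not describe how swipes are detected.

```python
def classify_swipe(hand_x_positions, min_travel=120):
    """Classify horizontal hand travel (pixels) as "right", "left", or None."""
    if len(hand_x_positions) < 2:
        return None
    travel = hand_x_positions[-1] - hand_x_positions[0]
    if travel >= min_travel:
        return "right"
    if travel <= -min_travel:
        return "left"
    return None


def navigate(slide, swipe, slide_count):
    """Right swipe advances, left swipe goes back; the index stays within the deck."""
    if swipe == "right":
        return min(slide + 1, slide_count - 1)
    if swipe == "left":
        return max(slide - 1, 0)
    return slide


# Hypothetical hand positions sampled over consecutive frames.
swipe = classify_swipe([100, 180, 260])
new_slide = navigate(2, swipe, slide_count=5)
```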

[0036] The presenter 46 can also perform an expanding movement, moving both hands outward while they are over the presentation material 40, to make the presentation material 40 full screen with respect to the scene images 49 transmitted as video information to the external device 14. The presenter 46 can also perform a downward swiping motion to display a list of files and/or slide shows available to be augmented over the presentation marker 42 in the scene images 49. When the list of files and/or slide shows is displayed, the presenter 46 can point to a particular file and/or slide show, which is then loaded by the video conferencing system 2 and augmented onto the scene images 49 over the presentation marker 42 as new presentation material 40. Furthermore, the presenter 46 can highlight specific content in the presentation material 40, or zoom in or out of the presentation material 40, by pointing to a specific part of the slide and holding this pose for a predetermined period of time. The predetermined time can be set by the presenter 46 so that the video conferencing device 2 does not highlight or zoom until it recognizes that the presenter 46 wants to highlight or zoom.
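The hold-to-trigger behavior described above, in which pointing at one spot for a predetermined time starts a highlight or zoom, can be sketched as follows. This is only an illustrative sketch in Python and not the implementation disclosed here: the class name `DwellDetector`, the jitter radius in pixels, and the `(t, x, y)` fingertip sample format are all assumptions introduced for the example.

```python
import math


class DwellDetector:
    """Fires once when a pointing position stays within `radius` pixels
    of where the pose began for at least `hold_seconds`."""

    def __init__(self, hold_seconds=1.5, radius=20.0):
        self.hold_seconds = hold_seconds  # the presenter-configurable hold time
        self.radius = radius              # hypothetical tolerance for hand jitter
        self._anchor = None               # (t, x, y) sample where the pose began
        self._fired = False               # prevents repeated triggers per pose

    def update(self, t, x, y):
        """Feed one fingertip sample; return True when the dwell completes."""
        moved_away = (
            self._anchor is None
            or math.hypot(x - self._anchor[1], y - self._anchor[2]) > self.radius
        )
        if moved_away:
            # The pointer left the tolerance circle: restart timing here.
            self._anchor = (t, x, y)
            self._fired = False
            return False
        if not self._fired and t - self._anchor[0] >= self.hold_seconds:
            self._fired = True
            return True
        return False
```

A caller would feed one sample per video frame and start the highlight or zoom on the frame where `update` returns `True`. Setting `hold_seconds` higher makes accidental triggers less likely at the cost of a slower response.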

[0037] Another function offered by the video conferencing system 2 is the ability to present three-dimensional objects in the scene images, so that remote audience members viewing the scene images 49 via the external device 14 can get a better understanding of the content discussed in the presentation material 40. If, for example, the presenter 46 is presenting plans for a new offshore oil platform, the presenter may at some point during the presentation wish to show an actual three-dimensional model of the platform. To create this effect, the presenter 46 can perform an upward swipe on a presentation slide containing an image the presenter is discussing. This removes the slide, so that its information is no longer virtually rendered over the presentation marker 42. A three-dimensional version of an image previously contained in the removed presentation material 40, for example a VRML file, is then virtually rendered by the virtual object rendering unit 209 on the presentation marker 42, at the same location that the image occupied in the removed presentation material. The three-dimensional version of the image may also be virtually rendered on the additional marker 44. The newly augmented three-dimensional scene images are then transmitted to the external device 14 to be viewed by the remote audience members. The presenter 46 can then move the three-dimensional object rendered on the presentation marker 42 or the additional markers 44 around the scene images, just as she did with the presentation material 40, to better explain the construction and design of the oil platform. This type of presentation gives the remote audience members a better understanding of what the presenter 46 originally described with respect to the two-dimensional image shown in the presentation material 40 before it was removed with the upward swipe. The presenter 46 can then perform a downward swipe, which causes the virtual object rendering unit 209 to remove the three-dimensional image of the oil platform and restore the previously removed presentation material 40 containing the two-dimensional image of the oil platform.

[0038] An upward or downward swipe gesture by the presenter can also be mapped to switching the scene images to show what is displayed on a PC 16 or mobile device 8 connected to the video conferencing device 2. The opposite swipe gesture to the one used to switch the scene images to the connected external device 200 can then be used to return to the presentation material augmented within the scene images. This enables a presenter to transition quickly and seamlessly to any software and/or program that can run on the PC 16 or mobile device 8 without interrupting the flow of the presentation.

[0039] Of course, many modifications and variations of the gesture functions described above are possible in light of the above teachings. As will be understood by one of ordinary skill in the art, the gesture functions described above can therefore be practiced otherwise than as specifically described herein. As such, the various swiping gestures can be recognized by the video conferencing device 2 to perform functions different from those listed above. For example, the upward swipe gesture can be used to display a list of files and/or slide shows, and the downward swipe gesture can be used to display three-dimensional objects.
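Because the assignment of gestures to functions is interchangeable, one way to model it is a lookup table that can be rebound without touching the recognition logic. The sketch below is illustrative only: the gesture and action labels are hypothetical names chosen for the example, and the default table simply mirrors the gestures described in the preceding paragraphs.

```python
# Default bindings mirror the gestures described in the text; the string
# labels themselves are hypothetical and do not come from the source.
DEFAULT_BINDINGS = {
    "swipe_right": "next_slide",
    "swipe_left": "previous_slide",
    "swipe_down": "show_file_list",
    "swipe_up": "show_3d_model",
    "expand": "full_screen",
}


def swap_bindings(bindings, gesture_a, gesture_b):
    """Return a copy of `bindings` with the actions of two gestures exchanged."""
    swapped = dict(bindings)
    swapped[gesture_a], swapped[gesture_b] = bindings[gesture_b], bindings[gesture_a]
    return swapped


def action_for(gesture, bindings=DEFAULT_BINDINGS):
    """Look up the action bound to a recognized gesture, or None if unbound."""
    return bindings.get(gesture)


# The variation given as an example: an upward swipe lists files and a
# downward swipe shows three-dimensional objects.
ALTERNATE_BINDINGS = swap_bindings(DEFAULT_BINDINGS, "swipe_up", "swipe_down")
```

Keeping the table separate from recognition means a different embodiment only needs a different dictionary, for instance `action_for("swipe_up", ALTERNATE_BINDINGS)`.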

[0040] When the presenter 46 performs the various gestures, the video conferencing device 2 must determine, via the gesture identification unit 208, whether the presenter 46 actually intends to change the presentation material 40. For example, movement in the background of the scene images 49 by something other than the presenter 46 can cause the gesture identification unit 208 to mistakenly detect a gesture by the presenter, which can cause the virtual object rendering unit 209 to render the presentation in a manner the presenter does not expect. Therefore, in one embodiment of the present invention, the gesture identification unit 208 determines a position and/or depth at which the gesture of the presenter 46 crosses the boundary planes of the presentation material 40. For example, the gesture identification unit 208 can virtually measure the length, height, and width of the virtually rendered presentation material and determine that only gestures that extend into the boundary of the presentation material 40 within the scene images 49 by more than a certain length, height and/or width cause changes in the presentation material 40.
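The boundary test described above can be sketched as a simple measurement of how far a tracked hand travels inside the rendered material. This is a minimal sketch under stated assumptions, not the disclosed method: the function names, the 2D `(x, y)` track format, and the use of horizontal extent only are choices made for the example, and depth is ignored. The default threshold of one half follows the "at least half" condition in claim 5.

```python
def gesture_fraction(track, rect):
    """Fraction of the material's width covered by the part of the hand
    track that lies inside `rect`, given as (left, top, right, bottom)."""
    left, top, right, bottom = rect
    # Keep only samples that fall inside the rendered presentation material.
    xs = [x for x, y in track if left <= x <= right and top <= y <= bottom]
    if len(xs) < 2 or right <= left:
        return 0.0
    return (max(xs) - min(xs)) / (right - left)


def is_intentional(track, rect, min_fraction=0.5):
    """Accept a gesture only if it spans at least `min_fraction` of the
    material's width; shorter or outside movement is treated as noise."""
    return gesture_fraction(track, rect) >= min_fraction
```

With a material rectangle of `(100, 100, 500, 400)`, a hand moving from x=120 to x=400 inside it covers 0.7 of the width and is accepted, while movement entirely outside the rectangle yields 0.0 and is ignored.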

[0041] Alternatively, the video conferencing device 2 can locate the presenter within the scene images by detecting faces in the scene and monitoring which of the detected faces has moving lips. Once the presenter is identified, the presenter's hands can be located and tracked, ensuring that only the speaker's hand movements are used for gesture control of the presentation. In another embodiment, the identity of the presenter can be encoded in a QR-code-based marker (i.e., the name John Doe could be encoded in the marker) so that the video conferencing device 2 can load a face image corresponding to the identity encoded in the QR code and identify the speaker using a facial recognition system. The video conferencing device can therefore identify John Doe among all faces present in the scene and monitor only hand movements performed by John Doe when determining gesture-based actions performed during the presentation.
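Selecting the presenter by lip movement can be illustrated with a small sketch. It assumes that face detection and mouth measurement happen upstream and deliver, per face, a list of per-frame mouth-opening values. Using variance as the measure of lip motion is an assumption made for this example and is not stated in the text; the function name and threshold are likewise hypothetical.

```python
from statistics import pvariance


def pick_presenter(mouth_openings, min_variance=1.0):
    """`mouth_openings` maps a face id to a list of per-frame mouth-opening
    measurements (for example, lip gap in pixels). Returns the id of the
    face whose lips move most, or None if no face exceeds `min_variance`."""
    best_id, best_var = None, min_variance
    for face_id, samples in mouth_openings.items():
        if len(samples) < 2:
            continue  # not enough frames to judge motion
        var = pvariance(samples)
        if var > best_var:
            best_id, best_var = face_id, var
    return best_id
```

Only the hands associated with the returned face would then be passed on to gesture recognition.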

[0042] Another important function of the video conferencing device 2, provided via the virtual object rendering unit 209, is the ability to allow real objects, for example the presenter 46, to occlude virtual objects such as the presentation material 40 or a rendered three-dimensional object, so that the presentation material 40 is not virtually rendered in the foreground of a real object in the scene images 49. For example, if the presenter 46 extends an arm in front of the presentation material 40 shown in the scene images 49, the arm should block the part of the presentation material 40 that it covers. Therefore, the video conferencing device 2 determines the three-dimensional structure of the scene, for example the depth of real objects from a given viewpoint, to create the correct real-object occlusions in the scene images 49. Alternatively, a depth imaging unit can be used instead of or in addition to the imaging unit 12 to create depth images of the scene images 49 and thus enable occlusion of virtual objects.
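Once a depth value is available for each real pixel, the occlusion described above reduces to a per-pixel depth comparison. The sketch below shows that comparison on plain nested lists; it is an illustration under assumptions rather than the disclosed renderer. The function name, the row-major list layout, and the convention that `None` marks pixels with no virtual content are all introduced for the example.

```python
def composite(real_rgb, real_depth, virt_rgb, virt_depth):
    """Per-pixel depth test. A virtual pixel is drawn only where virtual
    content exists (depth is not None) and it is nearer to the camera
    than the real surface; otherwise the real pixel is kept."""
    out = []
    for real_row, rd_row, virt_row, vd_row in zip(
        real_rgb, real_depth, virt_rgb, virt_depth
    ):
        out_row = []
        for real_px, rd, virt_px, vd in zip(real_row, rd_row, virt_row, vd_row):
            if vd is not None and vd < rd:
                out_row.append(virt_px)   # virtual object is in front
            else:
                out_row.append(real_px)   # real object occludes, or nothing virtual
        out.append(out_row)
    return out
```

In the arm example, pixels covered by the presenter's arm have a smaller real depth than the virtual slide, so the arm remains visible in front of the slide.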

[0043] Further improvements provided by the video conferencing device 2 include higher-quality rendering, for example using anti-aliasing, of virtual-object shadows cast onto the real environment, which can significantly increase scene realism even when the shadows are rendered in a manner that does not match the actual lighting in the scene. These improvements increase the rendering quality and believability of the virtual objects when they are rendered by the virtual object rendering unit 209, causing the virtual objects to blend into the real environment more seamlessly. In addition, accurate lighting of the virtual objects ensures that they appear consistent with the lighting of the rest of the real environment, giving the virtual objects a realistic and less intrusive appearance.

[0044] The video conferencing device 2 also provides the ability, via the virtual object rendering unit 209, to more realistically cover up markers that appear in the scene images 49 after they have been detected by the marker detection unit 206. For example, the appearance of the presentation marker 42 and/or the additional marker 44 may be unsightly and distracting for audience members viewing the scene images 49. In augmented reality, markers are sometimes produced as a black square with a white interior and a black pattern at the center. It should be noted, however, that many types of markers exist, such as circular markers, retro-reflective markers, or active markers. To make these markers less intrusive, a white square may be rendered by the virtual object rendering unit 209, in addition to the presentation material 40, over one or more of the markers detected by the marker detection unit 206 within the scene images 49. As such, the white square rendered by the virtual object rendering unit 209 causes the designated marker to appear in the scene images 49 simply as a blank sheet of paper left on the desk, rather than as an intrusive marker with a distracting pattern.

[0045] Instead of covering the marker with a white box, the virtual object rendering unit 209 may also replace the marker with an image. To create such an effect, the shading of the real target object is determined and applied to the virtual object to increase the realism of the final scene displayed in the scene images 49. Thus, for example, a company logo can be rendered consistently with the scene lighting when it is applied to the marker surface. This allows the presenter 46 to tailor a presentation in advance or during the presentation itself, based on the audience members, in a way that more actively engages and impresses them.

[0046] The video conferencing device 2, via the marker detection unit 206, can also detect a tablet PC shown in the scene images 49 as a marker instead of a printed, static marker. As such, the tablet PC can display a variety of markers on its screen, which are then detected by the marker detection unit 206 just like a printed marker. Using a tablet PC as the marker rather than a printed static marker offers a number of advantages. First, displaying the marker on a tablet PC means that the marker is effectively dynamic rather than static, as is the case with a printed marker. The marker can therefore be changed whenever the presenter 46 wishes, allowing the presenter 46 to load a variety of different AR presentations simply by displaying a different marker on the tablet PC screen. This is especially true if the marker is QR-code based, since the presenter 46 can change the data encoded in the QR code in an instant, without having to print another marker. Additionally, controls may be provided on the touch screen of the tablet PC, enabling slide navigation such as swiping to the right or left to advance to the next slide or return to the previous slide.

[0047] Next, a hardware description of the video conferencing device 2 according to exemplary embodiments is given with reference to Figure 5. In Figure 5, the video conferencing device 2 comprises a CPU 500 which performs the processes described above. The process data and instructions may be stored in memory 502. These processes and instructions may also be stored on a storage medium disk 504, such as a hard disk drive (HDD) or a portable storage medium, or they may be stored remotely. Further, the claimed advancements are not limited by the form of the computer-readable media on which the instructions of the inventive process are stored. For example, the instructions may be stored on CDs, DVDs, in FLASH memory, RAM, ROM, PROM, EPROM, EEPROM, hard disk, or any other information processing device with which the computer-aided design station communicates, such as a server or a computer.

[0048] Further, the claimed advancements may be provided as a utility application, background daemon, or component of an operating system, or any combination thereof, executing in conjunction with the CPU 500 and an operating system such as Microsoft Windows 7, UNIX, Solaris, Linux, Apple MAC-OS and other systems known to those skilled in the art.

[0049] The CPU 500 may be a Xeon or Core processor from Intel of America or an Opteron processor from AMD of America, or it may be another processor type that would be recognized by one of ordinary skill in the art. Alternatively, the CPU 500 may be implemented on an FPGA, ASIC, or PLD, or using discrete logic circuits, as one of ordinary skill in the art would recognize. Further, the CPU 500 may be implemented as multiple processors working cooperatively in parallel to execute the instructions of the inventive processes described above.

[0050] The video conferencing device 2 in Figure 5 also comprises a network controller 508, such as an Intel Ethernet PRO network interface card from Intel Corporation of America, for interfacing with the network 10. As will be understood, the network 10 can be a public network, such as the Internet, or a private network, such as a LAN or WAN, or any combination thereof, and it can also include PSTN or ISDN sub-networks. The network 10 can also be wired, such as an Ethernet network, or wireless, such as a cellular network including EDGE, 3G and 4G wireless cellular systems. The wireless network can also be WiFi, Bluetooth, or any other known wireless form of communication.

[0051] The video conferencing device 2 further comprises a display controller 510, such as an NVIDIA GeForce GTX or Quadro graphics adapter from NVIDIA Corporation of America, for interfacing with the display 512, such as a Hewlett Packard HPL2445w LCD monitor. A general I/O interface 514 interfaces with a keyboard and/or mouse 516 as well as a touch screen panel 518 on or separate from the display 512. In addition, the general I/O interface is connected to imaging devices 12, such as a Canon XH G1, a Sony F65, or a camera on a mobile device 8, to receive scene images. The general I/O interface is also connected to a plurality of devices 200 such as a PC 16, a VCR/DVD/Blu-ray player 214, a document camera 202 and a server 4.

[0052] A sound controller 526, such as a Sound Blaster X-Fi Titanium from Creative, is also provided in the video conferencing device 2 to interface with speakers/microphone 528, thereby providing sounds and/or music.

[0053] The general storage controller 522 connects the storage medium disk 504 with the communication bus 524, which may be an ISA, EISA, VESA, PCI, or similar bus, for interconnecting all of the components of the video conferencing device 2. A description of the general features and functionality of the display 512, the keyboard and/or mouse 516, the display controller 510, the storage controller 522, the network controller 508, the sound controller 526, and the general I/O interface 514 is omitted here for brevity, since these features are known.

[0054] Any processes, descriptions, or blocks in the flowcharts described herein should be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps in the process. Alternative implementations are included within the scope of the exemplary embodiment of the present invention, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.

[0055] Of course, numerous modifications and variations of the present invention are possible in light of the above teachings. It is therefore to be understood that, within the scope of the appended claims, the present invention may be practiced otherwise than as specifically described herein.

Claims

1. A video conferencing device (2) for presenting augmented images, comprising: at least one interface; a network; and a computer processor programmed to: receive (S300) first video information identifying a scene via the at least one interface; detect (S302) whether the scene contains at least one marker; identify (S303) a location of each detected marker within the scene; augment (S306; S308; S310), in response to determining that the scene contains a first marker and based on the location of the first marker, a portion of the scene containing the first marker with second video information received via the at least one interface; and transmit (S312) the augmented scene to at least one external device via the network, wherein the computer processor is further programmed to augment an entirety of the scene with the second video information based on the location of the first marker in the scene with respect to the location of a second marker detected in the scene.

2. Video conferencing device (2) according to claim 1, wherein the first video information and the second video information are contained in a single video channel.

3. Video conferencing device (2) according to claim 1, wherein the computer processor is further programmed to decode an image located on the first marker to obtain location information identifying a remote location of the second video information, and to receive the second video information from the remote location via the network.

4. Video conferencing device (2) according to claim 1, wherein the computer processor is further programmed to: identify at least one hand gesture of a user from the scene; identify a gesture distance by which the at least one hand gesture extends over the portion of the scene containing the second video information; and change the second video information based on the at least one hand gesture and the gesture distance.

5. Video conferencing device (2) according to claim 4, wherein the computer processor changes the second video information only in response to identifying a gesture distance of at least half of the portion of the scene containing the second video information.

6. Video conferencing device (2) according to claim 4, wherein an entirety of the scene is augmented with the second video information based on the at least one hand gesture and the gesture distance.

7. Video conferencing device according to claim 4, wherein the second video information is a slide show with a plurality of slides.

8. Video conferencing device (2) according to claim 7, wherein the computer processor is further programmed to navigate through the plurality of slides based on the gesture distance and on a direction and movement of the at least one hand gesture.

9. Video conferencing device (2) according to claim 7, wherein the computer processor is further programmed to remove a slide displaying an image based on the gesture distance and in response to identifying an upward swiping hand gesture from the user, and to augment a portion of the scene containing a second marker with a three-dimensionally modeled image of the image previously removed from the slide.

10. Video conferencing device (2) according to claim 7, wherein the computer processor is further programmed to remove a slide displaying an image based on the gesture distance and in response to identifying an upward swiping hand gesture from the user, and to augment a portion of the scene containing the first marker with a three-dimensionally modeled image of the image previously removed from the slide, at the same location at which the image was displayed on the previously removed slide.

11. Video conferencing device (2) according to claim 7, wherein the computer processor is further programmed to highlight or zoom a portion of a slide in response to identifying at least one hand gesture that points to a specific portion of the slide and is maintained for a predetermined period of time.

12. Video conferencing device (2) according to claim 7, wherein the computer processor is further programmed to: display a list of different slide shows in response to identifying a downward swiping hand gesture from the user; identify a hand gesture that selects a particular slide show from the list of different slide shows; and augment the portion of the scene containing the first marker with the selected slide show.

13. A video conferencing method for presenting augmented images, comprising: receiving (S300) first video information identifying a scene via at least one interface; detecting (S302) whether the scene contains at least one marker; identifying (S303) a location of each detected marker within the scene; augmenting (S306; S308; S310), with a CPU, in response to determining that the scene contains a first marker and based on the location of the first marker, a portion of the scene containing the first marker with second video information received via the at least one interface; and transmitting (S312) the augmented scene to at least one external device via a network, the method further comprising augmenting an entirety of the scene with the second video information based on the location of the first marker in the scene with respect to the location of a second marker detected in the scene.

14. Video conferencing method according to claim 13, further comprising: identifying at least one hand gesture of a user from the scene; identifying a gesture distance by which the at least one hand gesture spans the portion of the scene containing the second video information; and changing the second video information based on the at least one hand gesture and the gesture distance.

15. Video conferencing method according to claim 14, wherein the second video information is a slide show with a plurality of slides.

16. Video conferencing method according to claim 15, further comprising: removing a slide displaying an image based on the gesture distance and in response to identifying an upward swiping hand gesture from the user; and augmenting a portion of the scene containing the first marker with a three-dimensionally modeled image of the image previously removed from the slide, at the same location at which the image was displayed on the previously removed slide.

17. A non-transitory computer-readable medium storing machine-readable instructions which, when executed by a computer processor (500), cause the computer processor to perform a video conferencing method of presenting augmented images, the method comprising: receiving (S300) first video information identifying a scene via at least one interface; detecting (S302) whether the scene contains at least one marker; identifying (S303) a location of each detected marker within the scene; augmenting (S306; S308; S310), in response to determining that the scene contains a first marker and based on the location of the first marker, a portion of the scene containing the first marker with second video information received via the at least one interface; transmitting (S312) the augmented scene to at least one external device via a network; and augmenting an entirety of the scene with the second video information based on the location of the first marker in the scene with respect to the location of a second marker detected in the scene.

18. Non-transitory computer-readable medium according to claim 17, further comprising computer-readable instructions which, when executed by the computer processor (500), cause the computer processor to: identify at least one hand gesture of a user from the scene; identify a gesture distance by which the at least one hand gesture spans the portion of the scene containing the second video information; and change the second video information based on the at least one hand gesture and the gesture distance.

19. Non-transitory computer-readable medium according to claim 18, wherein the second video information is a slide show with a plurality of slides, and further comprising computer-readable instructions which, when executed by the computer processor (500), cause the computer processor to: remove a slide displaying an image based on the gesture distance and in response to identifying an upward swiping hand gesture from the user; and augment a portion of the scene containing the first marker with a three-dimensionally modeled image of the image previously removed from the slide, at the same location at which the image was displayed on the previously removed slide.
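For illustration, the gesture distance condition recited in claims 4, 5, and 8 can be sketched as follows. This is a hedged example only: `SlideShow`, `gesture_fraction`, and `navigate` are hypothetical names, the gesture is reduced to a horizontal start and end coordinate, and the mapping of swipe direction to slide direction (right-to-left advances) is an assumption not stated in the claims. The only element taken from the claims is that the second video information changes only when the gesture spans at least half of the augmented portion of the scene.

```python
from dataclasses import dataclass


@dataclass
class SlideShow:
    slide_count: int
    current: int = 0  # zero-based index of the slide being shown


def gesture_fraction(start_x: float, end_x: float,
                     region_x: float, region_width: float) -> float:
    """Fraction of the augmented region's width covered by the hand travel."""
    if region_width <= 0:
        return 0.0
    left, right = region_x, region_x + region_width
    # Only the part of the gesture that lies inside the region is counted.
    lo = min(max(min(start_x, end_x), left), right)
    hi = min(max(max(start_x, end_x), left), right)
    return (hi - lo) / region_width


def navigate(show: SlideShow, start_x: float, end_x: float,
             region_x: float, region_width: float,
             threshold: float = 0.5) -> int:
    """Change slide only if the gesture spans at least `threshold` of the region."""
    if gesture_fraction(start_x, end_x, region_x, region_width) < threshold:
        return show.current  # gesture too short: ignore it
    # Assumed direction mapping: a right-to-left swipe advances one slide.
    step = 1 if end_x < start_x else -1
    show.current = min(max(show.current + step, 0), show.slide_count - 1)
    return show.current
```

Requiring the gesture to cover a minimum share of the augmented region is one way to keep incidental hand movement in front of the camera from changing the displayed slide.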