WO2023174385A1 - Method, system and AR glasses for real social interaction using virtual scenes - Google Patents

Method, system and AR glasses for real social interaction using virtual scenes

Info

Publication number
WO2023174385A1
WO2023174385A1 (PCT/CN2023/082004)
Authority
WO
WIPO (PCT)
Prior art keywords
live broadcast
light
live
glasses
person
Prior art date
Application number
PCT/CN2023/082004
Other languages
English (en)
French (fr)
Inventor
张一平
Original Assignee
郑州泽正技术服务有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 郑州泽正技术服务有限公司 filed Critical 郑州泽正技术服务有限公司
Publication of WO2023174385A1 publication Critical patent/WO2023174385A1/zh


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays

Definitions

  • The invention belongs to the field of virtual reality technology and specifically relates to methods, systems and AR glasses for real social interaction using virtual scenes.
  • Virtual reality (VR) technology integrates computer, electronic information and simulation technologies. Its basic implementation is that a computer simulates a virtual environment to give people a sense of immersion in that environment. With the development of VR technology, VR has been applied in various technical fields.
  • The patent with application number 2017103756194 discloses a VR social system and method based on real-time three-dimensional reconstruction of the human body. The disclosed patent document has the following shortcomings: first, the human body reconstructed in three dimensions in the VR scene is not the image of a real human body in a real live broadcast; second, the patent does not solve how changes in the position of the human body in reality are matched to the virtual scene, so the direction and speed of human movement in reality may be inconsistent with the direction and speed of movement in the virtual scene; third, because position changes of the human body in reality do not match the virtual scene, it is impossible to accurately establish contact with other people through the virtual scene, and social efficiency is low.
  • To solve these problems, the present invention provides a method and system for real social interaction using virtual scenes.
  • The method of using virtual scenes for real social interaction involves a cloud server and live broadcast terminals, and includes the following steps:
  • Step 1: Based on the number of live broadcast terminals, the cloud server obtains the number N of live broadcast rooms participating in the social virtual scene; a live broadcast room is a real live broadcast scene, and N ≥ 2.
  • Step 2: Establish N+1 identical three-dimensional coordinate systems according to the number N of live broadcast rooms: each live broadcast room establishes one three-dimensional coordinate system, the N live broadcast rooms constitute coordinate systems 1 to N, and the cloud server establishes the (N+1)-th coordinate system in the virtual scene.
  • Step 3: Configure the N+1 three-dimensional coordinate systems. Each consists of an x-axis, a y-axis and a z-axis, and the length units of all x-, y- and z-axes are the same. The ground of the virtual scene is defined as the plane formed by the x-axis and y-axis of each coordinate system, and the ground of each live broadcast room is likewise the x-y plane of its coordinate system. The spatial range occupied by the virtual scene in the (N+1)-th coordinate system is denoted K_(N+1), and the corresponding spatial range in the coordinate system of each live broadcast room is denoted K_1 to K_N.
  • There must be no obstacles within K_1 to K_N, and the appearance of the real people within K_1 to K_N must support three-dimensional live broadcast; an obstacle here means occlusion severe enough that the live broadcast cannot be completed, and occlusions that can be filtered out and compensated during the live broadcast are considered barrier-free.
  • Step 4: A live broadcast room containing at least one person being broadcast is defined as an occupied live broadcast room; the number of occupied rooms is defined as M, with N ≥ M ≥ 2. In each of the M occupied rooms, the human body position information and body shape information of each person being broadcast, together with each person's voice information and voice position information, are collected separately and transmitted simultaneously to the cloud server, and the cloud server processes the appearance information of each person being broadcast in each room into three-dimensional portrait information.
  • Step 5 (an alternative to Step 4): A live broadcast room containing at least one person being broadcast is defined as an occupied live broadcast room; the number of occupied rooms is defined as M, with N ≥ M ≥ 2. In each of the M occupied rooms, the human body position information and body shape information of each person being broadcast, together with each person's voice information and voice position information, are collected separately, processed into three-dimensional portrait information, and transmitted simultaneously to the cloud server.
  • Step 6: The cloud server imports the human body position information and three-dimensional portrait information of each person being broadcast in each live broadcast room, together with each person's voice information and voice position information, into the virtual scene in real time to form a VR data stream, and transmits the VR data stream to each live broadcast terminal. Each person being broadcast wears AR glasses, the display component of the live broadcast terminal, in the corresponding live broadcast room. The virtual scene therefore gathers the virtual images of all persons being broadcast wearing AR glasses; in each live broadcast room the physical image and virtual image of a person being broadcast coincide, so through the AR glasses a person being broadcast can only see the VR virtual images of the other persons being broadcast. Of course, when the cloud server transmits the VR data stream to a given live broadcast terminal, the virtual image of that person may be omitted; the person being broadcast still only sees the virtual images of the others through the AR glasses.
  • The generation of the three-dimensional portrait information includes the following steps:
  • S1): In each live broadcast room at least three cameras are set up for each person being broadcast, and the cameras perform synchronized tracking and shooting of the persons in the room; synchronized shooting means that each frame shot by the different cameras has the same timestamp.
  • S2): The live broadcast terminal performs human-body keying from different angles, frame by frame, on the videos shot by the different cameras in the same live broadcast room, and synthesizes a stereoscopic human body image.
  • S3): The live broadcast terminal transmits the stereoscopic human body image to the cloud server; the cloud server recognizes the image frame by frame, records the video frames that contain AR glasses, and fuses the facial image of the same time frame with the stereoscopic human body image to form the three-dimensional portrait information.
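  • As an illustration of step S2), the sketch below shows how one synchronized time frame from several cameras might be keyed. It is a minimal sketch, assuming OpenCV/NumPy, a green-screen live broadcast room (the description later states a green screen is preferred), and a hypothetical `frames` mapping from camera id to image; the patent does not prescribe a specific keying algorithm.

```python
# Minimal sketch of step S2): frame-synchronized chroma keying across cameras.
# Assumptions (not specified by the patent): OpenCV/NumPy, a green-screen room,
# and a `frames` dict mapping camera id -> BGR frame for one shared timestamp.
import cv2
import numpy as np

def key_person(frame_bgr: np.ndarray) -> np.ndarray:
    """Return the person with the green background removed (mask applied)."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Hypothetical green-screen hue range; it would be calibrated per room.
    green = cv2.inRange(hsv, (35, 60, 60), (85, 255, 255))
    person_mask = cv2.bitwise_not(green)
    return cv2.bitwise_and(frame_bgr, frame_bgr, mask=person_mask)

def key_synchronized_views(frames: dict[int, np.ndarray]) -> dict[int, np.ndarray]:
    """Key every camera's frame for the same time frame; the multi-view results
    would then feed a reconstruction step that synthesizes the stereoscopic image."""
    return {cam_id: key_person(f) for cam_id, f in frames.items()}
```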
  • In another embodiment, the generation of the three-dimensional portrait information (together with the sound information and sound position information) includes the following steps:
  • S1): In each live broadcast room at least three cameras are set up for each person being broadcast, and the cameras perform synchronized tracking and shooting; synchronized shooting means that each frame shot by the different cameras has the same timestamp. The person being broadcast wears AR glasses in the live broadcast room; the AR glasses are provided with small cameras facing the face, whose time frames are synchronized with those of the room cameras, and which photograph the face of the person being broadcast, mainly the parts of the face occluded by the AR glasses.
  • S2): The live broadcast terminal fuses the videos captured by the multiple groups of glasses-mounted cameras in sequence, according to time frame and camera number, to form a facial image; at the same time, the terminal performs human-body keying from different angles, frame by frame, on the videos captured by the different room cameras in the same live broadcast room, and synthesizes a stereoscopic human body image without AR glasses.
  • S3): The live broadcast terminal transmits the facial image and the stereoscopic human body image to the cloud server; the cloud server recognizes the image frame by frame, records the video frames that contain AR glasses, and fuses the facial image of the same time frame with the stereoscopic human body image to form three-dimensional portrait information without AR glasses.
  • The person being broadcast wears AR glasses in the live broadcast room. The AR glasses are provided with small cameras facing the person's face, located around the lenses of the AR glasses. All of these cameras shoot synchronously; the shooting ranges of adjacent cameras overlap with each other and also overlap with those of the room cameras, and the glasses-mounted cameras and the room cameras shoot synchronously. The live broadcast terminal extracts facial information frame by frame from the glasses-mounted cameras.
  • The human body position information includes position coordinates and posture. The live broadcast terminal collects the position coordinates and posture of each person being broadcast in each live broadcast room and transmits them to the cloud server, collecting the coordinates and postures of everyone in the room synchronously and in real time, in chronological order of the cameras' shooting frames. The position coordinates are collected by positioning equipment and the posture data by gyroscopes; the positioning equipment and gyroscope are fixed on the chest of the person being broadcast.
  • The sound information is collected through the recording equipment of the live broadcast terminal, and the sound position information through the sound source localization equipment of the live broadcast terminal.
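  • For illustration, a hypothetical per-frame record that a live broadcast terminal might assemble from the positioning device, gyroscope, recording equipment and cameras before sending it to the cloud server is sketched below; the field names and types are assumptions, not taken from the patent.

```python
# Hypothetical sketch of the per-frame record a live broadcast terminal could send
# to the cloud server; field names are illustrative only.
from dataclasses import dataclass

@dataclass
class BroadcastSample:
    room_id: int                                     # live broadcast room (coordinate system 1..N)
    person_id: int                                   # person being broadcast within that room
    frame_time: float                                # shared timestamp of the synchronized cameras
    position_xyz: tuple[float, float, float]         # chest positioning device, room coordinates
    posture_quat: tuple[float, float, float, float]  # gyroscope attitude (w, x, y, z)
    portrait_frame: bytes = b""                      # encoded portrait data for this frame
    audio_chunk: bytes = b""                         # recorded voice for this frame
    audio_source_xyz: tuple[float, float, float] = (0.0, 0.0, 0.0)  # sound source position
```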
  • The system includes a cloud server and live broadcast terminals. The cloud server is provided with a cloud service processor and each live broadcast terminal with a terminal processor. The live broadcast terminal further includes at least three groups of cameras, AR glasses, wireless in-ear headphones, positioning equipment and a gyroscope. The cloud service processor communicates with the terminal processor over TCP/IP, and the cameras, VR headset, wireless in-ear headphones, positioning equipment and gyroscope are all electrically connected to the terminal processor; the positioning equipment and the gyroscope are fixed together as one unit and are fixed on the chest of the person being broadcast.
  • AR glasses comprise a spectacle frame and a VR display device arranged on the frame. The VR display device includes a display screen and a convex lens on the inner side of the display screen; the convex lens and the display screen together form a VR image in the human eye. The display screen is a light-transmitting display screen, and a concave lens whose diopter matches that of the convex lens is arranged on the outer side of the light-transmitting display screen; the concave lens offsets the refraction of light by the convex lens. The concave lens is located within the focal length of the convex lens, and the convex lens within the virtual focal length of the concave lens, so that the light-transmitting part of the display screen forms a transparent image of the real scene.
  • The light-transmitting display screen is a light-transmitting single-sided display screen that displays the VR image toward the inside, i.e. toward the eye side. It includes an array of light-emitting areas, with an array of light-transmitting areas between them; each light-emitting area emits light on one side only.
  • Each light-transmitting area is a part of a concave Fresnel lens, and the array of all light-transmitting areas together forms a concave Fresnel lens. This concave Fresnel lens replaces the separate concave lens of matched diopter on the outer side of the display screen, which reduces the weight and thickness of the glasses.
  • The convex lens may be a Fresnel convex lens and the concave lens a Fresnel concave lens.
  • The material of the light-transmitting areas is a photochromic light-transmitting material: when the luminescent material of the surrounding light-emitting areas emits light, the photochromic material darkens and its transmittance decreases; when the surrounding luminescent material does not emit light, the transmittance of the photochromic light-transmitting area is high.
  • A distance adjustment device is provided between the convex lens and the display screen to adjust the distance between them.
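  • The claim that the outer concave lens offsets the refraction of the inner convex lens for real-scene light can be checked with the standard two-thin-lens formula. The sketch below is only illustrative: the 12.5-dioptre power and the 5 mm separation are assumed values, and real lenses would need thick-lens treatment.

```python
# Sketch of why a matched concave lens can offset the convex lens for real-scene light.
# Uses the two-thin-lens formula P = P1 + P2 - d*P1*P2; the powers and separation
# below are illustrative assumptions, not values from the patent.
def combined_power(p1_dpt: float, p2_dpt: float, separation_m: float) -> float:
    return p1_dpt + p2_dpt - separation_m * p1_dpt * p2_dpt

convex = +12.5           # dioptres (focal length 80 mm)
concave = -12.5          # matched dioptre of the outer concave lens
print(combined_power(convex, concave, 0.0))     # 0.0 -> perfect cancellation if coincident
print(combined_power(convex, concave, 0.005))   # ~0.78 dpt residual at 5 mm separation
```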
  • The invention discloses a method and system for real social interaction using virtual scenes. By establishing a unified coordinate system, the coordinate systems set up in live broadcast rooms in different places and the coordinate system built in the virtual scene are defined as three-dimensional coordinate systems with the same orientation, which provides a systematic basis for realizing real social interaction in virtual scenes. The appearance information of the persons being broadcast in the different live broadcast rooms and their corresponding three-dimensional coordinate positions in those rooms are then extracted; the extracted appearance information is processed into three-dimensional images of real people, and according to each person's coordinate position in the live broadcast room the three-dimensional image of that person is placed into the virtual scene, so that the person's coordinates in the virtual scene are identical to the person's coordinates in the live broadcast room. This solves the problem of keeping the position changes of a person being broadcast consistent between reality and the virtual scene, so that the ground position and the direction and speed of movement of each person broadcasting in a real live broadcast room are consistent with that person's ground position and direction and speed of movement in the virtual scene.
  • Moving across the ground in the virtual scene is just like moving across the ground in the live broadcast room; when a person walks around a virtual object in the virtual scene, no real object has to be moved in the live broadcast room. Of course, stairs on the ground of the virtual scene must correspond to real stairs in the live broadcast room, to prevent missteps. For example, while broadcaster A is broadcasting in his live broadcast room, he sees the real live virtual images of the other broadcasters in the virtual scene through VR equipment and wants to communicate with broadcaster B; he can greet broadcaster B in the virtual scene, and this greeting is itself broadcast into the virtual scene. When broadcaster B notices in the virtual scene that broadcaster A is greeting him, he responds, and broadcasters A and B then approach and communicate with each other's real live virtual images. Any non-contact communication, such as dialogue, gestures and expressions, can be completed in this way, which helps a person being broadcast establish accurate contact with other people in the virtual scene and improves social efficiency.
  • In addition, what the present invention displays in the virtual scene is the live image of a real person, which provides better immersion and interactivity and a good experience.
  • Figure 1 is a schematic diagram of the VR headset structure.
  • Figure 2 is a schematic diagram of the position and structure of the limit plate in the VR headset.
  • Figure 3 is a schematic structural diagram of the camera distribution inside the VR headset.
  • Figure 4 is a schematic structural diagram of the full-face VR headset.
  • Figure 5 is a schematic structural diagram of the camera distribution inside the full-face VR headset.
  • Figure 6 is a schematic diagram of the structure of the system for real social interaction in virtual scenes.
  • Figure 7 is a schematic diagram of the principle of the AR glasses of the present invention.
  • Figure 8 is a schematic structural diagram of the AR glasses according to one embodiment.
  • Figure 9 is a schematic diagram of a partial cross-sectional structure of an AR glasses lens according to one embodiment.
  • AR glasses, as shown in Figure 8, include a spectacle frame 105 and a VR display device arranged on the spectacle frame 105; the principle is shown in Figure 7. The display screen 121 and the convex lens 122 on the inner side of the display screen 121 constitute the VR display device and form a VR image in the human eye 123. The display screen 121 is a light-transmitting display screen 1211, and a concave lens 124 whose diopter matches that of the convex lens 122 is provided on the outer side of the light-transmitting display screen 1211; the concave lens 124 offsets the refraction of light by the convex lens. The concave lens 124 is located within the focal length of the convex lens 122, and the convex lens 122 within the virtual focal length of the concave lens 124, so that the light-transmitting part of the display screen forms a transparent image of the real scene.
  • the light-transmitting display screen is a light-transmitting single-sided display screen, and the light-transmitting single-sided display screen displays VR images toward the inside, that is, toward the eye side.
  • As shown in Figures 8 and 9, the light-transmitting single-sided display screen includes an array of light-emitting areas 102; between the light-emitting areas 102 lies an array of light-transmitting areas 103. The light-emitting areas 102 are fixed on a transparent bottom plate 104, and the parts of the transparent bottom plate 104 corresponding to the light-transmitting areas 103 are also light-transmitting, so the light-transmitting areas 103 can be holes or, of course, can be filled with transparent material. Each light-emitting area emits light on one side only.
  • In one embodiment, each light-transmitting area 103 is not a hole but a part of a Fresnel concave lens, and the array of all light-transmitting areas forms a Fresnel concave lens of reduced transparency. This reduced-transparency Fresnel concave lens replaces the concave lens 106 of diopter matched to the convex lens 101 that would otherwise be arranged on the outer side of the light-transmitting display screen. Because the light-emitting areas 102 block light, the Fresnel concave lens formed by the array of light-transmitting areas can only transmit half of the light, so its transparency is reduced by half.
  • In another embodiment, each light-transmitting area 103 is a hole and the transparent bottom plate 104 is a complete Fresnel concave lens. The light-emitting areas 102 are opaque and block the light of the corresponding parts of the Fresnel concave lens, while the part of the transparent bottom plate 104 behind each hole-type light-transmitting area 103 is a part of the Fresnel concave lens through which light can pass; the light-transmitting part of the transparent bottom plate 104 therefore forms a Fresnel concave lens of reduced transparency, which again replaces the concave lens 106 of matched diopter on the outer side of the display screen. In this case the convex lens 101 may be a Fresnel convex lens; in embodiments where the transparent bottom plate 104 is not a Fresnel concave lens, the separate concave lens may be a Fresnel concave lens. These embodiments reduce the weight and thickness of the glasses.
  • The material of the light-transmitting areas is a photochromic light-transmitting material. When the luminescent material of the light-emitting areas 102 around a light-transmitting area emits light, the photochromic material darkens and its transmittance decreases, which makes the VR image displayed by the light-emitting areas 102 clearer; when the surrounding luminescent material does not emit light, the transmittance of the photochromic light-transmitting area is high, so that the real scene seen through the light-transmitting area is also clear, making the combination of virtuality and reality more seamless.
  • A thread 107 is provided between the convex lens 101 and the spectacle frame 105. Since the light-transmitting single-sided display screen is fixed on the spectacle frame 105, adjusting the distance between the convex lens 101 and the spectacle frame 105 adjusts the distance between the convex lens and the display screen. When wearing the glasses, first adjust the lens-to-screen distance until the VR image is clear, and then fine-tune it so that light from the light-transmitting areas, after passing through the concave lens and the convex lens, clearly shows the real scene.
  • A camera facing the human eye can be mounted on the temples of the spectacle frame 105 and used for synthesizing a complete three-dimensional facial figure. When the frame is a thin high-strength titanium alloy frame, for example a truss-structure frame made of high-strength titanium alloy filaments, the filaments occlude very little; and since the convex lens 101 of each lens is 10 to 25 mm from the eye and the face is shot from different angles simultaneously, the part that the lenses can occlude is also small, so the truss-structure frame and the lenses can easily be keyed out to recover the complete facial shape.
  • The described method of using virtual scenes for real social interaction extracts the images of the persons being broadcast in live broadcast rooms at different locations that participate in the real social interaction, processes the extracted images into three-dimensional images of real people, i.e. 3D real-person images, and places the processed three-dimensional images into the corresponding positions in the virtual scene according to each person's real-time position coordinates, generating a virtual scene that can be interacted with in real time, so that people in different geographic locations can carry out immersive real social interaction.
  • Before the real social interaction begins, the people in the different locations all enter the same virtual scene within the same time period, so that real social interaction can take place in the virtual scene.
  • The method of using virtual scenes for real social interaction includes the following steps:
  • Step 1: Based on the number of live broadcast terminals, the cloud server obtains the number N of live broadcast rooms participating in the social virtual scene; a live broadcast room is a real live broadcast scene, and N ≥ 2, because the social method needs at least two live broadcast scenes for the interaction to be meaningful. Each live broadcast scene is provided with a live broadcast terminal and a live broadcast area; the terminal can extract the appearance information and position information of the person being broadcast within the live broadcast area, and through a VR headset the person being broadcast can communicate and interact in real time with the images of the persons being broadcast in the other live broadcast rooms displayed in the social virtual scene.
  • Step 2: Establish N+1 three-dimensional coordinate systems according to the number N of live broadcast rooms: each live broadcast room establishes one coordinate system, the N rooms constitute coordinate systems 1 to N, and the cloud server establishes the (N+1)-th coordinate system in the virtual scene.
  • Step 3: Configure the N+1 coordinate systems so that the x-axis of every coordinate system points due east, the y-axis due north and the z-axis upward from sea level, with the same length units on every axis. This setting avoids disturbing the broadcasters' sense of direction; of course, as long as the N+1 coordinate systems are otherwise identical and only their origin positions differ, a coordinate conversion is all that is required.
  • The ground of the virtual scene is defined as sea level, and the ground of each live broadcast room is likewise assumed to be at sea level. The spatial range occupied by the virtual scene in the (N+1)-th coordinate system is denoted K_(N+1), and the corresponding spatial range in the coordinate system of each live broadcast room is denoted K_1 to K_N. There must be no obstacles within K_1 to K_N, and the appearance of the real people within K_1 to K_N must support three-dimensional live broadcast; an obstacle here means occlusion severe enough that the live broadcast cannot be completed, and occlusions that can be filtered out and compensated during the live broadcast are considered barrier-free.
  • The above settings give the coordinate systems of the live broadcast rooms participating in the social virtual scene and the coordinate system established in the virtual scene a unified definition, i.e. the x-, y- and z-axes of all coordinate systems point in the same directions, so that the position coordinates of people in the live broadcast rooms are mapped into the virtual scene in real time and the real three-dimensional images in the virtual scene have the same position coordinates as the people in the live broadcast rooms.
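  • Because all N+1 coordinate systems share the same axis directions and units, mapping a broadcaster's position from a live broadcast room into the virtual scene reduces at most to an origin translation (and to the identity when the origins coincide, as in steps 3 and 4). A minimal sketch, with illustrative origin values, is shown below.

```python
# Minimal sketch of mapping a broadcaster's room coordinates into the shared
# virtual-scene coordinate system. Since all systems use the same axis directions
# (x east, y north, z up) and units, only the origin offset can differ; the origin
# value below is an illustrative assumption.
import numpy as np

def room_to_scene(p_room: np.ndarray, room_origin_in_scene: np.ndarray) -> np.ndarray:
    """Translate a point from a live broadcast room's coordinate system (1..N)
    into the virtual scene's coordinate system (N+1)."""
    return p_room + room_origin_in_scene

# Example: suppose the cloud server assigned room 1 the origin (10, 0, 0) in the scene.
p_scene = room_to_scene(np.array([1.2, 3.4, 0.0]), np.array([10.0, 0.0, 0.0]))
print(p_scene)  # [11.2  3.4  0. ]
```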
  • Step 4: The cloud server defines the origin position of the (N+1)-th coordinate system in the virtual scene and transmits this origin position information to the different live broadcast terminals; based on the origin position information received by each terminal, the origins of the coordinate systems of live broadcast rooms 1 to N are determined.
  • Step 5: A live broadcast room containing at least one person being broadcast is defined as occupied. The live broadcast terminals in the M occupied rooms separately collect the position information and appearance information of each person being broadcast and transmit them simultaneously to the cloud server, and the cloud server processes the appearance information of each person being broadcast in each room into real three-dimensional portrait information.
  • Step 6: The cloud server imports the three-dimensional portrait information of each person being broadcast in each live broadcast room, together with each person's position information in the room, into the virtual scene in real time to form a VR data stream, and transmits the VR data stream to each live broadcast terminal.
  • In step 5, in an occupied live broadcast room, the live broadcast terminal collects the position information and appearance information of each person being broadcast and transmits the appearance information to the cloud server; the cloud server processes the information into real three-dimensional portrait information and imports the portrait information into the virtual scene according to its corresponding position information.
  • If there are two or more persons being broadcast in an occupied room, the live broadcast terminal collects, for each person separately, that person's position information and appearance information and processes each person's appearance information into real three-dimensional portrait information; the portrait information is imported into the virtual scene according to its corresponding position information. Two or more persons being broadcast are captured separately to avoid two people appearing in the same video at the same time, which would affect subsequent extraction and separation.
  • If there is only one person being broadcast in an occupied room, that person's position information and appearance information are collected in the room, the appearance information is processed into real three-dimensional portrait information, and the portrait information is imported into the virtual scene according to its corresponding position information.
  • In step 5, in an occupied live broadcast room the live broadcast terminal also collects the voice information and voice position information of each person being broadcast and transmits them to the cloud server; the cloud server processes them into real three-dimensional sound information and sound position information and imports them into the virtual scene according to the position information corresponding to the sound.
  • The position information includes position coordinates and posture. The live broadcast terminal collects the position coordinates and posture of each person in each live broadcast room and transmits them to the cloud server, collecting the coordinates and postures of everyone in the room synchronously and in real time, in chronological order of the cameras' shooting frames.
  • The generation of the real three-dimensional portrait information includes the following steps:
  • S1): In each live broadcast room at least three cameras are set up for each person being broadcast, and the cameras perform synchronized tracking and shooting of the persons in the room; synchronized shooting means that each frame shot by the different cameras has the same timestamp. The cameras are preferably RGBD cameras, and the live broadcast room is preferably one equipped with a green screen, which facilitates subsequent keying.
  • S2): The person being broadcast wears a VR headset in the live broadcast room. Multiple groups of small cameras are arranged inside the VR headset and numbered in sequence; their time frames are synchronized with those of the room cameras, and they photograph the face of the person being broadcast. The shooting also serves to determine the position of the sound source.
  • S3): The live broadcast terminal fuses the videos captured by the multiple groups of headset cameras in sequence, according to time frame and number, to form a facial image; at the same time, the terminal performs human-body keying from different angles, frame by frame, on the videos captured by the different room cameras in the same live broadcast room, and synthesizes a stereoscopic human body image.
  • S4): The live broadcast terminal transmits the facial image and the stereoscopic human body image to the cloud server; the cloud server recognizes the stereoscopic image frame by frame, records the video frames that contain the VR headset, and fuses the facial image of the same time frame with the stereoscopic human body image to form three-dimensional portrait information without the VR headset.
  • The captured video also contains the voice information of the persons being broadcast in the live broadcast room; the voice information includes the voice content, the position of the sound source and the direction of the voice. The cloud server extracts the voice content, sound position and sound direction from the captured video and, according to the sound position and direction, sends the sound to the different broadcasters in the virtual scene, so that the broadcasters in each live broadcast room hear sound of different positions, directions and intensities.
  • If necessary, a gyroscope can be hidden in the broadcaster's hair to determine the direction of the sound, which removes the need for dedicated software and hardware for sound-direction estimation; the height of the sound source can then be determined from the broadcaster's height, which is more economical.
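  • As a sketch of how the cloud server might derive per-listener direction and intensity from the extracted sound position, the snippet below uses a simple inverse-distance attenuation model; the model and the reference distance are assumptions, since the patent does not specify how the sound is spatialized.

```python
# Hedged sketch of weighting a broadcaster's voice for one listener from the
# sound-source position; the 1/d attenuation model and reference distance are
# illustrative assumptions, not specified by the patent.
import numpy as np

def render_for_listener(source_pos: np.ndarray, source_gain: float,
                        listener_pos: np.ndarray, ref_dist: float = 1.0):
    """Return (unit direction from listener to source, distance-attenuated gain)."""
    offset = source_pos - listener_pos
    dist = float(np.linalg.norm(offset))
    direction = offset / dist if dist > 0 else np.zeros(3)
    gain = source_gain * ref_dist / max(dist, ref_dist)   # simple 1/d attenuation
    return direction, gain

direction, gain = render_for_listener(np.array([2.0, 0.0, 1.6]), 1.0,
                                      np.array([0.0, 0.0, 1.6]))
print(direction, gain)   # [1. 0. 0.]  0.5
```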
  • The VR headsets include ordinary VR headsets, full-face VR headsets and projection-type virtual reality VR glasses. The projection-type virtual reality VR glasses are existing technology; the patent with application number 2017215701499, titled "A projection-type virtual reality VR glasses display device", has disclosed such projection-type virtual reality VR glasses.
  • When the person being broadcast wears projection-type virtual reality VR glasses in the scene, at least two cameras (left and right) are installed around each lens barrel on the VR glasses frame; the shooting ranges of adjacent cameras overlap, the glasses-mounted cameras and the room cameras shoot synchronously, and the live broadcast terminal extracts facial information frame by frame from the glasses-mounted cameras.
  • Because the display is see-through, the occluded part is only around the eyes, so the occluded area is small, as little as 2 to 5 square centimeters; and since there is a distance between the eyes and the frame, the eyes are not completely blocked, so color images of the eyes can still be captured.
  • Broadcasters wearing such VR glasses can see the floor of the live broadcast room, which not only removes the fear of stepping into empty space, making this social method easier to accept, but also allows feedback on whether the floor of the live broadcast room coincides with the floor of the virtual scene. If they do not coincide, information about the error range between the two floors can be fed back and transmitted to the cloud server, and the cloud server can correct the error between the live broadcast room floor and the virtual scene floor. Markers, for example luminous points, can also be placed on the ground at the same coordinates in each live broadcast room and at the same coordinates in the virtual scene; the coincidence of the markers in each live broadcast room and in the virtual scene is monitored in real time and this information is transmitted to the cloud server, which corrects the error between the live broadcast room ground and the virtual scene ground.
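  • A minimal sketch of the marker-based floor correction is given below: corresponding marker coordinates from a live broadcast room and from the virtual scene yield a translation that the cloud server could apply. The mean-offset estimator is an assumption; the patent only says the error is monitored and corrected.

```python
# Sketch of correcting the offset between the live broadcast room floor and the
# virtual scene floor from markers placed at nominally identical coordinates.
# A simple mean offset is assumed; the patent does not specify the estimator.
import numpy as np

def floor_correction(markers_room: np.ndarray, markers_scene: np.ndarray) -> np.ndarray:
    """markers_* are (k, 3) arrays of corresponding marker coordinates.
    Returns the translation to add to room coordinates so the floors coincide."""
    return (markers_scene - markers_room).mean(axis=0)

room = np.array([[0.0, 0.0, 0.00], [2.0, 0.0, 0.01], [0.0, 2.0, -0.01]])
scene = np.array([[0.0, 0.0, 0.05], [2.0, 0.0, 0.05], [0.0, 2.0, 0.05]])
print(floor_correction(room, scene))   # ~[0. 0. 0.05] -> raise room points by 5 cm
```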
  • The VR headset includes a VR display screen 1, a lens fixing plate 2, brackets 3, a support frame 4 and an elastic band. The VR display screen 1 is provided on one side of the support frame 4, and the lens fixing plate 2 is provided on the support frame 4 opposite the VR display screen 1; VR lenses 5 are arranged symmetrically on the fixing plate 2. Brackets 3 are fixed on both side edges of the lens fixing plate 2 and arranged symmetrically. A limit plate 6 is provided adjacent to each bracket 3 on the side edge of the lens fixing plate 2; the limit plate 6 limits the distance between the human eye and the VR lens 5, so that there is a certain viewing distance between them. The brackets 3 are also provided with a wearing elastic band; the elastic band and the brackets 3 cooperate so that the VR headset can be worn firmly on the head.
  • Cameras are provided inside the VR headset, mounted on the lens fixing plate 2, and include an upper camera group 7, a middle camera group 8 and a lower camera group 9 that are evenly distributed. All cameras shoot synchronously, the shooting ranges of adjacent cameras overlap, the headset cameras and the room cameras shoot synchronously, and the live broadcast terminal extracts facial information frame by frame from the headset cameras. The upper, middle and lower camera groups 7, 8 and 9 can capture all the eye expressions of the person being broadcast inside the VR headset.
  • The expression of the eye region of the person being broadcast is stitched together from the videos captured by the individual cameras of the upper, middle and lower camera groups 7, 8 and 9. Since the shooting ranges of adjacent cameras have a certain overlapping area, the cameras on the fixing plate 2 are coded in a fixed order, for example from left to right and top to bottom, or from bottom to top and right to left (the left/right and up/down directions being those of the picture in Figure 3).
  • Each frame captured by each camera at the same moment is separated from its video in turn; then, following the coding order of the cameras and using the overlapping areas as reference positions, pictures sharing the same overlapping area are stitched at the position of that overlap, until the whole picture is completed. The stitched pictures of successive frames are synthesized into a video, which gives the eye expression information inside the VR headset.
  • The coding order is preferably from top to bottom and from left to right: the first camera 10 on the left is coded A, the second camera 11 on the left is coded B, and the remaining cameras are coded in the order of the English alphabet according to their positions. When stitching, within the same time frame, first extract the first picture PA captured by camera A and the second picture PB captured by camera B, compare the positions of the overlapping area of PA and PB within each picture, and then cover the overlapping area of PB with the overlapping area of PA to complete the stitching of the two pictures. The pictures captured by the remaining cameras are stitched in the same way in sequence, until all pictures of the same time frame have been stitched into the eye expression picture of the person being broadcast.
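  • The stitching described above can be sketched as pasting the per-camera pictures of one time frame onto a canvas in coding order, letting earlier pictures keep the overlapping area. The version below assumes known pixel offsets per camera, whereas the patent aligns pictures by their shared overlap regions; it is an illustration, not the patented procedure.

```python
# Sketch of stitching the per-camera eye pictures of one time frame in coding
# order (A, B, ...). Known pixel offsets per camera are assumed here.
import numpy as np

def stitch(pictures: list[np.ndarray], offsets: list[tuple[int, int]],
           canvas_shape: tuple[int, int, int]) -> np.ndarray:
    canvas = np.zeros(canvas_shape, dtype=np.uint8)
    written = np.zeros(canvas_shape[:2], dtype=bool)
    for pic, (row, col) in zip(pictures, offsets):    # coding order A, B, C, ...
        h, w = pic.shape[:2]
        region = (slice(row, row + h), slice(col, col + w))
        keep = ~written[region]                       # earlier pictures keep the overlap
        canvas[region][keep] = pic[keep]
        written[region] |= True
    return canvas
```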
  • The cameras inside the VR headset and the cameras in the live broadcast room also shoot synchronously. The VR headset is filtered out of the appearance information of the person being broadcast captured by the room cameras, and the appearance information with the headset removed is fused with the eye expression pictures of the same time frame to form real-time facial information as if the person were not wearing a VR headset.
  • The VR headset filtering method is to perform a grayscale transformation on each frame of the person's image in the captured appearance information, determine the area of the grayscale image that contains the VR headset, determine a fixed point coordinate on each grayscale frame, and use the imcrop function in Matlab to crop the image, completing the filtering of the VR headset.
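  • The patent names Matlab's imcrop for this step; a rough Python/OpenCV analogue is sketched below. The dark-region threshold used to locate the headset is an assumption for illustration only.

```python
# Python analogue of the headset-filtering step the patent describes with Matlab's
# imcrop: grayscale the frame, locate the dark headset region by a threshold, and
# remove it. The threshold value is an illustrative assumption.
import cv2
import numpy as np

def filter_headset(frame_bgr: np.ndarray, dark_thresh: int = 40):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    dark = gray < dark_thresh                          # headset assumed to be the darkest region
    ys, xs = np.nonzero(dark)
    if len(xs) == 0:
        return frame_bgr, None                         # no headset found in this frame
    x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
    headset_box = (x0, y0, x1 - x0 + 1, y1 - y0 + 1)   # fixed reference rectangle
    cleaned = frame_bgr.copy()
    cleaned[y0:y1 + 1, x0:x1 + 1] = 0                  # cut the headset region out
    return cleaned, headset_box
```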
  • The full-face VR headset includes a VR display screen 1, a support frame 4 and a mask 12. The VR display screen 1 is provided on one side of the support frame 4, and the mask 12 is provided on the support frame 4 opposite the VR display screen 1; the mask 12 can cover the entire face, and VR lenses are provided inside it. The mask 12 is also equipped with cameras and a light source; the illuminance of the light source is 10 lx to 30 lx, which is low light and not irritating to the eyes of the person being broadcast, and the light source helps the cameras photograph the face.
  • The cameras are evenly distributed in multiple rows and columns inside the mask 12, in this embodiment preferably five rows and three columns, and shoot synchronously, so that the facial expression information of the person inside the mask can be extracted. The shooting ranges of adjacent cameras inside the mask 12 overlap, the mask cameras and the room cameras shoot synchronously, and the live broadcast terminal extracts facial information frame by frame from the mask cameras.
  • Each camera in the mask can be coded in a fixed order, for example from left to right and top to bottom, or from bottom to top and right to left (the left/right and up/down directions being those of the picture in Figure 5). Each frame captured by each camera at the same moment is separated from its video in turn; then, following the coding order of the cameras and using the overlapping areas as reference positions, pictures sharing the same overlapping area are stitched at the position of that overlap until the whole picture is completed, and the stitched pictures of successive frames are synthesized into a video, which gives the expression information of the person being broadcast inside the mask.
  • The full-face VR headset is likewise filtered out of the appearance information of the person captured by the room cameras, and the appearance information with the full-face headset removed is fused with the expression information of the same time frame to form the real-time facial information of the person as if no headset were worn. The full-face VR headset covers the whole facial area, whereas the ordinary VR headset only covers the eye area, so with the ordinary headset the extracted facial expression is not as rich and accurate as with the full-face headset.
  • The system includes a cloud server 15 and live broadcast terminals 17. The cloud server 15 is provided with a server cluster 16, and each live broadcast terminal 17 with a terminal processor 19. The live broadcast terminal 17 also includes at least three groups of cameras 18, a VR headset 20, wireless in-ear headphones 23, a positioning device 22 and a gyroscope 21. The server cluster 16 and the terminal processor 19 are connected through TCP/IP communication, and the cameras 18, VR headset 20, wireless in-ear headphones 23, positioning device 22 and gyroscope 21 are all electrically connected to the terminal processor. The positioning device and the gyroscope are fixed together as one unit, provided with Velcro, and fixed on the chest of the person being broadcast by the Velcro.
  • The cloud server 15 is used to generate virtual scenes and to receive the information transmitted from each live broadcast room: it receives in real time the human appearance information transmitted by the cameras 18, the facial information transmitted by the VR headset 20, the coordinate information from the positioning device 22 and the posture information transmitted by the gyroscope 21. Based on this information the cloud server synthesizes a VR video of the real-time actions of the real people in the virtual scene and sends the VR video to the VR headsets 20.
  • The cameras 18 are preferably RGBD cameras, and a plurality of RGBD cameras are fixed in each live broadcast room.
  • The VR headset 20 is an ordinary VR headset, a full-face VR headset or projection-type virtual reality VR glasses.
  • The positioning device 22 includes a live-broadcast-room origin positioning device and a wearable positioning device. The origin positioning device is an RTK base station, and the wearable positioning device is equipped with an RTK positioning module and a microcontroller. The RTK base station includes an RTK positioning module, an RTK-GPS antenna and a data transceiver module; it transmits its observation values and measuring point coordinates to the wearable positioning device through the data transceiver module and the RTK-GPS antenna. The RTK positioning module in the wearable positioning device receives the observation values and measuring point coordinates while collecting its own GPS observation data, forms differential observations for real-time processing, produces centimeter-level positioning results and uploads them to the server through the terminal processor. This RTK positioning method has already been disclosed in the patent with application number 2018105750619, titled "Method and device for automatic location-finding and wireless charging of drones".
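  • The differential idea behind the RTK positioning step can be illustrated as follows: the base station's known coordinates and its observed position give a correction that the wearable rover applies to its own raw fix. Real RTK works on carrier-phase observables, so this position-domain form is only a simplified sketch, not the patented method.

```python
# Simplified, position-domain sketch of differential correction (real RTK uses
# carrier-phase observables); all numbers are illustrative.
import numpy as np

def dgps_fix(rover_raw: np.ndarray,
             base_observed: np.ndarray, base_known: np.ndarray) -> np.ndarray:
    correction = base_known - base_observed        # error common to both receivers
    return rover_raw + correction

print(dgps_fix(np.array([100.40, 50.70, 2.10]),
               np.array([0.38, 0.72, 0.12]),
               np.array([0.00, 0.00, 0.00])))      # -> [100.02  49.98   1.98]
```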
  • the gyroscope 21 can obtain the posture of the human body in real time, and the gyroscope is wirelessly connected to the terminal processor.
  • The described live broadcast can also be carried out using existing live broadcast methods. When the number of occupied live broadcast rooms M is less than or equal to 5, each person being broadcast wears a wrist positioning device on the right wrist or on both wrists. Correspondingly, an imitator who mimics a person being broadcast wears a set of wearable equipment, including a chest-worn positioning device corresponding to the broadcaster's chest positioning device and a right-wrist (or left- and right-wrist) positioning device corresponding to the broadcaster's wrist positioning device(s). The position coordinates of the imitator's chest-worn and wrist-worn positioning devices are transmitted to the cloud server in real time; the cloud service processor compares in real time the three-dimensional information from the broadcaster's positioning device and from the imitator's chest-worn positioning device and issues real-time instructions to the wearable equipment to correct the chest position. The position coordinates of the wrist positioning device worn on the broadcaster's right wrist or on both wrists are likewise transmitted to the cloud server; the cloud service processor compares the three-dimensional information and sends real-time corrective instructions for the right wrist or for both wrists to the wearable equipment. A wearable mouth-area sound-emitting device is also provided: the cloud server sends the voice of the corresponding person being broadcast to this device, which plays back that person's voice.
  • The imitator wears the wearable equipment together with a VR headset and can thus imitate the corresponding person being broadcast in the virtual scene. Since the imitator is not being broadcast, the imitator does not appear in the virtual scene and is hidden behind the virtual person being imitated.
  • In this way the imitator, standing in for the corresponding person being broadcast, can shake hands with a real person being broadcast; if the imitator has the same body shape as the corresponding broadcaster and is trained so that the imitator's actions match the broadcaster's, they can also interact with the real person being broadcast through shoulder pats, hugs and similar contact.
  • The live broadcast then needs to filter out the imitators, and the stereoscopic human body images of the persons being broadcast who are in contact with each other are synthesized and spliced into three-dimensional images in contact, so that a scene of the persons being broadcast touching each other appears in the virtual scene, making the virtual scene richer. The virtual scene can also be converted into two-dimensional images from different angles for live broadcast, which is especially suitable for long-distance meetings and similar situations.
  • M is limited to 5 or fewer here because too many imitators would create obstacles and make it impossible to complete the 3D live broadcast; with M ≤ 5 the 3D live broadcast is not affected. However, if for M greater than 5 the occlusion can still be eliminated through filtering and compensation so that the 3D live broadcast remains normal, then M is not subject to the limit of 5 or fewer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method, system and AR glasses for real social interaction using virtual scenes. The method of using virtual scenes for real social interaction involves a cloud server and live broadcast terminals and includes six steps. The system for real social interaction using virtual scenes includes a cloud server and live broadcast terminals; the cloud server is provided with a cloud service processor, the live broadcast terminal is provided with a terminal processor, and the live broadcast terminal further includes at least three groups of cameras, AR glasses, wireless in-ear headphones, a positioning device and a gyroscope. The cloud service processor communicates with the terminal processor over TCP/IP; the cameras, VR headset, wireless in-ear headphones, positioning device and gyroscope are all electrically connected to the terminal processor; the positioning device and the gyroscope are fixed together as one unit and fixed on the chest of the person being broadcast. AR glasses are also included.

Description

Method, system and AR glasses for real social interaction using virtual scenes
Technical Field
The present invention belongs to the field of virtual reality technology and specifically relates to a method, system and AR glasses for real social interaction using virtual scenes.
Background Art
Virtual reality (VR) technology integrates computer, electronic information and simulation technologies. Its basic implementation is that a computer simulates a virtual environment to give people a sense of immersion in that environment. At present, with the development of VR technology, VR has been applied in various technical fields.
The patent with application number 2017103756194 discloses a VR social system and method based on real-time three-dimensional reconstruction of the human body. The disclosed patent document has the following shortcomings: first, the human body reconstructed in three dimensions in the VR scene is not the image of a real human body in a real live broadcast; second, the patent does not solve how changes in the position of the human body in reality are matched to the virtual scene, so the direction and speed of human movement in reality may be inconsistent with the direction and speed of movement in the virtual scene; third, because position changes of the human body in reality do not match the virtual scene, it is impossible to accurately establish contact with other people through the virtual scene, and social efficiency is low.
Summary of the Invention
To solve the above technical problems, the present invention provides a method and system for real social interaction using virtual scenes.
The specific scheme is as follows:
A method of using virtual scenes for real social interaction involves a cloud server and live broadcast terminals, and includes the following steps:
Step 1: Based on the number of live broadcast terminals, the cloud server obtains the number N of live broadcast rooms participating in the social virtual scene; a live broadcast room is a real live broadcast scene, and N ≥ 2.
Step 2: Establish N+1 identical three-dimensional coordinate systems according to the number N of live broadcast rooms: each live broadcast room establishes one three-dimensional coordinate system, the N live broadcast rooms constitute coordinate systems 1 to N, and the cloud server establishes the (N+1)-th coordinate system in the virtual scene.
Step 3: Configure the N+1 three-dimensional coordinate systems; each consists of an x-axis, a y-axis and a z-axis, and the length units of all x-, y- and z-axes are the same.
The ground of the virtual scene is defined as the plane formed by the x-axis and y-axis of each coordinate system, and the ground of each live broadcast room is likewise the x-y plane of its coordinate system. The spatial range occupied by the virtual scene in the (N+1)-th coordinate system is denoted K_(N+1), and the corresponding spatial range in the coordinate system of each live broadcast room is denoted K_1 to K_N. There must be no obstacles within K_1 to K_N, and the appearance of the real people within K_1 to K_N must support three-dimensional live broadcast; an obstacle here means occlusion severe enough that the live broadcast cannot be completed, and occlusions that can be filtered out and compensated during the live broadcast are considered barrier-free.
Step 4: A live broadcast room containing at least one person being broadcast is defined as an occupied live broadcast room; the number of occupied rooms is defined as M, with N ≥ M ≥ 2. In each of the M occupied rooms, the human body position information and body shape information of each person being broadcast, together with each person's voice information and voice position information, are collected separately and transmitted simultaneously to the cloud server, and the cloud server processes the appearance information of each person being broadcast in each room into three-dimensional portrait information.
Step 5 (an alternative to Step 4): A live broadcast room containing at least one person being broadcast is defined as an occupied live broadcast room; the number of occupied rooms is defined as M, with N ≥ M ≥ 2. In each of the M occupied rooms, the human body position information and body shape information of each person being broadcast, together with each person's voice information and voice position information, are collected separately, processed into three-dimensional portrait information, and transmitted simultaneously to the cloud server.
Step 6: The cloud server imports the human body position information and three-dimensional portrait information of each person being broadcast in each live broadcast room, together with each person's voice information and voice position information, into the virtual scene in real time to form a VR data stream, and transmits the VR data stream to each live broadcast terminal. Each person being broadcast wears AR glasses, the display component of the live broadcast terminal, in the corresponding live broadcast room. The virtual scene thus gathers the virtual images of all persons being broadcast wearing AR glasses; in each live broadcast room the physical image and virtual image of a person being broadcast coincide, and through the AR glasses a person being broadcast can only see the VR virtual images of the other persons being broadcast. Of course, when the cloud server transmits the VR data stream to a given live broadcast terminal, the virtual image of that person may be omitted; the person still only sees the virtual images of the others through the AR glasses.
The generation of the three-dimensional portrait information includes the following steps:
S1): In each live broadcast room at least three cameras are set up for each person being broadcast, and the cameras perform synchronized tracking and shooting of the persons in the room; synchronized shooting means that each frame shot by the different cameras has the same timestamp.
S2): The live broadcast terminal performs human-body keying from different angles, frame by frame, on the videos shot by the different cameras in the same live broadcast room, and synthesizes a stereoscopic human body image.
S3): The live broadcast terminal transmits the stereoscopic human body image to the cloud server; the cloud server recognizes the image frame by frame, records the video frames that contain AR glasses, and fuses the facial image of the same time frame with the stereoscopic human body image to form the three-dimensional portrait information.
In another embodiment, the generation of the three-dimensional portrait information (together with the sound information and sound position information) includes the following steps:
S1): In each live broadcast room at least three cameras are set up for each person being broadcast, and the cameras perform synchronized tracking and shooting; synchronized shooting means that each frame shot by the different cameras has the same timestamp. The person being broadcast wears AR glasses in the live broadcast room; the AR glasses are provided with small cameras facing the face, whose time frames are synchronized with those of the room cameras, and which photograph the face of the person being broadcast, mainly the parts of the face occluded by the AR glasses.
S2): The live broadcast terminal fuses the videos captured by the multiple groups of glasses-mounted cameras in sequence, according to time frame and camera number, to form a facial image; at the same time, the terminal performs human-body keying from different angles, frame by frame, on the videos captured by the different room cameras in the same live broadcast room, and synthesizes a stereoscopic human body image without AR glasses.
S3): The live broadcast terminal transmits the facial image and the stereoscopic human body image to the cloud server; the cloud server recognizes the image frame by frame, records the video frames that contain AR glasses, and fuses the facial image of the same time frame with the stereoscopic human body image to form three-dimensional portrait information without AR glasses.
The person being broadcast wears AR glasses in the live broadcast room. The AR glasses are provided with cameras facing the face, located around the lenses of the AR glasses; all of these cameras shoot synchronously, the shooting ranges of adjacent cameras overlap with each other and also with those of the room cameras, the glasses-mounted cameras and the room cameras shoot synchronously, and the live broadcast terminal extracts facial information frame by frame from the glasses-mounted cameras.
The human body position information includes position coordinates and posture. The live broadcast terminal collects the position coordinates and posture of each person being broadcast in each live broadcast room and transmits them to the cloud server, collecting the coordinates and postures of everyone in the room synchronously and in real time, in chronological order of the cameras' shooting frames. The position coordinates are collected by a positioning device and the posture data by a gyroscope; the positioning device and gyroscope are fixed on the chest of the person being broadcast.
The sound information is collected through the recording equipment of the live broadcast terminal, and the sound position information through the sound source localization equipment of the live broadcast terminal.
The system includes a cloud server and live broadcast terminals. The cloud server is provided with a cloud service processor and the live broadcast terminal with a terminal processor; the live broadcast terminal further includes at least three groups of cameras, AR glasses, wireless in-ear headphones, a positioning device and a gyroscope. The cloud service processor communicates with the terminal processor over TCP/IP; the cameras, VR headset, wireless in-ear headphones, positioning device and gyroscope are all electrically connected to the terminal processor; the positioning device and the gyroscope are fixed together as one unit and are fixed on the chest of the person being broadcast.
AR glasses comprise a spectacle frame and a VR display device arranged on the frame. The VR display device includes a display screen and a convex lens on the inner side of the display screen; the convex lens and the display screen together form a VR image in the human eye. The display screen is a light-transmitting display screen, and a concave lens whose diopter matches that of the convex lens is arranged on the outer side of the light-transmitting display screen; the concave lens offsets the refraction of light by the convex lens. The concave lens is located within the focal length of the convex lens, and the convex lens within the virtual focal length of the concave lens, so that the light-transmitting part of the display screen forms a transparent image of the real scene.
The light-transmitting display screen is a light-transmitting single-sided display screen that displays the VR image toward the inside, i.e. toward the eye side.
The light-transmitting single-sided display screen includes an array of light-emitting areas, with an array of light-transmitting areas between them; each light-emitting area emits light on one side only.
Each light-transmitting area is a part of a Fresnel concave lens, and the array of all light-transmitting areas together forms a Fresnel concave lens; this Fresnel concave lens replaces the concave lens of matched diopter that would otherwise be arranged on the outer side of the display screen, which reduces the weight and thickness of the glasses.
The convex lens is a Fresnel convex lens and the concave lens a Fresnel concave lens.
The material of the light-transmitting areas is a photochromic light-transmitting material: when the luminescent material of the surrounding light-emitting areas emits light, the photochromic material darkens and its transmittance decreases; when the surrounding luminescent material does not emit light, the transmittance of the photochromic light-transmitting area is high.
A distance adjustment device is provided between the convex lens and the display screen to adjust the distance between them.
The invention discloses a method and system for real social interaction using virtual scenes. By establishing a unified coordinate system, the coordinate systems set up in live broadcast rooms in different places and the coordinate system built in the virtual scene are defined as three-dimensional coordinate systems with the same orientation, which provides a systematic basis for realizing real social interaction in virtual scenes. The appearance information of the persons being broadcast in the different live broadcast rooms and their corresponding three-dimensional coordinate positions in those rooms are then extracted; the extracted appearance information is processed into three-dimensional images of real people, and according to each person's coordinate position in the live broadcast room the three-dimensional image of that person is placed into the virtual scene, so that the person's coordinates in the virtual scene are identical to the person's coordinates in the live broadcast room. This solves the problem of keeping the position changes of a person being broadcast consistent between reality and the virtual scene, so that the ground position and the direction and speed of movement of each person broadcasting in a real live broadcast room are consistent with that person's ground position and direction and speed of movement in the virtual scene. Moving across the ground in the virtual scene is just like moving across the ground in the live broadcast room; when a person walks around a virtual object in the virtual scene, no real object is moved in the live broadcast room. Of course, stairs on the ground of the virtual scene must correspond to real stairs in the live broadcast room, to prevent missteps. For example, while broadcaster A is broadcasting in his live broadcast room, he sees the real live virtual images of the other broadcasters in the virtual scene through VR equipment and wants to communicate with broadcaster B; he can greet broadcaster B in the virtual scene, and this greeting is itself broadcast into the virtual scene. When broadcaster B notices in the virtual scene that broadcaster A is greeting him, he responds, and broadcasters A and B then approach and communicate with each other's real live virtual images. Any non-contact communication, such as dialogue, gestures and expressions, can be completed in this way, which helps a person being broadcast establish accurate contact with other people in the virtual scene and improves social efficiency.
In addition, what the present invention displays in the virtual scene is the live image of a real person, which provides better immersion and interactivity and a good experience.
Brief Description of the Drawings
Figure 1 is a schematic diagram of the VR headset structure.
Figure 2 is a schematic diagram of the position and structure of the limit plate in the VR headset.
Figure 3 is a schematic structural diagram of the camera distribution inside the VR headset.
Figure 4 is a schematic structural diagram of the full-face VR headset.
Figure 5 is a schematic structural diagram of the camera distribution inside the full-face VR headset.
Figure 6 is a schematic diagram of the structure of the system for real social interaction in virtual scenes.
Figure 7 is a schematic diagram of the principle of the AR glasses of the present invention.
Figure 8 is a schematic structural diagram of the AR glasses according to one embodiment.
Figure 9 is a schematic diagram of a partial cross-sectional structure of an AR glasses lens according to one embodiment.
Detailed Description of the Embodiments
AR glasses, as shown in Figure 8, include a spectacle frame 105 and a VR display device arranged on the spectacle frame 105; the principle is shown in Figure 7. The display screen 121 and the convex lens 122 on the inner side of the display screen 121 constitute the VR display device and form a VR image in the human eye 123. The display screen 121 is a light-transmitting display screen 1211, and a concave lens 124 whose diopter matches that of the convex lens 122 is provided on the outer side of the light-transmitting display screen 1211; the concave lens 124 offsets the refraction of light by the convex lens. The concave lens 124 is located within the focal length of the convex lens 122, and the convex lens 122 within the virtual focal length of the concave lens 124, so that the light-transmitting part of the display screen forms a transparent image of the real scene.
The light-transmitting display screen is a light-transmitting single-sided display screen that displays the VR image toward the inside, i.e. toward the eye side.
As shown in Figures 8 and 9, the light-transmitting single-sided display screen includes an array of light-emitting areas 102; between the light-emitting areas 102 lies an array of light-transmitting areas 103. The light-emitting areas 102 are fixed on a transparent bottom plate 104, and the parts of the transparent bottom plate 104 corresponding to the light-transmitting areas 103 are also light-transmitting, so the light-transmitting areas 103 can be holes or, of course, can be filled with transparent material. Each light-emitting area emits light on one side only.
In one embodiment, each light-transmitting area 103 is not a hole but a part of a Fresnel concave lens, and the array of all light-transmitting areas forms a Fresnel concave lens of reduced transparency; this reduced-transparency Fresnel concave lens replaces the concave lens 106 of diopter matched to the convex lens 101 that would otherwise be arranged on the outer side of the light-transmitting display screen. Because the light-emitting areas 102 block light, the Fresnel concave lens formed by the array of light-transmitting areas can only transmit half of the light, so its transparency is reduced by half.
In another embodiment, each light-transmitting area 103 is a hole and the transparent bottom plate 104 is a complete Fresnel concave lens. The light-emitting areas 102 are opaque and block the light of the corresponding parts of the Fresnel concave lens, while the part of the transparent bottom plate 104 behind each hole-type light-transmitting area 103 is a part of the Fresnel concave lens through which light can pass; the light-transmitting part of the transparent bottom plate 104 therefore forms a Fresnel concave lens of reduced transparency, which again replaces the concave lens 106 of matched diopter on the outer side of the display screen. In this case the convex lens 101 may be a Fresnel convex lens; in embodiments where the transparent bottom plate 104 is not a Fresnel concave lens, the separate concave lens may be a Fresnel concave lens. These embodiments reduce the weight and thickness of the glasses.
The material of the light-transmitting areas is a photochromic light-transmitting material. When the luminescent material of the light-emitting areas 102 around a light-transmitting area emits light, the photochromic material darkens and its transmittance decreases, which makes the VR image displayed by the light-emitting areas 102 clearer; when the surrounding luminescent material does not emit light, the transmittance of the photochromic light-transmitting area is high, so that the real scene seen through the light-transmitting area is also clear, making the combination of virtuality and reality more seamless.
A thread 107 is provided between the convex lens 101 and the spectacle frame 105. Since the light-transmitting single-sided display screen is fixed on the spectacle frame 105, adjusting the distance between the convex lens 101 and the spectacle frame 105 adjusts the distance between the convex lens and the display screen. When wearing the glasses, first adjust the lens-to-screen distance until the VR image is clear, and then fine-tune it so that light from the light-transmitting areas, after passing through the concave lens and the convex lens, clearly shows the real scene.
A camera facing the human eye can be mounted on the temples of the spectacle frame 105 and used for synthesizing a complete three-dimensional facial figure. When the frame is a thin high-strength titanium alloy frame, for example a truss-structure frame made of high-strength titanium alloy filaments, the filaments occlude very little; and since the convex lens 101 of each lens is 10 to 25 mm from the eye and the face is shot from different angles simultaneously, the part that the lenses can occlude is also small, so the truss-structure frame and the lenses can easily be keyed out to recover the complete facial shape.
 下面将结合本发明中的附图,对本发明实施例中的技术方案进行清楚、完整地描述。显然,所描述的实施例仅仅是本发明一部分实施,而不是全部的实施,基于本发明的实施例,本领域普通技术人员在没有做、出创造性劳动前提下所获得的所有其它实施例,都属于本发明保护的范围。
 所述的利用虚拟场景进行真实社交的方法,是将参与真实社交的不同地点处的直播间内被直播人的形象进行提取并将提取的被直播人的形象处理为真实人的三维形象也即是3D真人形象,并将处理后的三维形象或3D真人形象根据每个人的实时位置坐标放入虚拟场景中相应的位置,生成可以实时交互的虚拟场景,以此实现地理位置不同的人可以进行沉浸式的真实社交,在进行真实社交之前,不同地点的人均在相同的时间段内进入同一虚拟场景中,以实现在虚拟场景中进行真实社交。
The method of real social interaction using a virtual scene comprises the following steps:
Step 1: The cloud server obtains, from the number of live broadcast terminals, the number N of live broadcast rooms participating in the social virtual scene; a live broadcast room is a live broadcast scene in reality, and N ≥ 2. For this social method at least two live broadcast scenes are needed for the interaction to be meaningful. A live broadcast terminal and a live broadcast area are arranged in each live broadcast scene; the live broadcast terminal can extract the appearance information and position information of the person being broadcast within the live broadcast area, and through a VR headset the person being broadcast can communicate and interact in real time with the images, displayed in the social virtual scene, of the persons being broadcast in the other live broadcast rooms.
Step 2: N+1 three-dimensional coordinate systems are established according to the number N of live broadcast rooms: one three-dimensional coordinate system is established for each live broadcast room, the N rooms forming coordinate systems 1 to N, and the cloud server establishes the (N+1)-th three-dimensional coordinate system in the virtual scene;
Step 3: The N+1 three-dimensional coordinate systems are set so that in every coordinate system the x-axis points due east, the y-axis points due north and the z-axis points upward from sea level, and the length units of all x-, y- and z-axes are the same. This setting avoids upsetting the broadcast persons' sense of direction; of course, as long as the N+1 three-dimensional coordinate systems are identical and differ only in the position of their origins, all that is needed is a coordinate conversion.
The ground of the virtual scene is defined as sea level, and the ground of each live broadcast room is likewise taken to be sea level. The spatial range occupied by the virtual scene in the (N+1)-th three-dimensional coordinate system is denoted K_(N+1), and the corresponding spatial range in the coordinate system of each live broadcast room is denoted K_1 to K_N; K_1 to K_N are free of obstacles, and the real human figures within K_1 to K_N can be broadcast three-dimensionally. An obstacle here means occlusion severe enough that the broadcast cannot be completed; occlusions that can be filtered out and compensated during the broadcast count as no obstacle.
The above settings give the coordinate systems of the live broadcast rooms participating in the social virtual scene and the coordinate system established in the virtual scene a unified definition, that is, the directions of the x-, y- and z-axes are all the same, so that the position coordinates of people in the live broadcast rooms are mapped into the virtual scene in real time and the real three-dimensional images in the virtual scene have the same position coordinates as the people in the rooms.
Step 4: The cloud server defines the origin position information of the (N+1)-th three-dimensional coordinate system of the virtual scene and transmits it to the different live broadcast terminals; the origins of the coordinate systems of live broadcast rooms 1 to N are determined from the origin position information obtained by the live broadcast terminals;
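Since all N+1 coordinate systems share the same axis directions and units and differ only in their origins, mapping a point measured in a live broadcast room into the virtual scene reduces to an origin offset. The following minimal Python sketch is an editor's illustration of that conversion; the names room_origin and scene_origin are assumptions, not terms from the original text.

    import numpy as np

    def room_to_scene(point_room, room_origin, scene_origin):
        """Translate a point from a live broadcast room's coordinate system into the
        virtual scene's coordinate system. All systems share axis directions
        (x = east, y = north, z = up) and units, so the mapping is a pure
        translation through the common geodetic frame."""
        point_room = np.asarray(point_room, dtype=float)
        room_origin = np.asarray(room_origin, dtype=float)
        scene_origin = np.asarray(scene_origin, dtype=float)
        return point_room + room_origin - scene_origin

    # Example: a person 2 m east and 1 m north of their room's origin.
    p_scene = room_to_scene([2.0, 1.0, 0.0],
                            room_origin=[1000.0, 500.0, 0.0],
                            scene_origin=[980.0, 480.0, 0.0])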
Step 5: A live broadcast room in which at least one person being broadcast is present is defined as an occupied live broadcast room, and the number of occupied rooms is M, M ≥ 2. There must be at least two occupied rooms for the live social interaction to be meaningful: if M = 1 there is only one live broadcast room and no one to socialize with, and the social purpose is lost. The live broadcast terminals in the M occupied rooms separately collect the position information and appearance information of every person being broadcast and transmit them to the cloud server at the same time, and the cloud server processes the appearance information of every person of every room into real three-dimensional portrait information;
Step 6: The cloud server imports, in real time, the three-dimensional portrait information of every person being broadcast in every live broadcast room and the position information of every person in the room into the virtual scene to form a VR data stream, and transmits the VR data stream to each live broadcast terminal.
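One way to picture the per-frame payload that the cloud server assembles and pushes to the terminals is sketched below in Python; every field name here is an illustrative assumption by the editor, not a structure taken from the original disclosure.

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class PersonFrame:
        person_id: str
        room_id: int
        position: Tuple[float, float, float]   # (x, y, z) in the unified scene coordinates
        pose: Tuple[float, float, float]       # orientation from the chest-mounted gyroscope
        portrait: bytes                        # 3D portrait data for this time frame
        audio: bytes                           # sound captured in this time frame
        sound_position: Tuple[float, float, float]

    @dataclass
    class VRStreamFrame:
        timestamp: float                                   # shared time frame across rooms
        persons: List[PersonFrame] = field(default_factory=list)

        def for_terminal(self, viewer_id: str) -> "VRStreamFrame":
            # The description allows the viewer's own avatar to be omitted,
            # so the wearer sees only the other persons being broadcast.
            return VRStreamFrame(self.timestamp,
                                 [p for p in self.persons if p.person_id != viewer_id])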
In step 5, in each occupied live broadcast room the live broadcast terminal collects the position information and appearance information of every person being broadcast and transmits the appearance information to the cloud server; the cloud server processes the appearance information into real three-dimensional portrait information and imports it into the virtual scene according to the position information to which the portrait information corresponds.
If an occupied live broadcast room contains two or more persons being broadcast, the live broadcast terminal collects, for each person separately, that person's position information and appearance information, and processes each person's appearance information into real three-dimensional portrait information; the portrait information is imported into the virtual scene according to the position information to which it corresponds. Collecting two or more persons separately avoids having two people appear in the same video at the same time, which would interfere with the subsequent extraction and separation.
If an occupied live broadcast room contains only a single person being broadcast, that person's position information and appearance information are collected in each room, each person's appearance information is processed into real three-dimensional portrait information, and the portrait information is imported into the virtual scene according to the position information to which it corresponds.
In step 5 of the described method of real social interaction using a virtual scene, in each occupied live broadcast room the live broadcast terminal collects the sound information and sound position information of every person being broadcast and transmits them to the cloud server; the cloud server processes them into real three-dimensional sound information and sound position information and imports them into the virtual scene according to the position information to which they correspond.
The position information includes position coordinates and pose; the live broadcast terminal collects the position coordinates and pose of everyone in each live broadcast room and transmits them to the cloud server, and it collects the coordinates and pose of everyone in the room synchronously and in real time, following the time sequence of the camera frames.
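The synchronous collection described here can be thought of as reading the chest-mounted positioning device and gyroscope at every camera frame time. The Python sketch below is an editor's simplification; the reader callables read_rtk_position and read_gyro_pose are hypothetical placeholders for the real device interfaces.

    import time

    def sample_pose_stream(frame_times, read_rtk_position, read_gyro_pose):
        """Collect position and pose in step with the camera frame clock.
        frame_times       -- iterable of camera frame timestamps (seconds)
        read_rtk_position -- callable returning the chest RTK fix (x, y, z)
        read_gyro_pose    -- callable returning the chest gyroscope orientation"""
        samples = []
        for t in frame_times:
            # Wait for the shared clock to reach the camera frame time so the
            # position/pose sample and the video frame share one time frame.
            while time.time() < t:
                time.sleep(0.001)
            samples.append({"timestamp": t,
                            "position": read_rtk_position(),
                            "pose": read_gyro_pose()})
        return samples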
The generation of the real three-dimensional portrait information comprises the following steps:
S1): In each live broadcast room, at least three cameras are arranged for every person being broadcast and track and film that person synchronously; synchronous tracking filming means that every frame captured by the different cameras is taken at the same moment. The cameras are preferably RGBD cameras, and the live broadcast room is preferably one fitted with a green screen, which facilitates the subsequent matting;
S2): The person being broadcast wears a VR headset in the live broadcast room; several groups of camera modules are arranged inside the VR headset and numbered in sequence; the time frames of the camera modules are synchronized with those of the cameras, and the camera modules film the person's face; this filming includes the filming used for sound-source localization;
S3): The live broadcast terminal fuses the videos filmed by the several groups of camera modules in sequence according to time frame and number to construct a facial image;
at the same time, the live broadcast terminal performs human-body matting from different angles, frame by frame, on the videos filmed by the different cameras in the same live broadcast room, and composes a stereoscopic human-body image;
S4): The live broadcast terminal transmits the facial image and the stereoscopic human-body image to the cloud server; the cloud server recognizes the stereoscopic human-body image frame by frame, records the video frames containing the VR headset, and fuses the facial image of the same time frame with the stereoscopic human-body image to form three-dimensional portrait information that does not contain the VR headset.
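A minimal per-frame sketch of this pipeline, assuming a green-screen room and OpenCV-style chroma keying, is given below; the helper names, the HSV thresholds and the two callables that stand in for multi-view fusion are the editor's assumptions and are not prescribed by the original text.

    import cv2
    import numpy as np

    def chroma_key_person(frame_bgr, lower_hsv=(35, 60, 60), upper_hsv=(85, 255, 255)):
        """Matte the person out of a green-screen frame: remove green pixels and
        return the foreground together with its binary mask."""
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        green = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))
        person_mask = cv2.bitwise_not(green)
        foreground = cv2.bitwise_and(frame_bgr, frame_bgr, mask=person_mask)
        return foreground, person_mask

    def build_portrait_frame(camera_frames, face_image, fuse_views, replace_headset_region):
        """One time frame of the pipeline: matte every synchronized camera view,
        fuse the views into a stereoscopic body image, then paste the facial image
        captured inside the headset over the headset region. fuse_views and
        replace_headset_region are placeholders for the reconstruction and fusion
        steps that the description assigns to the terminal and the cloud server."""
        mattes = [chroma_key_person(f) for f in camera_frames]
        body_3d = fuse_views(mattes)
        return replace_headset_region(body_3d, face_image)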
The filmed video also contains the sound information of the person being broadcast in the live broadcast room; the sound information includes the sound content, the sound-source position and the direction of emission. The cloud server extracts the sound content, sound position and emission direction from the filmed video and, according to the sound position and direction, sends the sound to the different persons in the virtual scene, so that the persons in each live broadcast room hear sound of different position, direction and intensity. If necessary, a gyroscope is hidden in the broadcast person's hair to determine the direction of the sound, dispensing with dedicated software and hardware for determining sound direction; the height of the sound source is then derived from the person's height, which is more economical.
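As an illustration of how a per-listener direction and intensity might be derived from the shared sound position, the sketch below uses a simple inverse-square falloff and a cosine-like directivity; these particular formulas are the editor's assumptions and are not prescribed by the original text.

    import numpy as np

    def render_sound_for_listener(source_pos, source_dir, listener_pos, gain=1.0):
        """Return the arrival direction and attenuated intensity of one sound event
        for one listener, all in the unified scene coordinate system.
        source_pos, listener_pos -- (x, y, z); source_dir -- unit vector the speaker faces."""
        source_pos = np.asarray(source_pos, float)
        listener_pos = np.asarray(listener_pos, float)
        to_listener = listener_pos - source_pos
        distance = np.linalg.norm(to_listener) + 1e-6
        arrival_dir = to_listener / distance
        # Simple directivity: louder in front of the speaker, quieter behind.
        directivity = 0.5 * (1.0 + np.dot(np.asarray(source_dir, float), arrival_dir))
        intensity = gain * directivity / distance**2
        return arrival_dir, intensity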
The VR headset includes ordinary VR headsets, full-mask VR headsets and projection-type virtual reality VR glasses.
The projection-type virtual reality VR glasses are prior art: patent No. 2017215701499, entitled "A projection-type virtual reality VR glasses display device", has already disclosed such glasses.
When the person being broadcast wears projection-type virtual reality VR glasses in the scene, at least two camera modules, one on the left and one on the right, are arranged around each lens barrel on the glasses frame; the filming ranges of adjacent camera modules overlap; the camera modules and the cameras film synchronously, and the live broadcast terminal extracts facial information from the camera modules frame by frame.
Because the display is see-through, the covered area is confined to the eyes and is small, as little as 2-5 square centimetres; moreover, since there is a distance between the eyes and the frame, the eyes are not completely covered and colour images of the eyes can certainly be captured. A person being broadcast wearing such VR glasses can see the floor of the live broadcast room, which not only removes the fear of stepping into empty space and makes this way of socializing easier to accept, but also allows real-time feedback on whether the floor of the live broadcast room coincides with the floor of the virtual scene; if they do not coincide, information on the error range between the two floors can be fed back in real time and transmitted to the cloud server, which can correct the error between the room floor and the virtual-scene floor in real time. Alternatively, markers can be set on the floor at the same coordinates in every live broadcast room and at the same coordinates in the virtual scene, for example luminous points; the coincidence of the markers of every room with those of the virtual scene is monitored in real time, the information is transmitted to the cloud server, and the cloud server corrects the error between the room floor and the virtual-scene floor in real time.
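A small sketch of the marker-based floor check described above follows; it assumes the detected marker positions are already expressed in the unified coordinate system, and both the structure and the idea of returning a mean offset are the editor's illustration rather than part of the original text.

    import numpy as np

    def floor_alignment_error(room_markers, scene_markers):
        """Compare markers placed at nominally identical floor coordinates in a live
        broadcast room and in the virtual scene, and return the mean offset the cloud
        server would need to correct. Both inputs are (k, 3) arrays, matched row by row."""
        room_markers = np.asarray(room_markers, float)
        scene_markers = np.asarray(scene_markers, float)
        offsets = scene_markers - room_markers
        return offsets.mean(axis=0)   # (dx, dy, dz) correction estimate

    # If the offset exceeds a tolerance, the cloud server can shift the scene origin
    # for that room (or flag it) to bring the two floors back into coincidence.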
As shown in Figures 1 to 3, the person being broadcast wears a VR headset in the live broadcast room. The VR headset comprises a VR display screen 1, a lens fixing plate 2, brackets 3, a support frame 4 and a wearing strap. The VR display screen 1 is arranged on one side of the support frame 4, and the lens fixing plate 2 on the side of the support frame 4 facing the VR display screen 1; VR lenses 5 are arranged symmetrically on the fixing plate 2; brackets 3 are fixed symmetrically on the two side edges of the lens fixing plate 2; limiting plates 6 are arranged on the side edges of the lens fixing plate 2 adjacent to the brackets 3, the limiting plates 6 limiting the distance from the eyes to the VR lenses 5 so that a certain viewing distance is kept between the eyes and the VR lenses 5. A wearing strap is also arranged on the brackets 3 and cooperates with them so that the VR headset can be worn properly on the head. Camera modules are arranged inside the VR headset on the fixing plate 2 and comprise an upper camera-module group 7, a middle camera-module group 8 and a lower camera-module group 9, evenly distributed; all camera modules film synchronously, the filming ranges of adjacent camera modules overlap, the camera modules and the cameras film synchronously, and the live broadcast terminal extracts facial information from the camera modules frame by frame. The upper, middle and lower camera-module groups 7, 8 and 9 can capture the entire eye-region expression of the person being broadcast inside the VR headset.
The expression within the eye region of the person being broadcast is stitched together from the video filmed by every camera module of the upper, middle and lower camera-module groups 7, 8 and 9. Since the filming ranges of adjacent camera modules overlap to a certain extent, the camera modules on the fixing plate 2 are coded in a fixed order, for example from left to right and from top to bottom, or from bottom to top and from right to left; left, right, top and bottom here refer to the orientation of the picture in Figure 3.
Following each camera module's code order, every image frame taken by each camera module at the same moment is separated from the video; then, following the code order and using the overlap regions as reference positions, pictures sharing the same overlap region are stitched according to where the overlap region lies in each picture, until the whole picture has been stitched; the stitched pictures of every frame are composed into a video, thereby obtaining the eye-expression information inside the VR headset.
In this embodiment the preferred coding order is from top to bottom and from left to right: in the upper camera-module group 7 of Figure 3 the first camera module 10 on the left is coded A and the second camera module 11 on the left is coded B, and the remaining camera modules are coded in order of position following the English alphabet. During stitching, within one and the same time frame the first picture PA filmed by camera module A is extracted, then the second picture PB filmed by camera module B; the positions of the overlap region of PA and PB within the pictures are compared, and the overlap region of PB is laid over the overlap region of PA to complete the stitching of the two pictures; the pictures filmed by the remaining camera modules are stitched in turn, so that all pictures of the same time frame are stitched into a picture of the eye expression of the person being broadcast.
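A much simplified Python/OpenCV sketch of this overlap-based stitching follows; it assumes neighbouring camera modules are laid out left to right, that all pictures share the same height, that the overlap strip of one picture reappears fully in the next, and that template matching is an acceptable way to locate it, all of which are the editor's assumptions rather than details from the original text.

    import cv2
    import numpy as np

    def stitch_pair(base, patch, overlap=80):
        """Stitch two neighbouring camera-module pictures: locate the right-hand
        overlap strip of `base` inside `patch`, then paste `patch` so that its
        overlap region covers the matching region of `base` (as PB covers PA)."""
        strip = base[:, -overlap:]
        res = cv2.matchTemplate(patch, strip, cv2.TM_CCOEFF_NORMED)
        _, _, _, max_loc = cv2.minMaxLoc(res)
        x_off = max_loc[0]                       # column where the strip reappears in patch
        width = base.shape[1] - overlap + (patch.shape[1] - x_off)
        canvas = np.zeros((base.shape[0], width, 3), dtype=base.dtype)
        canvas[:, :base.shape[1]] = base
        canvas[:, base.shape[1] - overlap:] = patch[:, x_off:]
        return canvas

    def stitch_frame(frames_in_code_order):
        """Stitch all camera-module pictures of one time frame in coding order A, B, C, ..."""
        result = frames_in_code_order[0]
        for nxt in frames_in_code_order[1:]:
            result = stitch_pair(result, nxt)
        return result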
The camera modules inside the VR headset and the cameras in the live broadcast room also film synchronously. Using the external shape features of the VR headset, the VR headset is filtered out of the appearance information of the person being broadcast captured by the cameras in the room; with reference to the person's actual face when not wearing the VR headset, the appearance information from which the VR headset has been removed is fused with the eye-expression picture of the same time frame, forming a real-time three-dimensional image of the person as if no VR headset were worn. The method of filtering out the VR headset is to apply a grayscale transform to every frame of the appearance information captured by the cameras, determine the region of the grayscale image that contains the VR headset, fix an anchor point on each grayscale frame, and cut the image with the imcrop function in Matlab, thus completing the removal of the VR headset.
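The original text names Matlab's imcrop for the cut; an equivalent and heavily simplified sketch in Python/OpenCV is shown below, in which the fixed darkness threshold, the largest-dark-region heuristic and the rectangle-based paste are all the editor's assumptions.

    import cv2
    import numpy as np

    def find_headset_region(frame_bgr, dark_thresh=40):
        """Locate the (assumed dark) VR headset in one frame via a grayscale transform:
        threshold the grayscale image and take the largest dark connected region."""
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        _, dark = cv2.threshold(gray, dark_thresh, 255, cv2.THRESH_BINARY_INV)
        contours, _ = cv2.findContours(dark, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return None
        return cv2.boundingRect(max(contours, key=cv2.contourArea))   # (x, y, w, h)

    def remove_headset(frame_bgr, eye_expression_img):
        """Cut out the headset region (the role played by imcrop in the text) and paste
        the stitched eye-expression picture of the same time frame in its place."""
        region = find_headset_region(frame_bgr)
        if region is None:
            return frame_bgr
        x, y, w, h = region
        out = frame_bgr.copy()
        out[y:y + h, x:x + w] = cv2.resize(eye_expression_img, (w, h))
        return out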
As shown in Figures 4 and 5, the person wears a full-mask VR headset in the scene. The full-mask VR headset comprises a VR display screen 1, a support frame 4 and a mask 12; the VR display screen 1 is arranged on one side of the support frame 4, and the mask 12 on the side of the support frame 4 facing the VR display screen 1. The mask 12 can cover the person's whole face; VR lenses are arranged inside the mask 12, together with camera modules and a light source. The illumination standard of the light source is 10 lx to 30 lx; an illuminance of 10 lx to 30 lx is dim light that does not irritate the eyes of the person being broadcast, and the light source helps the camera modules film the facial expression. The camera modules are evenly distributed in several rows and columns inside the mask 12, in this embodiment preferably five rows and three columns; the camera modules inside the mask film synchronously and can extract the facial-expression information of the person inside the mask; the filming ranges of adjacent camera modules inside the mask 12 overlap, the camera modules and the cameras film synchronously, and the live broadcast terminal extracts facial information from the camera modules frame by frame.
Since the filming ranges of adjacent camera modules overlap to a certain extent, the camera modules inside the mask can be coded in a fixed order, for example from left to right and from top to bottom, or from bottom to top and from right to left; left, right, top and bottom here refer to the orientation of the picture in Figure 5.
Following each camera module's code order, every image frame taken by each camera module at the same moment is separated from the video; then, following the code order and using the overlap regions as reference positions, pictures sharing the same overlap region are stitched according to where the overlap region lies in each picture, until the whole picture has been stitched; the stitched pictures of every frame are composed into a video, thereby obtaining the expression information of the person being broadcast inside the mask.
The camera modules of the mask 12 and the cameras in the live broadcast room also film synchronously. Using the external shape features of the headset, the full-mask VR headset is filtered out of the appearance information of the person captured by the cameras in the room; with reference to the person's actual face when not wearing the full-mask VR headset, the appearance information from which the full-mask VR headset has been removed is fused with the expression information of the person of the same time frame, forming a real-time three-dimensional image of the person as if no full-mask VR headset were worn.
The full-mask VR headset can cover the person's entire facial region, whereas the ordinary VR headset covers only the eye region, so the facial expressions extracted with the latter are less rich and precise than with the full-mask VR headset.
A system for real social interaction using a virtual scene comprises a cloud server 15 and live broadcast terminals 17. The cloud server 15 is provided with a server cluster 16, and each live broadcast terminal 17 with a terminal processor 19; the live broadcast terminal 17 further comprises at least three groups of cameras 18, a VR headset 20, wireless in-ear earphones 23, a positioning device 22 and a gyroscope 21. The server cluster 16 and the terminal processor 19 are connected by TCP/IP communication; the cameras 18, the VR headset 20, the wireless in-ear earphones 23, the positioning device 22 and the gyroscope 21 are all electrically connected to the terminal processor. The positioning device and the gyroscope are fixed together as one unit; hook-and-loop fasteners are provided on them, by which they are fixed to the chest of the person being broadcast.
The cloud server 15 is used to generate the virtual scene and to receive the information transmitted by every live broadcast room. The cloud server 15 receives in real time the person's appearance information transmitted by the cameras 18, the facial information transmitted by the VR headset 20, the coordinate information transmitted by the positioning device 22 and the person's pose information transmitted by the gyroscope 21; the cloud server 15 composes a VR video of the real person's real-time actions in the virtual scene and sends the VR video to the VR headset 20.
The cameras 18 are preferably RGBD cameras, several of which are fixed in each live broadcast room; the VR headset 20 is an ordinary VR headset, a full-mask VR headset or projection-type virtual reality VR glasses.
The positioning device 22 comprises a live-broadcast-room origin positioning device and a wearable positioning device; the origin positioning device is an RTK base station, and the wearable positioning device contains an RTK positioning module and a single-chip microcomputer. The RTK base station comprises an RTK positioning module, an RTK-GPS antenna and a data transceiver module; through the data transceiver module and the RTK-GPS antenna the base station transmits its observations together with the coordinates of the station to the wearable positioning device; the RTK positioning module in the wearable positioning device receives the observations and station coordinates while collecting GPS observation data of its own, forms differential observations, processes them in real time, produces centimetre-level positioning results and uploads them through the terminal processor to the server cluster. This RTK positioning method has already been disclosed in patent No. 2018105750619, entitled "Method and device for automatic position-finding wireless charging of an unmanned aerial vehicle".
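As a toy illustration of the differential idea behind the RTK positioning used here, the sketch below applies single-difference corrections from the base station to the rover's range measurements and then takes one least-squares step; real RTK additionally resolves carrier-phase ambiguities and clock biases, which this editor's sketch deliberately ignores, and every name in it is an assumption.

    import numpy as np

    def differential_fix(rover_ranges, base_ranges, base_position,
                         sat_positions, rover_guess):
        """Grossly simplified differential positioning: the base station knows its true
        position, so the difference between its measured and geometric ranges estimates
        the shared errors, which are removed from the rover's ranges before a standard
        least-squares position step (one Gauss-Newton iteration)."""
        sat_positions = np.asarray(sat_positions, float)
        base_position = np.asarray(base_position, float)
        base_geom = np.linalg.norm(sat_positions - base_position, axis=1)
        corrections = np.asarray(base_ranges, float) - base_geom     # shared error per satellite
        corrected = np.asarray(rover_ranges, float) - corrections

        pos = np.asarray(rover_guess, float)
        geom = np.linalg.norm(sat_positions - pos, axis=1)
        H = (pos - sat_positions) / geom[:, None]                    # unit line-of-sight rows
        delta, *_ = np.linalg.lstsq(H, corrected - geom, rcond=None)
        return pos + delta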
The gyroscope 21 can obtain the person's pose in real time and communicates with the terminal processor wirelessly.
The live broadcast described here may also be carried out using existing live-broadcast methods.
In the described system for real social interaction using a virtual scene, the number M of occupied live broadcast rooms is at most 5, and each person being broadcast wears a wrist positioning device on the right wrist or on both wrists. A matching wearable set is provided for an imitator who mimics the person being broadcast, comprising a worn chest positioning device corresponding to the positioning device, and a worn right-wrist positioning device, or worn left- and right-wrist positioning devices, corresponding to the wrist devices of the person being broadcast. The position coordinates of the worn chest device and of the worn wrist devices are transmitted to the cloud server in real time; the cloud service processor of the cloud server compares, in real time, the three-dimensional information of the positioning device and of the worn chest device and sends the wearable set real-time instructions to correct the chest position; the position coordinates of the wrist devices worn by the person being broadcast are transmitted to the cloud server, and the cloud service processor compares the corresponding three-dimensional information and sends real-time instructions to correct the position of the worn right wrist, or of both worn wrists. A worn mouth loudspeaker is also provided; the cloud server sends the voice of the corresponding person being broadcast to the worn mouth loudspeaker, which emits that person's voice. In this way an imitator puts on the wearable set and a VR headset and can mimic the corresponding person being broadcast in the virtual scene. Since the imitator is not broadcast, the imitator does not appear in the virtual scene but is hidden behind the virtual image of the imitated person; as long as the position of the imitator's hand coincides with the position of the corresponding virtual hand, the imitator can, on behalf of the corresponding person being broadcast, shake hands with a real person being broadcast who is present. If the imitator has the same build as the corresponding person being broadcast and, after training, moves in step with that person, then interactions involving mutual contact with the real person present, such as patting shoulders or hugging, become possible. For such a broadcast the imitator must be filtered out, and the stereoscopic human-body images of the persons being broadcast who are in contact in the virtual scene are composed and stitched into a stereoscopic image of them in contact, so that scenes of persons being broadcast touching one another appear in the virtual scene, making the virtual scene richer; the virtual scene can also be converted from different viewing angles into two-dimensional images for broadcast, which is particularly suitable for meetings across long distances and similar situations. M is limited to at most 5 out of concern that too many imitators would create obstacles and make the three-dimensional broadcast impossible; even with M ≤ 5 the three-dimensional broadcast must not be affected, and if, with M greater than 5, the occlusion can still be eliminated by filtering and compensation so that the three-dimensional broadcast proceeds normally, then M is not restricted to 5 or fewer.
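A compact sketch of the correction loop described above, in which the cloud service processor compares the trackers of the person being broadcast with the imitator's worn trackers and sends back offsets, is given below; the dictionary structure, tracker names and tolerance are illustrative assumptions by the editor.

    import numpy as np

    def correction_commands(broadcast_trackers, imitator_trackers, tolerance=0.05):
        """Compare matching trackers (e.g. 'chest', 'right_wrist') of the person being
        broadcast and of the imitator's wearable set; return per-tracker offsets the
        imitator should apply whenever the mismatch exceeds the tolerance (metres)."""
        commands = {}
        for name, target in broadcast_trackers.items():
            current = imitator_trackers.get(name)
            if current is None:
                continue
            offset = np.asarray(target, float) - np.asarray(current, float)
            if np.linalg.norm(offset) > tolerance:
                commands[name] = offset          # move this body part by (dx, dy, dz)
        return commands

    # Example: ask the imitator to move the right wrist about 8 cm toward the target pose.
    cmds = correction_commands(
        {"chest": (0.0, 0.0, 1.3), "right_wrist": (0.30, 0.10, 1.1)},
        {"chest": (0.0, 0.0, 1.3), "right_wrist": (0.25, 0.04, 1.1)},
    )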
The technical means disclosed in the solution of the present invention are not limited to the technical means disclosed in the above embodiments, but also include technical solutions composed of any combination of the above technical features. It should be pointed out that, for a person of ordinary skill in the art, several improvements and refinements may also be made without departing from the principle of the present invention, and such improvements and refinements are likewise regarded as within the scope of protection of the present invention.

Claims (14)

  1. A method for real social interaction using a virtual scene, involving a cloud server and live broadcast terminals, the method comprising the following steps:
    Step 1: The cloud server obtains, from the number of live broadcast terminals, the number N of live broadcast rooms participating in the social virtual scene, a live broadcast room being a live broadcast scene in reality, N ≥ 2;
    Step 2: N+1 identical three-dimensional coordinate systems are established according to the number N of live broadcast rooms, one three-dimensional coordinate system being established for each live broadcast room, the N rooms forming coordinate systems 1 to N, and the cloud server establishing the (N+1)-th three-dimensional coordinate system in the virtual scene;
    Step 3: The N+1 three-dimensional coordinate systems are set so that each consists of an x-axis, a y-axis and a z-axis, and the length units of all x-, y- and z-axes are the same;
    the ground of the virtual scene is defined as the plane formed by the x-axis and y-axis of each coordinate system, and the ground of each live broadcast room is likewise the plane formed by the x-axis and y-axis of its coordinate system; the spatial range occupied by the virtual scene in the (N+1)-th coordinate system is denoted K_(N+1), and the corresponding spatial range in the coordinate system of each live broadcast room is denoted K_1 to K_N; K_1 to K_N are free of obstacles, and the real human figures within K_1 to K_N can be broadcast three-dimensionally; an obstacle means occlusion severe enough that the broadcast cannot be completed, and occlusions that can be filtered out and compensated during the broadcast count as no obstacle;
    Step 4: A live broadcast room in which at least one person being broadcast is present is defined as an occupied live broadcast room, the number of occupied rooms being M, N ≥ M ≥ 2; in each of the M occupied rooms, the body position information and body appearance information of every person being broadcast, together with that person's sound information and sound position information, are collected and transmitted to the cloud server at the same time, and the cloud server processes the appearance information of every person of every room into three-dimensional portrait information;
    Step 5, as an alternative to step 4: A live broadcast room in which at least one person being broadcast is present is defined as an occupied live broadcast room, the number of occupied rooms being M, N ≥ M ≥ 2; in each of the M occupied rooms, the body position information and body appearance information of every person being broadcast, together with that person's sound information and sound position information, are collected, processed into three-dimensional portrait information and transmitted to the cloud server at the same time;
    Step 6: The cloud server imports, in real time, the body position information and three-dimensional portrait information of every person being broadcast in every live broadcast room, together with that person's sound information and sound position information, into the virtual scene to form a VR data stream, and transmits the VR data stream to each live broadcast terminal; in the corresponding live broadcast room every person being broadcast wears the AR glasses serving as the display component of the live broadcast terminal; the virtual scene then gathers the virtual images of all persons being broadcast wearing AR glasses, the physical image of every person being broadcast coinciding with his or her virtual image, and through the AR glasses a person being broadcast can only see the VR virtual images of the other persons being broadcast; of course, when the cloud server transmits the VR data stream to a live broadcast terminal, that person's own virtual image may be omitted, and through the AR glasses the person likewise sees only the virtual images of the other persons being broadcast.
  2. The method for real social interaction using a virtual scene according to claim 1, characterized in that the generation of the three-dimensional portrait information comprises the following steps:
    S1): In each live broadcast room, at least three cameras are arranged for every person being broadcast, and the at least three cameras track and film that person synchronously, synchronous tracking filming meaning that every frame captured by the different cameras is taken at the same moment;
    S2): The live broadcast terminal performs human-body matting from different angles, frame by frame, on the videos filmed by the different cameras in the same live broadcast room, and composes a stereoscopic human-body image;
    S3): The live broadcast terminal transmits the stereoscopic human-body image to the cloud server; the cloud server recognizes the stereoscopic human-body image frame by frame, records the video frames containing AR glasses, and fuses the facial image of the same time frame with the stereoscopic human-body image to form the three-dimensional portrait information.
  3. The method for real social interaction using a virtual scene according to claim 2, characterized in that the generation of the three-dimensional portrait information, as well as of the sound information and sound position information, comprises the following steps:
    Said S1): In each live broadcast room, at least three cameras are arranged for every person being broadcast, and the at least three cameras track and film that person synchronously, synchronous tracking filming meaning that every frame captured by the different cameras is taken at the same moment; the person being broadcast wears AR glasses in the live broadcast room, the AR glasses being provided with camera modules facing the face whose time frames are synchronized with those of the cameras; the camera modules film the person's face, mainly the parts of the face covered by the AR glasses;
    Said S2): The live broadcast terminal fuses the videos filmed by the several groups of camera modules in sequence according to time frame and number to construct a facial image; at the same time, the live broadcast terminal performs human-body matting from different angles, frame by frame, on the videos filmed by the different cameras in the same live broadcast room, and composes a stereoscopic human-body image that does not contain the AR glasses;
    Said S3): The live broadcast terminal transmits the facial image and the stereoscopic human-body image to the cloud server; the cloud server recognizes the stereoscopic human-body image frame by frame, records the video frames containing AR glasses, and fuses the facial image of the same time frame with the stereoscopic human-body image to form three-dimensional portrait information that does not contain the AR glasses.
  4. The method for real social interaction using a virtual scene according to claim 3, characterized in that what the person being broadcast wears in the live broadcast room are AR glasses; the AR glasses are provided with camera modules facing the face, the camera modules being located around the lenses of the AR glasses; all camera modules film synchronously, the filming ranges of adjacent camera modules overlap with one another and also with those of the cameras; the camera modules and the cameras film synchronously, and the live broadcast terminal extracts facial information from the camera modules frame by frame.
  5. The method for real social interaction using a virtual scene according to claim 1, characterized in that the body position information includes position coordinates and pose; the live broadcast terminal collects the position coordinates and pose of every person being broadcast in each live broadcast room and transmits them to the cloud server, and it collects the coordinates and pose of everyone in the room synchronously and in real time following the time sequence of the camera frames; the position coordinates are collected by a positioning device and the pose data by a gyroscope, the positioning device and the gyroscope being fixed on the chest of the person being broadcast.
  6. The method for real social interaction using a virtual scene according to claim 1, characterized in that the sound information is collected by the recording equipment of the live broadcast terminal and the sound position information by the sound-source localization equipment of the live broadcast terminal.
  7. A system for real social interaction using a virtual scene according to any one of claims 1 to 6, characterized in that the system comprises a cloud server and live broadcast terminals; the cloud server is provided with a cloud service processor and each live broadcast terminal with a terminal processor; the live broadcast terminal further comprises at least three groups of cameras, AR glasses, wireless in-ear earphones, a positioning device and a gyroscope; the cloud service processor and the terminal processor are connected by TCP/IP communication; the cameras, the AR glasses, the wireless in-ear earphones, the positioning device and the gyroscope are all electrically connected to the terminal processor; the positioning device and the gyroscope are fixed together as one unit and are fixed on the chest of the person being broadcast.
  8. An AR glasses device, comprising a spectacle frame and a VR display device arranged on the spectacle frame, the VR display device comprising a display screen and a convex lens on the inner side of the display screen, the convex lens and the display screen together forming a VR image in the human eye, characterized in that the display screen is a light-transmitting display screen; on the outer side of the light-transmitting display screen a concave lens is arranged whose diopter matches that of the convex lens; the concave lens cancels the refraction of light by the convex lens; the concave lens lies within the focal point of the convex lens, and the convex lens lies within the virtual focal point of the concave lens, so that the light-transmitting part of the light-transmitting display screen forms a transparent image of reality.
  9. The AR glasses device according to claim 8, characterized in that the light-transmitting display screen is a light-transmitting single-sided display screen, which displays the VR image toward the inner side, that is, toward the eye.
  10. The AR glasses device according to claim 9, characterized in that the light-transmitting single-sided display screen comprises an array of light-emitting regions with an array of light-transmitting regions between them, the light-emitting regions emitting light on one side only.
  11. The AR glasses device according to claim 8, characterized in that each light-transmitting region is part of a Fresnel concave lens; the array formed by all the light-transmitting regions constitutes a Fresnel concave lens, which replaces the concave lens of matching diopter otherwise arranged on the outer side of the light-transmitting display screen, thereby reducing the weight and thickness of the glasses.
  12. The AR glasses device according to claim 8, characterized in that the convex lens is a Fresnel convex lens and the concave lens is a Fresnel concave lens.
  13. The AR glasses device according to claim 10 or 11, characterized in that the material of the light-transmitting regions is a photochromic light-transmitting material: when the light-emitting material of the light-emitting regions around a light-transmitting region emits light, the photochromic material darkens and its transmittance decreases; when the surrounding light-emitting material is not emitting, the transmittance of the photochromic light-transmitting region is high.
  14. The AR glasses device according to claim 13, characterized in that a distance adjustment device is arranged between the convex lens and the display screen for adjusting the distance from the convex lens to the display screen.
PCT/CN2023/082004 2022-03-18 2023-03-16 利用虚拟场景进行真实社交的方法、系统及ar眼镜 WO2023174385A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210271032.X 2022-03-18
CN202210271032.XA CN117994472A (zh) 2022-03-18 2022-03-18 利用虚拟场景进行真实社交的方法、系统及ar眼镜

Publications (1)

Publication Number Publication Date
WO2023174385A1 true WO2023174385A1 (zh) 2023-09-21

Family

ID=88022394

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/082004 WO2023174385A1 (zh) 2022-03-18 2023-03-16 利用虚拟场景进行真实社交的方法、系统及ar眼镜

Country Status (2)

Country Link
CN (1) CN117994472A (zh)
WO (1) WO2023174385A1 (zh)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109976690A (zh) * 2019-03-27 2019-07-05 优奈柯恩(北京)科技有限公司 Ar眼镜远程交互方法、装置和计算机可读介质
CN209746278U (zh) * 2019-06-17 2019-12-06 杭州光粒科技有限公司 一种透射型ar眼镜装置
CN113709515A (zh) * 2021-09-06 2021-11-26 广州麦田信息技术有限公司 一种新媒体直播和用户线上互动方法
CN113822970A (zh) * 2021-09-23 2021-12-21 广州博冠信息科技有限公司 直播控制方法、装置、存储介质与电子设备
CN114025219A (zh) * 2021-11-01 2022-02-08 广州博冠信息科技有限公司 增强现实特效的渲染方法、装置、介质及设备
CN114143568A (zh) * 2021-11-15 2022-03-04 上海盛付通电子支付服务有限公司 一种用于确定增强现实直播图像的方法与设备

Also Published As

Publication number Publication date
CN117994472A (zh) 2024-05-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23769887

Country of ref document: EP

Kind code of ref document: A1