CN111459267A - Data processing method, first server, second server and storage medium - Google Patents

Data processing method, first server, second server and storage medium

Info

Publication number
CN111459267A
Authority
CN
China
Prior art keywords
server
virtual reality
audio
video
reality device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010136715.5A
Other languages
Chinese (zh)
Inventor
王晓阳 (Wang Xiaoyang)
赵保军 (Zhao Baojun)
张佳宁 (Zhang Jianing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Jialan Innovation Technology Co ltd
Original Assignee
Hangzhou Jialan Innovation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Jialan Innovation Technology Co ltd filed Critical Hangzhou Jialan Innovation Technology Co ltd
Priority to CN202010136715.5A
Publication of CN111459267A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/03: Arrangements for converting the position or the displacement of a member into a coded form
    • G06F 3/033: Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; accessories therefor
    • G06F 3/0346: Pointing devices with detection of the device orientation or free movement in a 3D space, e.g. 3D mice, 6-DOF [six degrees of freedom] pointers using gyroscopes, accelerometers or tilt-sensors
    • G06F 3/16: Sound input; sound output
    • G06F 2203/00: Indexing scheme relating to G06F 3/00 - G06F 3/048
    • G06F 2203/01: Indexing scheme relating to G06F 3/01
    • G06F 2203/012: Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application discloses a data processing method, a first server, a second server and a storage medium. The method comprises the following steps: the first server receives control information, pose information and audio information sent by a virtual reality device; the first server renders an image according to the received control information and pose information to obtain a rendered video image; and the first server fuses the received audio information with the rendered video image to obtain audio and video fusion data of the virtual reality device. Because the first server fuses the received audio information with the rendered video image into audio and video fusion data, the virtual reality device gains an audio interaction function and user experience is improved.

Description

Data processing method, first server, second server and storage medium
Technical Field
The present application relates, but is not limited, to the field of Virtual Reality (VR) technology, and in particular to a data processing method, a first server, a second server and a storage medium.
Background
Virtual reality technology is an information technology that constructs an immersive human-computer interaction environment from computable information. Using modern, computer-centered technology, it generates a realistic integrated virtual environment with visual, auditory, tactile and other sensory channels over a specific range, and the user interacts with objects in that virtual environment in a natural way by means of the necessary equipment.
Virtual reality interaction has been applied in industries such as education, training, gaming and electronic sports. However, existing virtual reality devices usually lack an audio interaction function, which leads to a poor user experience.
Disclosure of Invention
The application provides a data processing method, a first server, a second server and a storage medium, which can enable a virtual reality device to have an audio interaction function.
The embodiment of the application provides a data processing method, which comprises the following steps: the method comprises the steps that a first server receives control information, pose information and audio information sent by a virtual reality device; the first server renders the image according to the received control information and pose information to obtain a rendered video image; and the first server fuses the received audio information and the rendered video image to obtain audio and video fusion data of the virtual reality device.
In some embodiments, the method further comprises: the first server sends the video image or audio and video fusion data of the virtual reality device to one or more client terminals, where a client terminal is a video playback device connected to the first server through a network.
In some embodiments, before the first server sends the video images or audio and video fusion data of the virtual reality device to one or more client terminals, the method further comprises: the first server detects whether a client terminal meets the sharing condition of the video image or the audio and video fusion data of the virtual reality device; and if the client terminal meets the sharing condition, the operation in which the first server sends the video image or the audio and video fusion data of the virtual reality device to the client terminal is triggered.
In some embodiments, the sharing conditions include any one or more of: shareable to all, not shareable, shareable within a group, shareable to relatives and friends, and shareable in exchange for points.
Embodiments of the present application further provide a first server, which includes a processor and a memory, where the processor is configured to execute a computer program stored in the memory to implement the steps of the data processing method according to any one of the above.
An embodiment of the present application further provides a data processing method, including: a second server receives source data sent by N first servers, where the N first servers are connected to M virtual reality devices through a network, and the source data includes at least one of the following: local video images and/or audio information of the M virtual reality devices, where a local video image is obtained by a first server rendering the corresponding image according to the control information and pose information of the virtual reality device connected to it, the audio information is received by a first server from the virtual reality device connected to it, N and M are natural numbers, and M is less than or equal to N; and the second server integrates the received source data to form fused video images and/or audio and video fused data of the M virtual reality devices.
In some embodiments, the method further comprises: the second server sends the fused video images and/or audio and video fused data of the M virtual reality devices to one or more client terminals, where a client terminal is a video playback device connected to the first server through a network.
In some embodiments, when the audio information is received from a first virtual reality device and the client terminal is a second virtual reality device, the method further comprises: the second server detects the position relation of the first virtual reality device and the second virtual reality device; and the second server sends audio control information to the second virtual reality device according to the detected position relation, so that the second virtual reality device plays the audio and video fusion data according to the audio control information.
An embodiment of the present application further provides a second server, which includes a processor and a memory, where the processor is configured to execute a computer program stored in the memory to implement the steps of the data processing method according to any one of the above.
An embodiment of the present application further provides a storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps of the data processing method according to any one of the above items are implemented.
According to the data processing method, the first server, the second server and the storage medium, the received audio information and the rendered video image are fused through the first server to obtain the audio and video fusion data of the virtual reality device, so that the virtual reality device has an audio interaction function, and user experience is improved;
furthermore, a second server arranged above the N first servers integrates the source data of the N first servers, so that the virtual reality devices can be used in multi-user interactive scenarios, further improving user experience.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. Other advantages of the present application may be realized and attained by the instrumentalities and combinations particularly pointed out in the specification and the drawings.
Drawings
The accompanying drawings are included to provide an understanding of the present disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the examples serve to explain the principles of the disclosure and not to limit the disclosure.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application;
fig. 2 is a schematic application scenario diagram of a first server according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a first server according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart diagram illustrating another data processing method according to an embodiment of the present application;
fig. 5 is a schematic view of an application scenario of a second server according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a second server according to an embodiment of the present application.
Detailed Description
The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
As shown in fig. 1, an embodiment of the present application provides a data processing method, which includes steps 101 to 103.
Specifically, step 101 includes: the first server receives control information, pose information and audio information sent by the virtual reality device.
In this embodiment, the virtual reality apparatus may include a head-mounted display and an interaction device, where the head-mounted display may include a mobile-end head-mounted display, a PC-end head-mounted display, an integrated head-mounted display, a split head-mounted display, and the like; the interactive devices may include handles, gloves, mice, keyboards, and other hand-held or wearable devices.
In this embodiment, a virtual reality application is installed on the first server; this application receives the control information, pose information and audio information sent by the head-mounted display. The first server may be a cloud server.
In this embodiment, the interactive device and the head-mounted display are paired to form a set of virtual reality apparatus. The head-mounted display is worn on the user's head, and the user holds the interactive device while moving in the real scene. The first server hosts virtual reality application content provided by a content provider, and the user's head-mounted display shows the virtual reality video images or audio and video fusion data delivered by the first server. The virtual reality device first transmits the collected control information, pose information and audio information to the first server over a network (for example, a 5G network); the first server receives and processes this information to form the corresponding virtual reality video images or audio and video fusion data. For example, if the content provider provides a first-person shooter game such as Counter-Strike (CS), then on receiving the control information, pose information and audio information, the first server can produce the virtual reality video image corresponding to that input, such as the view of the virtual character in the game, and transmit the processed video image to the head-mounted display of the virtual reality device for display.
In this embodiment, the control information may be key information on the head mounted display and/or the interactive device.
In this embodiment, the pose information may include position information obtained by a locator on the interactive device and pose information obtained by a sensor on the interactive device.
In the present embodiment, the position information comprises positions along the three Cartesian axes X, Y and Z, and the attitude information comprises the angles Pitch, Yaw and Roll about those axes, where Pitch is the pitch angle of rotation about the X axis, Yaw is the yaw angle of rotation about the Y axis, and Roll is the roll angle of rotation about the Z axis. The positions along the three axes together with the attitude angles Pitch, Yaw and Roll are collectively referred to as six-degree-of-freedom information.
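For illustration only, a minimal sketch of how such a six-degree-of-freedom sample might be represented in code (the type and field names are assumptions, not part of the patent):

```python
from dataclasses import dataclass

@dataclass
class SixDofPose:
    """One six-degree-of-freedom sample: position along the X, Y, Z axes
    plus rotation about X (pitch), Y (yaw) and Z (roll), as defined above."""
    x: float      # position along the X axis, e.g. in metres
    y: float      # position along the Y axis
    z: float      # position along the Z axis
    pitch: float  # rotation about the X axis, e.g. in degrees
    yaw: float    # rotation about the Y axis
    roll: float   # rotation about the Z axis

# Example: a head-mounted display at eye height, turned 30 degrees to the right.
head_pose = SixDofPose(x=0.0, y=1.7, z=0.0, pitch=0.0, yaw=30.0, roll=0.0)
```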
In the present embodiment, the position information may be obtained as follows: a positioning system is arranged in the space, comprising a signal emitter, a locator built into or external to the head-mounted display, and a locator built into or external to the interaction equipment. The locator on the head-mounted display receives the positioning signal emitted by the signal emitter to obtain the position information of the head-mounted display, and the locator on the interactive equipment receives the positioning signal emitted by the signal emitter to obtain the position information of the interactive equipment.
In this embodiment, the position information may also be obtained as follows: one or more positioning base stations with known coordinates are arranged in the space, and the interactive device to be positioned carries a locator (for example, a positioning tag). The positioning tag transmits request pulses at a certain frequency, and a positioning base station replies with a response pulse upon receiving one. From the time difference between the tag's transmission of the request and its reception of the response, the flight time of the pulse is calculated; the flight time determines the distance between the tag and the base station, and from these distances the position of the positioning tag (that is, the position of the interactive device) is calculated.
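As a rough illustration of the time-of-flight calculation just described, the sketch below computes the tag-to-base-station distance from the pulse timing and then solves for the tag position by least squares (the function names, the fixed reply delay and the solver choice are all assumptions, not taken from the patent):

```python
import numpy as np

SPEED_OF_LIGHT = 3.0e8  # m/s; the positioning pulse travels at radio speed

def distance_from_round_trip(t_round_trip_s: float, t_reply_s: float) -> float:
    """One-way distance from the tag's measured round-trip time, after
    subtracting the base station's (assumed known) fixed reply delay."""
    time_of_flight = (t_round_trip_s - t_reply_s) / 2.0
    return SPEED_OF_LIGHT * time_of_flight

def locate_tag(base_stations: np.ndarray, distances: np.ndarray) -> np.ndarray:
    """Least-squares tag position from several base stations with known
    coordinates (rows of `base_stations`) and measured distances."""
    # Linearise by subtracting the first station's sphere equation
    # |p - p_i|^2 = d_i^2 from each of the others.
    p0, d0 = base_stations[0], distances[0]
    A = 2.0 * (base_stations[1:] - p0)
    b = (d0**2 - distances[1:]**2
         + np.sum(base_stations[1:]**2, axis=1) - np.sum(p0**2))
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position
```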
In this embodiment, the sensor on the interaction device comprises a six-axis sensor or a nine-axis sensor, wherein the six-axis sensor comprises a three-axis gyroscope and a three-axis accelerometer; the nine-axis sensor includes a three-axis gyroscope, a three-axis accelerometer, and a three-axis magnetometer.
In this embodiment, the pose information may further include image information captured by a head mounted display and/or a camera on the interaction device.
It should be noted that the image information captured by the camera on the head-mounted display may contain only the pose information of the head-mounted display; alternatively, it may contain the pose information of the interactive device as well. In the latter case, one or more positioning identifiers bearing a pattern, a dot matrix or another recognizable scheme may be preset on the interactive device, so that the first server can locate the pose of the interactive device from those identifiers.
After receiving the pose information, the first server directly calculates the position and attitude of the head-mounted display and/or the interactive device through visual algorithms such as Simultaneous Localization And Mapping (SLAM) and Perspective-n-Point (PnP), or uses multi-sensor fusion to combine visual and inertial navigation information, thereby improving the real-time performance and precision of the pose estimate.
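A minimal sketch of the PnP step using OpenCV's solvePnP (the marker layout, pixel detections and camera intrinsics below are placeholder values; the patent does not name any particular library):

```python
import numpy as np
import cv2

# 3D coordinates of the positioning markers on the interactive device,
# expressed in the device's own frame (placeholder values, in cm).
object_points = np.array([[0, 0, 0], [5, 0, 0], [5, 5, 0], [0, 5, 0]],
                         dtype=np.float64)
# Their detected 2D pixel positions in the head-mounted camera image.
image_points = np.array([[320, 240], [400, 238], [402, 320], [318, 322]],
                        dtype=np.float64)
# Pinhole camera intrinsics (fx, fy, cx, cy) -- assumed already calibrated.
camera_matrix = np.array([[800, 0, 320],
                          [0, 800, 240],
                          [0, 0, 1]], dtype=np.float64)
dist_coeffs = np.zeros(5)  # assume an undistorted image

ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                              camera_matrix, dist_coeffs)
# rvec/tvec give the device's orientation and position in camera coordinates.
```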
In this embodiment, the pose information sent by the virtual reality device may be X, Y, Z three-axis position and attitude information already solved by the virtual reality device, or it may be the raw signal data acquired by the sensors and locators on the virtual reality device, in which case the first server additionally solves for the X, Y, Z three-axis position and attitude information.
In this embodiment, the audio information may be sound information collected by a microphone on the head-mounted display and/or the interactive device.
In this embodiment, the first server may receive the control information, pose information and audio information sent by the virtual reality device over a 5G network. 5G offers a faster transmission speed, a larger transmission capacity and extremely low latency, so transmitting over a 5G network shortens data transfer and thereby reduces delay. For virtual reality applications the delay must be kept as small as possible for a good user experience; otherwise symptoms such as dizziness arise.
Step 102 comprises: and the first server renders the image according to the received control information and pose information to obtain a rendered video image.
In this embodiment, step 102 may include:
the first server captures the image frames of the current virtual reality application according to the received control information and pose information (the capture frame rate may be 60 frames per second (fps) or higher);
the first server carries out lens distortion processing on each collected frame image to obtain a processed image frame;
the first server encodes the processed image frames (for example into an H.264 stream, or into Moving Picture Experts Group-2 (MPEG-2), the Audio Video coding Standard (AVS), etc.); a sketch of the lens distortion step follows this list.
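The lens distortion step might look roughly like the following simple radial pre-warp (the one-coefficient model and its value are assumptions; production systems use the lens maker's calibrated distortion mesh):

```python
import numpy as np

def lens_predistort(frame: np.ndarray, k1: float = 0.22) -> np.ndarray:
    """Pre-distort a rendered frame so that the HMD lens optics cancel the
    warp. Simple one-coefficient radial model: the destination pixel at
    normalised radius r samples the source at r * (1 + k1 * r^2)."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    nx = (xs - w / 2) / (w / 2)   # normalised x in [-1, 1]
    ny = (ys - h / 2) / (h / 2)   # normalised y in [-1, 1]
    r2 = nx ** 2 + ny ** 2
    sx = np.clip((nx * (1 + k1 * r2) + 1) * w / 2, 0, w - 1).astype(int)
    sy = np.clip((ny * (1 + k1 * r2) + 1) * h / 2, 0, h - 1).astype(int)
    # Nearest-neighbour resample; the result would then be encoded
    # (e.g. H.264) and sent towards the head-mounted display.
    return frame[sy, sx]
```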
Some virtual reality applications require a high display frame rate. For example, if the virtual reality device requires 120 fps but the first server captures at 60 fps, the first server must time-warp each captured frame before the lens distortion step: the image frames are interpolated along the time axis to reach a display frame rate of 120 fps, and the 120 fps images are then lens-distortion processed, encoded and sent to the head-mounted display for display. The frame interpolation may also be implemented in the head-mounted display itself.
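A naive illustration of raising 60 fps capture to a 120 fps display rate by inserting one frame between each captured pair (plain mid-point blending is a stand-in here; real time warping reprojects each frame using the newest head pose rather than blending):

```python
import numpy as np

def double_frame_rate(frames: list) -> list:
    """60 fps -> 120 fps: between every two captured frames, insert
    their mid-point blend as the extra frame."""
    if not frames:
        return []
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.append(((a.astype(np.float32) + b) / 2).astype(a.dtype))
    out.append(frames[-1])
    return out
```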
Step 103 comprises: and the first server fuses the received audio information and the rendered video image to obtain audio and video fusion data of the virtual reality device.
In this embodiment, a virtual sound card can be installed on the first server and set as the default microphone input, and audio data from the speaker output is routed to that microphone, which implements an internal recording function. The first server can then store and process the resulting audio and video, and forward it as needed to the head-mounted display of the virtual reality device, or to other client terminals, for display and playback.
Fusing the received audio information with the rendered video image may specifically include: determining the duration of an audio unit, which can be derived from the preset capture frequency, number of capture channels and capture bit depth; determining the timestamp of each captured audio unit from that duration, where the timestamp can be rounded down and dynamically compensated to reduce the synchronization error introduced by rounding; and fusing audio units with image units based on their respective timestamps, pairing data whose timestamps differ by no more than a preset threshold, to obtain the fused video data.
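A sketch of that timestamping scheme, assuming PCM capture at 48 kHz, two channels and 16 bits per sample (the unit size, the millisecond granularity and the pairing threshold are assumptions; computing each stamp as the floor of the exact cumulative time is one plausible reading of "rounded down and dynamically compensated"):

```python
SAMPLE_RATE = 48_000   # preset capture frequency (Hz)
CHANNELS = 2           # preset capture channel count
BITS = 16              # preset capture bit depth
UNIT_BYTES = 4096      # bytes per captured audio unit (assumed)

# Duration of one audio unit in milliseconds, derived from the
# capture frequency, channel count and bit depth.
BYTES_PER_SECOND = SAMPLE_RATE * CHANNELS * (BITS // 8)
UNIT_MS = UNIT_BYTES / BYTES_PER_SECOND * 1000.0   # about 21.33 ms here

def audio_timestamps(n_units: int) -> list:
    """Integer-millisecond timestamp for each audio unit: the floor of
    the exact cumulative time, so the per-unit rounding error never
    accumulates across units (the 'dynamic compensation')."""
    return [int(i * UNIT_MS) for i in range(n_units)]

def fuse(audio_units, video_frames, max_diff_ms=20):
    """Pair (timestamp, audio unit) with (timestamp, video frame)
    whenever the two stamps differ by no more than max_diff_ms."""
    return [(a, v)
            for ta, a in audio_units
            for tv, v in video_frames
            if abs(ta - tv) <= max_diff_ms]
```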
In this embodiment, the method may further include: the first server sends the video image or audio and video fusion data of the virtual reality device to one or more client terminals, where a client terminal is a video playback device connected to the first server through a network.
In this embodiment, the client terminal may be the virtual reality device that sent the control information, pose information and audio information in step 101, any other virtual reality device, or other network equipment with a video playback function, such as a personal computer, a mobile phone or a tablet.
In this embodiment, before the first server sends the video image or the audio-video fusion data of the virtual reality device to one or more client terminals, the method may further include:
the method comprises the steps that a first server detects whether a client terminal meets the sharing condition of video images or audio and video fusion data of a virtual reality device;
and if the client terminal meets the sharing condition of the video image or the audio and video fusion data of the virtual reality device, triggering the first server to send the video image or the audio and video fusion data of the virtual reality device to the client terminal.
In this embodiment, the sharing condition may include any one or more of the following: shareable to all, not shareable, shareable within a group, shareable to relatives and friends, shareable in exchange for points, or any other similar condition.
For example, suppose the virtual reality device that transmits the control information, pose information and audio information is a first virtual reality device, and the client terminal is a second, different virtual reality device. If the video image or audio and video fusion data of the first virtual reality device is shareable within a group, then when the second virtual reality device is in the same group as the first, the second virtual reality device can obtain the video image or audio and video fusion data of the first virtual reality device.
For another example, suppose the client terminal is any type of network device connected to the first server, such as a personal computer, a mobile phone or a tablet, and the sharing condition of the video image or audio and video fusion data of the virtual reality device is points-based sharing with a price of 100 game points. A user can then obtain the video image or audio and video fusion data of the virtual reality device by spending 100 game points through that network device; the sketch below shows the idea in code.
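In code, the sharing check could be as simple as the following (the policy names mirror the list above; the group, friends and points fields are illustrative assumptions):

```python
from enum import Enum, auto

class SharePolicy(Enum):
    ALL = auto()      # shareable to everyone
    NONE = auto()     # not shareable
    GROUP = auto()    # shareable within the same group
    FRIENDS = auto()  # shareable to relatives and friends
    POINTS = auto()   # shareable in exchange for points

def may_share(policy: SharePolicy, viewer: dict, owner: dict,
              price: int = 100) -> bool:
    """True if the viewer satisfies the owner's sharing condition."""
    if policy is SharePolicy.ALL:
        return True
    if policy is SharePolicy.GROUP:
        return viewer["group"] == owner["group"]
    if policy is SharePolicy.FRIENDS:
        return viewer["id"] in owner["friends"]
    if policy is SharePolicy.POINTS:
        return viewer["points"] >= price   # e.g. 100 game points
    return False   # SharePolicy.NONE
```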
As shown in fig. 2, a client terminal (e.g., a PC, a head-mounted display, a mobile phone or a tablet) may request the audio and video of a virtual reality device from the first server. Further, the virtual reality device may set permissions that classify the audio and video transmitted to and stored on the first server as shareable, non-shareable, shareable to relatives and friends, shareable for points, and so on, and a client terminal obtains the audio and video of the corresponding virtual reality device according to that definition; for example, viewing may cost 100 game points, or may be restricted to relatives and friends.
Based on the same inventive concept, the present application further provides a first server, which includes a processor and a memory, where the processor is configured to execute a computer program stored in the memory to implement the steps of the data processing method according to any one of the above.
Based on the same inventive concept, embodiments of the present application further provide a storage medium storing a computer program, which when executed by a processor implements the steps of the data processing method according to any one of the above.
Based on the same inventive concept, as shown in fig. 3, an embodiment of the present application further provides a first server, which includes a first communication module 301 and a first audio/video processing module 302, where:
the first communication module 301 is configured to receive control information, pose information, and audio information sent by a virtual reality device;
the first audio/video processing module 302 is configured to render the image according to the received control information and pose information to obtain a rendered video image, and fuse the received audio information and the rendered video image to obtain audio/video fusion data of the virtual reality device.
In this embodiment, the control information may be key information on the head mounted display and/or the interactive device.
In this embodiment, the pose information may include position information obtained by a locator on the interactive device and pose information obtained by a sensor on the interactive device.
In this embodiment, the pose information may further include image information captured by a head mounted display and/or a camera on the interaction device.
In this embodiment, the audio information may be sound information collected by a microphone on the head-mounted display and/or the interactive device.
In this embodiment, the first audio/video processing module 302 renders the image according to the received control information and pose information to obtain a rendered video image by:
capturing the image frames of the current virtual reality application according to the received control information and pose information (the capture frame rate may be 60 fps or higher);
performing lens distortion processing on each collected frame image to obtain a processed image frame;
and encoding the processed image frames (for example into an H.264 stream, or MPEG-2, AVS, etc.).
In this embodiment, the first audio/video processing module 302 is further configured to obtain audio/video fusion data of the virtual reality device through internal recording of the virtual sound card.
In this embodiment, the first communication module 301 is further configured to send the video image or the audio/video fusion data of the virtual reality device to one or more client terminals, where the client terminals are video playing devices connected to the first server through a network.
In this embodiment, the client terminal may be a virtual reality device that sends control information, pose information, and audio information, may also be any other virtual reality device, and may also be other network devices with a video playing function, such as a personal computer, a mobile phone, and a tablet display.
In this embodiment, the fusing of the received audio information with the rendered video image by the first audio/video processing module 302 may specifically include: determining the duration of an audio unit, which can be derived from the preset capture frequency, number of capture channels and capture bit depth; determining the timestamp of each captured audio unit from that duration, where the timestamp can be rounded down and dynamically compensated to reduce the synchronization error introduced by rounding; and fusing audio units with image units based on their respective timestamps, pairing data whose timestamps differ by no more than a preset threshold, to obtain the fused video data.
In this embodiment, the first server further includes a first processing module, where:
the first processing module is configured to detect whether the client terminal meets a sharing condition of the video image or the audio/video fusion data of the virtual reality device, and notify the first communication module 301 to send the video image or the audio/video fusion data of the virtual reality device to the client terminal if the client terminal meets the sharing condition of the video image or the audio/video fusion data of the virtual reality device.
In this embodiment, the sharing condition may include: shareable to all, not shareable, shareable within a group, shareable to relatives and friends, shareable in exchange for points, or any other similar condition.
Based on the same inventive concept, as shown in fig. 4, an embodiment of the present application further provides a data processing method, which includes steps 401 to 402.
Wherein step 401 comprises: the second server receives source data sent by the N first servers, where the N first servers are connected to the M virtual reality devices through a network, and the source data comprises at least one of the following: local video images and/or audio information of the M virtual reality devices, where a local video image is obtained by a first server rendering the corresponding image according to the control information and pose information of the virtual reality device connected to it, the audio information is received by a first server from the virtual reality device connected to it, N and M are natural numbers, and M is less than or equal to N.
The first server and the second server may be cloud servers. The data processing method provided by this embodiment enables multi-user interaction in virtual reality applications; an application scenario is described below, taking M = N as an example.
Suppose that in a large CS game there are N users, each with their own set of virtual reality apparatus; each set may include an interactive device and a head-mounted display paired with each other, and each set corresponds to one first server. In this embodiment, as shown in fig. 5, user 1 corresponds to first server 1, and the image data formed by user 1's interaction is processed in first server 1; user 2 corresponds to first server 2, and the image data formed by user 2's interaction is processed in first server 2; and so on. Each set of virtual reality apparatus sends its collected control information, pose information and/or audio information to the corresponding first server, and that first server processes the received control information and pose information to obtain the local video image of the corresponding virtual reality device. A second server is arranged above the N first servers; each first server sends its processed local video image and/or received audio information to the second server, and the second server integrates the interaction data of the N first servers to form, from the perspective of each virtual reality device, integrated and/or fused video images and/or audio and video fused data. The fused video images and/or audio and video fused data can be transmitted to one or more first servers and forwarded by them to the corresponding head-mounted displays for display and playback; alternatively, the data may only be stored and not transmitted.
Step 402 comprises: the second server integrates the received source data to form the fused video images and/or audio and video fused data of the M virtual reality devices.
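A structural sketch of this integration step (the per-device stream format and the simple audio mixing are assumptions; a real implementation would also composite the video from each device's viewpoint):

```python
import numpy as np

def integrate(source_streams: dict) -> dict:
    """Fuse source data from the N first servers into one record per
    virtual reality device.

    source_streams maps device_id -> {"video": frame array,
    "audio": PCM array}; this format is assumed for illustration."""
    fused = {}
    for device_id, src in source_streams.items():
        # Mix in the audio of all *other* devices, so each user hears
        # the rest of the shared scene alongside their own video.
        other_audio = [s["audio"] for d, s in source_streams.items()
                       if d != device_id]
        remote_mix = np.sum(other_audio, axis=0) if other_audio else None
        fused[device_id] = {"video": src["video"], "remote_audio": remote_mix}
    return fused
```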
In this embodiment, the method further includes: the second server sends the fused video images and/or audio and video fused data of the M virtual reality devices to one or more client terminals, where a client terminal is a video playback device connected to the first server through a network.
In this embodiment, before the second server sends the fused video images and/or the audio/video fused data of the M virtual reality devices to one or more client terminals, the method may further include:
the second server detects whether the client terminal meets the sharing condition of the fused video image and/or the audio and video fused data;
and if the client terminal meets the sharing condition of the fused video images and/or the audio and video fused data, triggering the second server to send the fused video images and/or the audio and video fused data of the M virtual reality devices to one or more client terminals.
In this embodiment, the sharing condition may include any one or more of the following: shareable to all, not shareable, shareable within a group, shareable to relatives and friends, shareable in exchange for points, or any other similar condition.
For example, in a large CS game all users are divided into two groups; the second server can then determine, for the audio emitted by a user, whether it should be played to that user's own team, to all users, or only to a particular user.
In this embodiment, the fusion video image includes an overall fusion video image and local fusion video images corresponding to the virtual reality devices, and the audio/video fusion data includes overall audio/video fusion data and local audio/video fusion data corresponding to the virtual reality devices.
When a client terminal requests to view the fused video images and/or audio and video fused data stored on the second server, it can choose an overall viewing angle to view the overall fused video image or overall audio and video fusion data (in this case the client terminal may be a virtual reality device or any other type of network equipment), or view the local fused video image or local audio and video fusion data corresponding to a particular virtual reality device from the viewing angle of that device's user (in this case the client terminal is a virtual reality device).
In this embodiment, the overall fused video image and the local fused video images corresponding to the individual virtual reality devices may each be given their own sharing conditions; likewise, the overall audio and video fusion data and the local audio and video fusion data corresponding to the individual virtual reality devices may each be given their own sharing conditions.
In this embodiment, when the audio information is received from the first virtual reality device and the client terminal is the second virtual reality device, the method further includes:
the second server detects the position relation between the first virtual reality device and the second virtual reality device;
and the second server sends the audio control information to the second virtual reality device according to the detected position relation, so that the second virtual reality device plays the audio and video fusion data according to the audio control information.
In this embodiment, the second server processes the audio according to the users' positions. For example, when user A is located on the left side of the overall map and speaks to user B, who is located on the right side of the virtual scene, the audio transmitted to user B is rendered through the stereo playback system of B's head-mounted display so that B perceives the sound as coming from a user on B's left.
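A toy version of that left/right perception, deriving stereo channel gains from the speaker's position relative to the listener with constant-power panning (the flat 2D coordinates and the clamping of rear sources to the sides are simplifying assumptions):

```python
import math

def stereo_gains(speaker_pos, listener_pos, listener_yaw_rad=0.0):
    """Left/right gains from the speaker's azimuth relative to the
    listener, using constant-power (cosine/sine) panning."""
    dx = speaker_pos[0] - listener_pos[0]
    dz = speaker_pos[1] - listener_pos[1]
    azimuth = math.atan2(dx, dz) - listener_yaw_rad   # 0 = straight ahead
    # Clamp to [-pi/2, pi/2] and map to a pan value in [0, 1].
    pan = max(-math.pi / 2, min(math.pi / 2, azimuth)) / math.pi + 0.5
    left = math.cos(pan * math.pi / 2)
    right = math.sin(pan * math.pi / 2)
    return left, right

# A speaker three metres to the listener's left is heard almost
# entirely in the left channel:
print(stereo_gains(speaker_pos=(-3.0, 0.0), listener_pos=(0.0, 0.0)))
```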
Based on the same inventive concept, the present application further provides a second server, which includes a processor and a memory, wherein the processor is configured to execute a computer program stored in the memory to implement the steps of the data processing method according to any one of the above.
Based on the same inventive concept, embodiments of the present application further provide a storage medium storing a computer program, which when executed by a processor implements the steps of the data processing method according to any one of the above.
Based on the same inventive concept, as shown in fig. 6, an embodiment of the present application further provides a second server, which includes a second communication module 601 and a second audio/video processing module 602.
Specifically, the second communication module 601 is configured to receive source data sent by N first servers, where the N first servers are connected to M virtual reality devices through a network, and the source data includes at least one of the following: local video images and/or audio information of the M virtual reality devices, where a local video image is obtained by a first server rendering the corresponding image according to the control information and pose information of the virtual reality device connected to it, and the audio information is received by a first server from the virtual reality device connected to it.
And a second audio/video processing module 602, configured to integrate the received source data to form fused video images and/or audio/video fused data of the M virtual reality devices.
In this embodiment, the second communication module 601 is further configured to send the fused video images and/or the audio/video fused data of the M virtual reality devices to one or more client terminals, where the client terminals are video playing devices connected to the first server through a network.
In this embodiment, the second server further includes a second processing module, wherein:
the second processing module is used for detecting whether the client terminal meets the sharing condition of the fused video image and/or the audio and video fused data; if the client terminal meets the sharing condition of the fused video image and/or the audio/video fused data, the second communication module 601 is notified to send the fused video image and/or the audio/video fused data of the M virtual reality devices to the client terminal.
In this embodiment, the sharing condition may include: shareable to all, not shareable, shareable within a group, shareable to relatives and friends, shareable in exchange for points, or any other similar condition.
In this embodiment, the fusion video image includes an overall fusion video image and local fusion video images corresponding to the virtual reality devices, and the audio/video fusion data includes overall audio/video fusion data and local audio/video fusion data corresponding to the virtual reality devices.
In this embodiment, the overall fused video image and the local fused video images corresponding to the individual virtual reality devices may each be given their own sharing conditions; likewise, the overall audio and video fusion data and the local audio and video fusion data corresponding to the individual virtual reality devices may each be given their own sharing conditions.
In this embodiment, when the audio information is received from the first virtual reality device and the client terminal is the second virtual reality device, the second processing module is further configured to detect a position relationship between the first virtual reality device and the second virtual reality device; generating audio control information according to the detected position relation and transmitting the audio control information to the second communication module 601;
the second communication module 601 is further configured to send audio control information to the second virtual reality device, so that the second virtual reality device plays the audio and video fusion data according to the audio control information.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems and functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media, as is known to those skilled in the art.

Claims (10)

1. A data processing method, comprising:
the method comprises the steps that a first server receives control information, pose information and audio information sent by a virtual reality device;
the first server renders the image according to the received control information and pose information to obtain a rendered video image;
and the first server fuses the received audio information and the rendered video image to obtain audio and video fusion data of the virtual reality device.
2. The data processing method of claim 1, wherein the method further comprises: the first server sends the video image or audio and video fusion data of the virtual reality device to one or more client terminals, and the client terminals are video playing equipment connected with the first server through a network.
3. The data processing method according to claim 2, wherein before the first server sends the video image or audio-video fusion data of the virtual reality device to one or more client terminals, the method further comprises:
the first server detects whether the client terminal meets the sharing condition of the video image or the audio and video fusion data of the virtual reality device;
and if the client terminal meets the sharing condition of the video image or the audio and video fusion data of the virtual reality device, triggering the operation in which the first server sends the video image or the audio and video fusion data of the virtual reality device to the client terminal.
4. The data processing method according to claim 3, wherein the sharing condition includes any one or more of: shareable to all, not shareable, shareable within a group, shareable to relatives and friends, and shareable in exchange for points.
5. A first server, characterized in that it comprises a processor and a memory, said processor being adapted to execute a computer program stored in the memory to implement the steps of the data processing method according to any of claims 1 to 4.
6. A data processing method, comprising:
the second server receives source data sent by N first servers, wherein the N first servers are connected with M virtual reality devices through a network, and the source data comprises at least one of the following: local video images and/or audio information of the M virtual reality devices, wherein a local video image is obtained by a first server rendering the corresponding image according to the control information and pose information of the virtual reality device connected to it, the audio information is received by a first server from the virtual reality device connected to it, N and M are natural numbers, and M is less than or equal to N;
and the second server integrates the received source data to form fused video images and/or audio and video fused data of the M virtual reality devices.
7. The data processing method of claim 6, wherein the method further comprises: and the second server sends the fused video images and/or audio and video fused data of the M virtual reality devices to one or more client terminals, and the client terminals are video playing equipment connected with the first server through a network.
8. The data processing method of claim 7, wherein when the audio information is received from a first virtual reality device and the client terminal is a second virtual reality device, the method further comprises:
the second server detects the position relation of the first virtual reality device and the second virtual reality device;
and the second server sends audio control information to the second virtual reality device according to the detected position relation, so that the second virtual reality device plays the audio and video fusion data according to the audio control information.
9. A second server, characterized in that it comprises a processor and a memory, said processor being adapted to execute a computer program stored in the memory to implement the steps of the data processing method according to any of claims 6 to 8.
10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the steps of the data processing method according to any one of claims 1 to 4 or claims 6 to 8.
CN202010136715.5A 2020-03-02 2020-03-02 Data processing method, first server, second server and storage medium Pending CN111459267A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010136715.5A CN111459267A (en) 2020-03-02 2020-03-02 Data processing method, first server, second server and storage medium

Publications (1)

Publication Number Publication Date
CN111459267A (en) 2020-07-28

Family

ID=71684154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010136715.5A Pending CN111459267A (en) 2020-03-02 2020-03-02 Data processing method, first server, second server and storage medium

Country Status (1)

Country Link
CN (1) CN111459267A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104011788A (en) * 2011-10-28 2014-08-27 奇跃公司 System And Method For Augmented And Virtual Reality
CN106484115A (en) * 2011-10-28 2017-03-08 奇跃公司 For strengthening the system and method with virtual reality
CN105653012A (en) * 2014-08-26 2016-06-08 蔡大林 Multi-user immersion type full interaction virtual reality project training system
US20180074679A1 (en) * 2016-09-14 2018-03-15 Samsung Electronics Co., Ltd. Method, apparatus, and system for sharing virtual reality viewport
WO2018120657A1 (en) * 2016-12-27 2018-07-05 华为技术有限公司 Method and device for sharing virtual reality data
CN106792133A (en) * 2016-12-30 2017-05-31 北京华为数字技术有限公司 Virtual reality server, method of transmitting video data and system
CN107024995A (en) * 2017-06-05 2017-08-08 河北玛雅影视有限公司 Many people's virtual reality interactive systems and its control method
CN109126122A (en) * 2017-06-16 2019-01-04 上海拆名晃信息科技有限公司 A kind of cloud game network system realization for virtual reality
CN109375764A (en) * 2018-08-28 2019-02-22 北京凌宇智控科技有限公司 A kind of head-mounted display, cloud server, VR system and data processing method

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112579038A (en) * 2020-12-24 2021-03-30 上海商米科技集团股份有限公司 Built-in recording method and device, electronic equipment and storage medium
CN113254989A (en) * 2021-04-27 2021-08-13 支付宝(杭州)信息技术有限公司 Fusion method and device of target data and server
CN115002401A (en) * 2022-08-03 2022-09-02 广州迈聆信息科技有限公司 Information processing method, electronic equipment, conference system and medium
CN115002401B (en) * 2022-08-03 2023-02-10 广州迈聆信息科技有限公司 Information processing method, electronic equipment, conference system and medium
CN115100344A (en) * 2022-08-23 2022-09-23 北京七维视觉科技有限公司 XR space positioning method and device, computer equipment and storage medium
CN115100344B (en) * 2022-08-23 2022-10-25 北京七维视觉科技有限公司 XR space positioning method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination