CN111984111A - Multimedia processing method, device and communication equipment - Google Patents
- Publication number
- CN111984111A (application CN201910428614.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- communication device
- parts
- target
- audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/01—Indexing scheme relating to G06F3/01
- G06F2203/012—Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment
Abstract
The invention provides a multimedia processing method, a multimedia processing apparatus, and a communication device. The multimedia processing method includes: acquiring second data parts from a plurality of acquisition components and acquiring third data parts from at least one first communication device, wherein the third data parts are obtained by processing first data parts, and the first data parts and the second data parts are obtained by dividing the audio/video data and motion state data of a target object collected by the plurality of acquisition components; and determining, according to the second data part and the third data part, the target audio/video data to be played. Embodiments of the invention improve timeliness when processing multimedia data, prevent phenomena such as pictures and sounds falling out of sync, and bring users a better virtual reality experience.
Description
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a multimedia processing method and apparatus, and a communication device.
Background
With the development of Virtual Reality (VR) technology, the use of VR terminals for video teaching has been accepted by more and more users. For example, a 3D camera collects action video of a person in the real world from multiple angles, and image preprocessing methods such as matting and filling composite that video with a background picture that is either set by a computer or shot in real time at another location. At the viewing client, the user can then be given the immersive impression that the person is in the picture.
Disclosure of Invention
The embodiment of the invention provides a multimedia processing method, a multimedia processing device and communication equipment, and aims to solve the problem of poor timeliness in the process of processing multimedia data at present.
In order to solve the technical problem, the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a multimedia processing method, applied to a second communication device, including:
acquiring second data parts from a plurality of acquisition components and acquiring third data parts from at least one first communication device, wherein the third data parts are obtained by processing first data parts, and the first data parts and the second data parts are obtained by dividing audio/video data and motion state data of a target object acquired by the plurality of acquisition components;
and determining target audio and video data to be played according to the second data part and the third data part.
In a second aspect, an embodiment of the present invention provides a multimedia processing apparatus, applied to a second communication device, including:
the acquisition module is used for acquiring second data parts from a plurality of acquisition components and acquiring third data parts from at least one first communication device, wherein the third data parts are obtained by processing first data parts, and the first data parts and the second data parts are obtained by dividing audio and video data and motion state data of a target object acquired by the plurality of acquisition components;
and a determining module used for determining target audio/video data to be played according to the second data part and the third data part.
In a third aspect, an embodiment of the present invention provides a multimedia processing system, including: the system comprises a plurality of acquisition components, at least one first communication device, a second communication device and a plurality of user equipment terminals;
the acquisition components are respectively connected with the first communication equipment and the second communication equipment and are used for acquiring audio and video data and motion state data of a target object, sending a first data part to the first communication equipment and sending a second data part to the second communication equipment; the first data part and the second data part are obtained by dividing audio and video data and motion state data of the target object;
the plurality of first communication devices are respectively connected with the second communication device and used for processing the received first data part to obtain a third data part and sending the third data part to the second communication device;
the second communication equipment is respectively connected with the plurality of user equipment terminals and is used for processing the received second data part and the third data part to obtain target audio and video data to be played and respectively sending the target audio and video data to each user equipment terminal;
and the user equipment end is used for outputting the received target audio/video data.
In a fourth aspect, an embodiment of the present invention provides a communication device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the multimedia processing method.
In a fifth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the multimedia processing method described above.
In the embodiment of the invention, the data of the target object can be processed by the first communication device and the second communication device respectively, thereby improving timeliness when processing multimedia data, preventing phenomena such as pictures and sounds falling out of sync, enhancing the viewing effect, and bringing a better virtual reality experience to users.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a block diagram of a multimedia processing system according to an embodiment of the present invention;
FIG. 2 is a flow chart of a multimedia processing method according to an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a multimedia processing apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a communication device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present invention.
Currently, the defining characteristic of virtual reality is that a person can perceive the dynamic characteristics of a scene under freely changing interactive control; in other words, a virtual reality system must generate the corresponding graphics immediately as the person moves (changes position and direction). However, combining character movement with a virtual scene involves interactive panoramic shooting and transmission from multiple cameras, as well as the transmission and computation of large volumes of data, such as compositing with the virtual picture. Meanwhile, to achieve an immersive effect, virtual reality needs three-dimensional sound as well as three-dimensional pictures; in particular, when non-enclosed speakers are used, multiple speakers in different directions must cooperate to reproduce sound from different directions and distances. This places high demands on the software and hardware performance of cooperative data processing among the multiple devices in a virtual reality system.
However, when a virtual reality system processes multimedia data, the processing is mainly performed by a cloud computing device such as a cloud server, so data processing timeliness is poor and phenomena such as pictures and sounds falling out of sync may occur, degrading the viewing effect.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a multimedia processing system according to an embodiment of the present invention, as shown in fig. 1, the multimedia processing system includes: a plurality of acquisition components 11, a plurality of first communication devices 12, a second communication device 13, and a plurality of user equipment terminals 14.
Optionally, the plurality of acquisition components 11 are respectively connected to the first communication device 12 and the second communication device 13, and are configured to collect audio/video data and motion state data of the target object, send the first data portion to the first communication device 12, and send the second data portion to the second communication device 13. The first data portion and the second data portion are obtained by dividing the audio/video data and motion state data of the target object; for example, the data may be preprocessed and then divided into the first and second data portions according to a preset condition. There may be a plurality of target objects in this embodiment.
The plurality of first communication devices 12 are connected to the second communication device 13, respectively, and are configured to process the received first data portion to obtain a third data portion, and send the third data portion to the second communication device 13. It should be noted that the first communication device 12 can be understood as an edge computing end, such as an edge server, and at least one first communication device 12 may be provided for each target object to process that object's audio/video data and the like.
The second communication device 13 is connected to the plurality of user device terminals 14, and is configured to process the received second data portion and the third data portion to obtain target audio/video data to be played, and send the target audio/video data to the plurality of user device terminals 14, respectively. It should be noted that the second communication device 13 can be understood as a cloud computing end, such as a cloud server.
The user equipment 14 is configured to output the received target audio/video data.
The multimedia processing system provided by the embodiment of the invention processes the data of the target object on the first communication device and the second communication device respectively, thereby improving timeliness when processing multimedia data, preventing phenomena such as pictures and sounds falling out of sync, enhancing the viewing effect, and bringing a better virtual reality experience to users.
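The split-processing pipeline of the system above can be sketched as follows. This is an illustrative Python sketch only: the function names and the concrete split rule (audio/video to the edge, motion state to the cloud) are assumptions made for clarity, not fixed by the patent.

```python
# Illustrative sketch of the split-processing pipeline described above.
# All names and the concrete split rule are assumptions, not patent claims.

def divide(av_data, motion_data):
    """Divide the collected data into a first part (sent to the first
    communication device / edge) and a second part (sent to the second
    communication device / cloud)."""
    first_part = {"audio_video": av_data}        # heavier per-object data
    second_part = {"motion_state": motion_data}  # lighter shared data
    return first_part, second_part

def edge_process(first_part):
    """First communication device: process the first part into a third part."""
    return {"processed": first_part}

def cloud_process(second_part, third_part):
    """Second communication device: fuse both parts into the target AV data."""
    return {"target_av": (second_part, third_part)}

first, second = divide("frames+sound", "accelerometer")
third = edge_process(first)
target = cloud_process(second, third)
```

The point of the split is that `edge_process` can run close to each target object while `cloud_process` only fuses already-reduced data, which is the timeliness gain the text describes.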
In at least one embodiment of the present invention, the first communication device 12 may be an edge computing device, and the second communication device 13 may be a cloud computing device.
In at least one embodiment of the present invention, the acquisition components 11 may include wearable devices carried by the target object (such as an acceleration sensor and an angular velocity sensor), a sound pickup, a plurality of cameras surrounding the target object, an eye tracker at the user equipment end, and the like. Taking a VR video teaching scene as an example, the target object may be a teacher giving the lesson or a student participating in it.
Optionally, the acceleration sensor may collect motion state data of the target object in real time. The sound pickup may include a plurality of microphones and an audio processing device to capture the direction, amplitude, frequency, etc. of the sound emitted by the target object. The cameras capture pictures of the target object. The eye tracker at the user equipment end can acquire the visual region of interest (ROI) in real time, so that the region to be processed can be outlined in the image as a rectangle, circle, ellipse, irregular polygon, or other shape.
In one embodiment, the acceleration sensor may be a micro-electro-mechanical system (MEMS) device. Its key part is a middle capacitor plate on a cantilever structure: when the speed or acceleration is large enough, the inertial force on the middle plate exceeds the force fixing or supporting it, so the plate moves, the distance to the lower plate changes, and the upper and lower capacitances change accordingly. The change in capacitance is proportional to the acceleration, and can be converted into a voltage signal for direct output, or output after digital processing.
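The proportionality described above can be illustrated numerically. Both constants in this sketch are made-up illustrative values, not specifications of any real sensor:

```python
# Toy numerical model of the MEMS readout described above: the capacitance
# change is proportional to acceleration and is converted to a voltage.
# Both constants are illustrative assumptions, not sensor specifications.

SENSITIVITY = 1e-15  # capacitance change per unit acceleration, F/(m/s^2) (assumed)
GAIN = 1e15          # readout gain, V/F (assumed)

def capacitance_delta(acceleration):
    # Plate displacement follows the inertial force, so dC is proportional to a.
    return SENSITIVITY * acceleration

def output_voltage(acceleration):
    # The capacitance change is converted into a voltage signal.
    return GAIN * capacitance_delta(acceleration)

v1 = output_voltage(9.8)   # about 1 g
v2 = output_voltage(19.6)  # doubling the acceleration doubles the readout
```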
In at least one embodiment of the present invention, the data collected by the acceleration sensor may cover a plurality of directions, for example three-axis acceleration along the x, y, and z axes. During a user's horizontal movement, the vertical and forward accelerations vary periodically. When a step begins and only one foot touches the ground, the center of gravity rises and the vertical acceleration tends to increase in the positive direction; the center of gravity then moves down, both feet touch the ground, and the acceleration reverses. The horizontal acceleration decreases when the foot is drawn back and increases on the stride. During walking, the vertical and forward accelerations are approximately sinusoidal over time, each with a peak at some point, and the change is largest in the vertical direction. By monitoring those peaks and applying an acceleration-threshold decision, the user's motion state data can be computed in real time.
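The peak monitoring and threshold decision described above can be sketched as a simple step detector over vertical-axis samples. The window shape and threshold value here are assumptions for illustration:

```python
# Minimal step detector in the spirit of the text: count local maxima of the
# vertical acceleration that exceed a threshold. The threshold is an assumed
# illustrative value, not one given in the patent.

def count_steps(z_accel, threshold=1.5):
    """Count local maxima above `threshold` in a vertical-acceleration trace."""
    steps = 0
    for i in range(1, len(z_accel) - 1):
        is_peak = z_accel[i - 1] < z_accel[i] > z_accel[i + 1]
        if is_peak and z_accel[i] > threshold:
            steps += 1
    return steps

# Roughly sinusoidal trace with two clear above-threshold peaks (two steps).
trace = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, 0.0, 1.8, 0.5]
```

A real implementation would also debounce closely spaced peaks; this sketch shows only the peak-plus-threshold decision the text mentions.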
In at least one embodiment of the present invention, the first communication device 12 may include a first pre-processing module and a first processing module; the first preprocessing module is used for preprocessing the received first data part to obtain first preprocessed data; the first processing module is used for inputting the first preprocessing data into a pre-trained edge processing model matched with the current target environment to obtain a third data part.
Optionally, the preprocessing performed on the first data portion may include motion state recognition, denoising, time-domain and frequency-domain feature extraction, fusion recognition, and the like; frames whose visual retention time exceeds 20 ms may be compensated automatically using a machine-learning feedforward model, yielding data suitable for model input.
Optionally, the edge processing model is trained on training sample data of a single target object in the target environment. The edge processing model is trained in advance using first training sample data. The first training sample data is for a single target object, namely the target object corresponding to the respective first communication device, and includes: feature data of the environment where the target object is located, and static and dynamic feature data of the target object.
In at least one embodiment of the present invention, the training process of the edge processing model may include: first, selecting, from a plurality of general models in a database, a general model matching the current target environment (such as the crowd size); and then training the selected general model on the first training sample data to obtain the corresponding edge processing model.
Optionally, the general model may be built on a neural network comprising an input layer, an output layer, and a hidden layer, each containing a plurality of neurons, with connection weights between neurons of adjacent layers. As the sample size grows, the model training parameters may be further optimized through selection, crossover, and mutation operations. The training process of the model may be carried out on the edge server side.
In at least one embodiment of the present invention, the training sample input factors can be divided into three levels:
the first level: feature data of the environment where the target object is located, such as the time, amplitude, attenuation ratio, and sound-source angle (relative to the target object) of indoor echoes produced at the target object and returned to the user equipment end;
the second level: personal static feature data, such as personal demographic data (age, gender, height, etc.) and a personal historical training data model (e.g., a personal historical best viewing model);
the third level: personal dynamic feature data, such as acceleration, the ROI view, etc.
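The three input levels above can be grouped into a single training-sample record. How the record is organized and flattened is an assumption of this sketch, not something the patent specifies:

```python
# Sketch: one training sample combining the three factor levels above.
# Field names and the flattening order are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class TrainingSample:
    # Level 1: environment features (echo time, amplitude, attenuation, angle)
    environment: dict
    # Level 2: static personal features (age, gender, height, history model)
    static_features: dict
    # Level 3: dynamic personal features (acceleration, ROI view)
    dynamic_features: dict

    def as_input_vector(self):
        """Flatten all three levels into one ordered feature list."""
        merged = {**self.environment, **self.static_features,
                  **self.dynamic_features}
        return [merged[k] for k in sorted(merged)]

sample = TrainingSample(
    environment={"echo_delay_ms": 12.0},
    static_features={"age": 30},
    dynamic_features={"accel_z": 1.2},
)
```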
Optionally, the process of training the selected general model by using the training sample data may include:
feeding the preprocessed training sample data into the input layer of the neural network, and, after processing by the hidden layer, outputting a result from the output layer;
checking whether the result output by the output layer matches the expected result; if not, computing an error signal from the output and expected results and entering a back-propagation stage;
in the back-propagation stage, propagating the error signal back from the output layer to the input layer, and modifying the connection weights of the neurons between the input, hidden, and output layers along the way, so that the final output error signal gradually decreases.
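The forward-pass / error-signal / weight-update loop above can be sketched for a single linear neuron with gradient-descent updates. The network size, learning rate, and training data are illustrative assumptions, far smaller than the multi-layer network the text describes:

```python
# Minimal one-neuron sketch of the forward/backward training loop above.
# Learning rate, epoch count, and data are illustrative assumptions.

def train(samples, targets, lr=0.1, epochs=200):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            out = w * x + b    # forward pass: input layer -> output
            err = out - y      # error signal: output vs. expected result
            w -= lr * err * x  # back-propagate: modify connection weight
            b -= lr * err      # so the output error gradually decreases
    return w, b

# Learn y = 2x + 1 from a few points.
w, b = train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

A real hidden layer adds a nonlinearity and the chain rule across layers, but the stopping logic (repeat until the error signal is small) is the same as in the steps above.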
In at least one embodiment of the present invention, the second communication device 13 may include a second preprocessing module and a second processing module; the second preprocessing module is used for preprocessing the received second data part to obtain second preprocessed data; and the second processing module is used for inputting the second preprocessing data and the third data part into a pre-trained viewing model matched with the current target environment to obtain the target audio/video data.
Optionally, the preprocessing performed on the second data portion may include motion state recognition, denoising, time-domain and frequency-domain feature extraction, fusion recognition, and the like; frames whose visual retention time exceeds 20 ms may be compensated automatically using a machine-learning feedforward model, yielding data suitable for model input.
Optionally, the viewing model is trained on training sample data of all target objects in the target environment. The viewing model is trained in advance using second training sample data. The second training sample data is for all target objects in the current target environment and includes: feature data of the current target environment, and static and dynamic feature data of each target object in the current target environment.
In at least one embodiment of the present invention, optionally, the multimedia processing system may further include: a resource equalizer; the resource equalizer is connected to the plurality of acquisition components 11, the first communication device 12, and the second communication device 13, respectively, and configured to perform resource coordination.
Thus, by means of the resource balancer, the time efficiency of data processing can be further improved.
For example, with a plurality of acquisition components and a plurality of edge servers, if the load rate of one side exceeds 50%, the computation task (i.e., the preprocessing task) can be coordinated to other nodes below 50%; if both sides exceed 50%, the computation can be performed by the cloud server.
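The 50% rule above can be sketched as a small placement function. Representing nodes as a list of load rates is an assumption of this sketch:

```python
# Sketch of the resource balancer's 50% rule described above.
# The node model (a load rate per node) is an illustrative assumption.

LOAD_LIMIT = 0.5  # coordinate work away from any node above 50% load

def place_task(collector_load, edge_loads):
    """Decide where a preprocessing task should run.

    Returns 'local' if the acquisition side has headroom, an edge-node
    index if some edge server is under the limit, else 'cloud'.
    """
    if collector_load < LOAD_LIMIT:
        return "local"
    for i, load in enumerate(edge_loads):
        if load < LOAD_LIMIT:
            return i
    return "cloud"
```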
Referring to fig. 2, fig. 2 is a flowchart of a multimedia processing method according to an embodiment of the present invention. The method is applied to the second communication device and, as shown in fig. 2, includes the following steps:
step 201: the second data portion is obtained from the plurality of acquisition components and the third data portion is obtained from the at least one first communication device.
Optionally, the third data portion is obtained by processing a first data portion, and the first data portion and the second data portion are obtained by dividing audio/video data and motion state data of the target object acquired by the plurality of acquisition components.
Step 202: and determining target audio and video data to be played according to the second data part and the third data part.
Optionally, after the target audio/video data to be played is obtained, the target audio/video data may be output through the user equipment terminal.
In the embodiment of the invention, the data of the target object can be processed simultaneously by the first communication device and the second communication device, thereby improving timeliness when processing multimedia data, preventing phenomena such as pictures and sounds falling out of sync, enhancing the viewing effect, and bringing a better virtual reality experience to users.
Optionally, the first communication device may be an edge computing device, such as an edge server; the second communication device may be a cloud computing device, such as a cloud server.
Optionally, the third data portion is obtained by the first communication device inputting first preprocessing data into a pre-trained edge processing model matched with the current target environment, and the first preprocessing data is obtained by preprocessing the first data portion.
Optionally, the edge processing model is trained on training sample data of the target object in the target environment. The edge processing model is trained in advance using first training sample data. The first training sample data is for a single target object, namely the target object corresponding to the respective first communication device, and includes: feature data of the environment where the target object is located, and static and dynamic feature data of the target object.
Optionally, step 202 may include:
preprocessing the second data part to obtain second preprocessed data;
and inputting the second preprocessing data and the third data part into a pre-trained viewing model matched with the current target environment to obtain the target audio/video data.
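The two sub-steps of step 202 can be sketched end to end on the second communication device. Here `preprocess` and `viewing_model` are stand-ins for the patent's preprocessing module and pre-trained viewing model; their internals are assumptions:

```python
# Sketch of step 202 on the second communication device. The bodies of
# `preprocess` and `viewing_model` are illustrative stand-ins only.

def preprocess(second_part):
    """Denoise / feature-extract the second data part (stand-in)."""
    return {"features": second_part}

def viewing_model(second_pre, third_part):
    """Pre-trained viewing model (stand-in): fuse inputs into target AV data."""
    return {"target_av": (second_pre["features"], third_part)}

def determine_target_av(second_part, third_part):
    second_pre = preprocess(second_part)          # step 202, first sub-step
    return viewing_model(second_pre, third_part)  # step 202, second sub-step

out = determine_target_av("motion+audio", "edge-processed")
```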
Optionally, the viewing model is trained on training sample data of all target objects in the target environment. The viewing model is trained in advance using second training sample data. The second training sample data is for all target objects in the current target environment and includes: feature data of the current target environment, and static and dynamic feature data of each target object in the current target environment.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a multimedia processing apparatus according to an embodiment of the present invention, as shown in fig. 3, the apparatus includes:
an obtaining module 31, used for obtaining the second data portion from the plurality of acquisition components and the third data portion from the at least one first communication device.
Optionally, the third data portion is obtained by processing a first data portion, and the first data portion and the second data portion are obtained by dividing audio/video data and motion state data of the target object acquired by the plurality of acquisition components;
and a determining module 32, configured to determine target audio/video data to be played according to the second data portion and the third data portion.
Optionally, the third data portion is obtained by the first communication device inputting first preprocessing data into a pre-trained edge processing model matched with the current target environment, and the first preprocessing data is obtained by preprocessing the first data portion.
Optionally, the edge processing model is trained on training sample data of the target object in the target environment. The edge processing model is trained in advance using first training sample data. The first training sample data is for a single target object, namely the target object corresponding to the respective first communication device, and includes: feature data of the environment where the target object is located, and static and dynamic feature data of the target object.
Optionally, the determining module 32 is specifically configured to:
preprocessing the second data part to obtain second preprocessed data;
and inputting the second preprocessing data and the third data part into a pre-trained viewing model matched with the current target environment to obtain the target audio/video data.
Optionally, the viewing model is trained on training sample data of all target objects in the target environment. The viewing model is trained in advance using second training sample data. The second training sample data is for all target objects in the current target environment and includes: feature data of the current target environment, and static and dynamic feature data of each target object in the current target environment.
In the embodiment of the present invention, each process of the method embodiment shown in fig. 2 can be implemented, and the same technical effect can be achieved, and in order to avoid repetition, the details are not described here.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a communication device according to an embodiment of the present invention, and as shown in fig. 4, the communication device 40 includes: a processor 41, a memory 42, and a computer program stored in the memory 42 and capable of running on the processor 41, where the components in the communication device 40 are coupled together through a bus interface 43, and when the computer program is executed by the processor 41, the processes of the above-mentioned multimedia processing method embodiment can be implemented, and the same technical effect can be achieved, and in order to avoid repetition, details are not described here again.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements each process of the foregoing multimedia processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
Computer-readable media, which include both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The foregoing is only a preferred embodiment of the present invention. It should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also fall within the protection scope of the present invention.
Claims (10)
1. A multimedia processing method applied to a second communication device, comprising:
acquiring second data parts from a plurality of acquisition components and acquiring third data parts from at least one first communication device, wherein the third data parts are obtained by processing first data parts, and the first data parts and the second data parts are obtained by dividing audio/video data and motion state data of a target object acquired by the plurality of acquisition components;
and determining target audio and video data to be played according to the second data part and the third data part.
2. The method of claim 1, wherein the first communication device is an edge computing device and the second communication device is a cloud computing device.
3. The method of claim 1, wherein the third data portion is obtained by the first communication device inputting first pre-processed data into a pre-trained edge processing model matching a current target environment, and wherein the first pre-processed data is obtained by pre-processing the first data portion.
4. The method of claim 3, wherein the edge processing model is trained based on training sample data of the target object in the target environment.
5. The method according to claim 1, wherein the determining target audio-video data to be played according to the second data portion and the third data portion comprises:
preprocessing the second data part to obtain second preprocessed data;
and inputting the second preprocessed data and the third data part into a pre-trained viewing model matched with the current target environment to obtain the target audio/video data.
6. The method of claim 5, wherein the viewing model is trained based on training sample data of all target objects in the target environment.
7. A multimedia processing apparatus, applied to a second communication device, comprising:
an acquisition module, configured to acquire second data parts from a plurality of acquisition components and acquire third data parts from at least one first communication device, wherein the third data parts are obtained by processing first data parts, and the first data parts and the second data parts are obtained by dividing audio and video data and motion state data of a target object acquired by the plurality of acquisition components; and
a determining module, configured to determine target audio and video data to be played according to the second data part and the third data part.
8. A multimedia processing system, comprising: the system comprises a plurality of acquisition components, at least one first communication device, a second communication device and a plurality of user equipment terminals;
the plurality of acquisition components are respectively connected with the first communication device and the second communication device, and are configured to acquire audio and video data and motion state data of a target object, send a first data part to the first communication device, and send a second data part to the second communication device, wherein the first data part and the second data part are obtained by dividing the audio and video data and the motion state data of the target object;
the at least one first communication device is respectively connected with the second communication device, and is configured to process the received first data part to obtain a third data part and send the third data part to the second communication device;
the second communication device is respectively connected with the plurality of user equipment terminals, and is configured to process the received second data part and third data part to obtain target audio and video data to be played, and send the target audio and video data to each user equipment terminal; and
the user equipment terminals are configured to output the received target audio and video data.
9. A communication device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program, when executed by the processor, implements the steps of the multimedia processing method according to any of claims 1 to 6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the multimedia processing method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910428614.2A CN111984111A (en) | 2019-05-22 | 2019-05-22 | Multimedia processing method, device and communication equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910428614.2A CN111984111A (en) | 2019-05-22 | 2019-05-22 | Multimedia processing method, device and communication equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111984111A true CN111984111A (en) | 2020-11-24 |
Family
ID=73435949
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910428614.2A Pending CN111984111A (en) | 2019-05-22 | 2019-05-22 | Multimedia processing method, device and communication equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111984111A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113012497A (en) * | 2021-03-24 | 2021-06-22 | 东莞市臻兴电子科技有限公司 | Chinese and English evaluation system for paper book |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106792246A (en) * | 2016-12-09 | 2017-05-31 | 福建星网视易信息系统有限公司 | A kind of interactive method and system of fusion type virtual scene |
CN107102728A (en) * | 2017-03-28 | 2017-08-29 | 北京犀牛数字互动科技有限公司 | Display methods and system based on virtual reality technology |
CN107995503A (en) * | 2017-11-07 | 2018-05-04 | 西安万像电子科技有限公司 | Audio and video playing method and apparatus |
WO2018095400A1 (en) * | 2016-11-24 | 2018-05-31 | 深圳市道通智能航空技术有限公司 | Audio signal processing method and related device |
CN109474648A (en) * | 2017-09-07 | 2019-03-15 | 中国移动通信有限公司研究院 | A kind of compensation method and server device of virtual reality interaction |
CN109640125A (en) * | 2018-12-21 | 2019-04-16 | 广州酷狗计算机科技有限公司 | Video content processing method, device, server and storage medium |
US10277813B1 (en) * | 2015-06-25 | 2019-04-30 | Amazon Technologies, Inc. | Remote immersive user experience from panoramic video |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3815398B1 (en) | Multi-sync ensemble model for device localization | |
CN102270275B (en) | The method of selecting object and multimedia terminal in virtual environment | |
WO2022105519A1 (en) | Sound effect adjusting method and apparatus, device, storage medium, and computer program product | |
KR20210123399A (en) | Animated image driving method based on artificial intelligence, and related devices | |
CN111274910B (en) | Scene interaction method and device and electronic equipment | |
CN109640224B (en) | Pickup method and device | |
US10762663B2 (en) | Apparatus, a method and a computer program for video coding and decoding | |
CN112598780B (en) | Instance object model construction method and device, readable medium and electronic equipment | |
CN113709543A (en) | Video processing method and device based on virtual reality, electronic equipment and medium | |
US11516296B2 (en) | Location-based application stream activation | |
CN111984111A (en) | Multimedia processing method, device and communication equipment | |
CN116778058B (en) | Intelligent interaction system of intelligent exhibition hall | |
CN111192305B (en) | Method and apparatus for generating three-dimensional image | |
CN116095353A (en) | Live broadcast method and device based on volume video, electronic equipment and storage medium | |
Zhang et al. | Automatic generation of spatial tactile effects by analyzing cross-modality features of a video | |
CN115546408A (en) | Model simplifying method and device, storage medium, electronic equipment and product | |
CN111738087B (en) | Method and device for generating face model of game character | |
CN115442519A (en) | Video processing method, device and computer readable storage medium | |
Wu et al. | Acuity: Creating realistic digital twins through multi-resolution pointcloud processing and audiovisual sensor fusion | |
CN111652831A (en) | Object fusion method and device, computer-readable storage medium and electronic equipment | |
CN112991542B (en) | House three-dimensional reconstruction method and device and electronic equipment | |
KR20240005727A (en) | Panoptic segmentation prediction for augmented reality | |
CN115272571A (en) | Method for constructing game scene model | |
CN116841391A (en) | Digital human interaction control method, device, electronic equipment and storage medium | |
CN117291954A (en) | Method for generating optical flow data set, related method and related product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |