CN111984111A - Multimedia processing method, device and communication equipment - Google Patents
- Publication number
- CN111984111A (application CN201910428614.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- communication device
- parts
- target
- audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/01—Indexing scheme relating to G06F3/01
- G06F2203/012—Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment
Abstract
The invention provides a multimedia processing method, a multimedia processing apparatus, and a communication device. The multimedia processing method includes: acquiring second data parts from a plurality of acquisition components and acquiring third data parts from at least one first communication device, wherein the third data parts are obtained by processing first data parts, and the first data parts and the second data parts are obtained by dividing the audio/video data and motion state data of a target object collected by the plurality of acquisition components; and determining, according to the second data part and the third data part, the target audio/video data to be played. Embodiments of the invention improve timeliness when processing multimedia data, prevent phenomena such as pictures and sounds falling out of sync, and bring users a better virtual reality experience.
Description
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a multimedia processing method and apparatus, and a communication device.
Background
With the development of Virtual Reality (VR) technology, the use of VR terminals for video teaching has been accepted by more and more users. For example, a 3D camera collects action video of a person in the real world from multiple angles, and image preprocessing methods such as matting and filling composite that video with a background picture that is either set by a computer or shot in real time at another location. At the viewing client, the user can then be given the immersive impression that the person is in the picture.
Disclosure of Invention
The embodiment of the invention provides a multimedia processing method, a multimedia processing device and communication equipment, and aims to solve the problem of poor timeliness in the process of processing multimedia data at present.
In order to solve the technical problem, the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a multimedia processing method, applied to a second communication device, including:
acquiring second data parts from a plurality of acquisition components and acquiring third data parts from at least one first communication device, wherein the third data parts are obtained by processing first data parts, and the first data parts and the second data parts are obtained by dividing audio/video data and motion state data of a target object acquired by the plurality of acquisition components;
and determining target audio and video data to be played according to the second data part and the third data part.
In a second aspect, an embodiment of the present invention provides a multimedia processing apparatus, applied to a second communication device, including:
the acquisition module is used for acquiring second data parts from a plurality of acquisition components and acquiring third data parts from at least one first communication device, wherein the third data parts are obtained by processing first data parts, and the first data parts and the second data parts are obtained by dividing audio and video data and motion state data of a target object acquired by the plurality of acquisition components;
and a determining module used for determining target audio/video data to be played according to the second data part and the third data part.
In a third aspect, an embodiment of the present invention provides a multimedia processing system, including: the system comprises a plurality of acquisition components, at least one first communication device, a second communication device and a plurality of user equipment terminals;
the acquisition components are respectively connected with the first communication equipment and the second communication equipment and are used for acquiring audio and video data and motion state data of a target object, sending a first data part to the first communication equipment and sending a second data part to the second communication equipment; the first data part and the second data part are obtained by dividing audio and video data and motion state data of the target object;
the plurality of first communication devices are respectively connected with the second communication device and used for processing the received first data part to obtain a third data part and sending the third data part to the second communication device;
the second communication equipment is respectively connected with the plurality of user equipment terminals and is used for processing the received second data part and the third data part to obtain target audio and video data to be played and respectively sending the target audio and video data to each user equipment terminal;
and the user equipment end is used for outputting the received target audio/video data.
In a fourth aspect, an embodiment of the present invention provides a communication device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the multimedia processing method.
In a fifth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the multimedia processing method described above.
In the embodiment of the invention, the data of the target object can be processed by the first communication device and the second communication device respectively, thereby improving timeliness when processing multimedia data, preventing phenomena such as pictures and sounds falling out of sync, enhancing the viewing effect, and bringing a better virtual reality experience to users.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a block diagram of a multimedia processing system according to an embodiment of the present invention;
FIG. 2 is a flow chart of a multimedia processing method according to an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a multimedia processing apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a communication device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present invention.
Currently, the defining characteristic of virtual reality is that a person can perceive the dynamic characteristics of a scene under freely changing interactive control; in other words, a virtual reality system must generate the corresponding graphics immediately as the person moves (changes position and direction). However, combining character movement with a virtual scene involves interactive panoramic shooting and transmission from multiple cameras, as well as the transmission and computation of large volumes of data, such as compositing with the virtual picture. Meanwhile, to achieve an immersive effect, virtual reality needs three-dimensional sound as well as three-dimensional pictures; in particular, when non-enclosed speakers are used, multiple speakers in different directions must cooperate to reproduce sound from different directions and distances. This places high demands on the software and hardware performance of cooperative data processing among the multiple devices in a virtual reality system.
However, when a virtual reality system processes multimedia data, the processing is mainly performed by a cloud computing device such as a cloud server, so data processing timeliness is poor and phenomena such as pictures and sounds falling out of sync may occur, degrading the viewing effect.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a multimedia processing system according to an embodiment of the present invention, as shown in fig. 1, the multimedia processing system includes: a plurality of acquisition components 11, a plurality of first communication devices 12, a second communication device 13, and a plurality of user equipment terminals 14.
Optionally, the plurality of acquisition components 11 are respectively connected to the first communication device 12 and the second communication device 13, and are configured to collect audio/video data and motion state data of the target object, send the first data portion to the first communication device 12, and send the second data portion to the second communication device 13. The first data portion and the second data portion are obtained by dividing the audio/video data and motion state data of the target object; for example, the data may be preprocessed and then divided into the first and second data portions according to a preset condition. There may be a plurality of target objects in this embodiment.
The plurality of first communication devices 12 are connected to the second communication device 13, respectively, and are configured to process the received first data portion to obtain a third data portion, and send the third data portion to the second communication device 13. It should be noted that the first communication device 12 can be understood as an edge computing end, such as an edge server, and at least one first communication device 12 may be provided for each target object to process that object's audio/video data and the like.
The second communication device 13 is connected to the plurality of user device terminals 14, and is configured to process the received second data portion and the third data portion to obtain target audio/video data to be played, and send the target audio/video data to the plurality of user device terminals 14, respectively. It should be noted that the second communication device 13 can be understood as a cloud computing end, such as a cloud server.
The user equipment 14 is configured to output the received target audio/video data.
The multimedia processing system provided by the embodiment of the invention processes the data of the target object on the first communication device and the second communication device respectively, thereby improving timeliness when processing multimedia data, preventing phenomena such as pictures and sounds falling out of sync, enhancing the viewing effect, and bringing a better virtual reality experience to users.
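The split-processing pipeline of the system above can be sketched as follows. This is an illustrative Python sketch only: the function names and the concrete split rule (audio/video to the edge, motion state to the cloud) are assumptions made for clarity, not fixed by the patent.

```python
# Illustrative sketch of the split-processing pipeline described above.
# All names and the concrete split rule are assumptions, not patent claims.

def divide(av_data, motion_data):
    """Divide the collected data into a first part (sent to the first
    communication device / edge) and a second part (sent to the second
    communication device / cloud)."""
    first_part = {"audio_video": av_data}        # heavier per-object data
    second_part = {"motion_state": motion_data}  # lighter shared data
    return first_part, second_part

def edge_process(first_part):
    """First communication device: process the first part into a third part."""
    return {"processed": first_part}

def cloud_process(second_part, third_part):
    """Second communication device: fuse both parts into the target AV data."""
    return {"target_av": (second_part, third_part)}

first, second = divide("frames+sound", "accelerometer")
third = edge_process(first)
target = cloud_process(second, third)
```

The point of the split is that `edge_process` can run close to each target object while `cloud_process` only fuses already-reduced data, which is the timeliness gain the text describes.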
In at least one embodiment of the present invention, the first communication device 12 may be an edge computing device, and the second communication device 13 may be a cloud computing device.
In at least one embodiment of the present invention, the acquisition components 11 may include wearable devices carried by the target object (such as an acceleration sensor and an angular velocity sensor), a sound pickup, a plurality of cameras surrounding the target object, an eye tracker at the user equipment end, and the like. Taking a VR video teaching scene as an example, the target object may be a teacher giving the lesson or a student participating in it.
Optionally, the acceleration sensor may collect motion state data of the target object in real time. The sound pickup may include a plurality of microphones and an audio processing device to capture the direction, amplitude, frequency, etc. of the sound emitted by the target object. The cameras capture pictures of the target object. The eye tracker at the user equipment end can acquire the visual region of interest (ROI) in real time, so that the region to be processed can be outlined in the image as a rectangle, circle, ellipse, irregular polygon, or other shape.
In one embodiment, the acceleration sensor may be a micro-electro-mechanical system (MEMS) device. Its key part is a middle capacitor plate on a cantilever structure: when the speed or acceleration is large enough, the inertial force on the middle plate exceeds the force fixing or supporting it, so the plate moves, the distance to the lower plate changes, and the upper and lower capacitances change accordingly. The change in capacitance is proportional to the acceleration, and can be converted into a voltage signal for direct output, or output after digital processing.
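The proportionality described above can be illustrated numerically. Both constants in this sketch are made-up illustrative values, not specifications of any real sensor:

```python
# Toy numerical model of the MEMS readout described above: the capacitance
# change is proportional to acceleration and is converted to a voltage.
# Both constants are illustrative assumptions, not sensor specifications.

SENSITIVITY = 1e-15  # capacitance change per unit acceleration, F/(m/s^2) (assumed)
GAIN = 1e15          # readout gain, V/F (assumed)

def capacitance_delta(acceleration):
    # Plate displacement follows the inertial force, so dC is proportional to a.
    return SENSITIVITY * acceleration

def output_voltage(acceleration):
    # The capacitance change is converted into a voltage signal.
    return GAIN * capacitance_delta(acceleration)

v1 = output_voltage(9.8)   # about 1 g
v2 = output_voltage(19.6)  # doubling the acceleration doubles the readout
```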
In at least one embodiment of the present invention, the data collected by the acceleration sensor may cover a plurality of directions, for example three-axis acceleration along the x, y, and z axes. During a user's horizontal movement, the vertical and forward accelerations vary periodically. When a step begins and only one foot touches the ground, the center of gravity rises and the vertical acceleration tends to increase in the positive direction; the center of gravity then moves down, both feet touch the ground, and the acceleration reverses. The horizontal acceleration decreases when the foot is drawn back and increases on the stride. During walking, the vertical and forward accelerations are approximately sinusoidal over time, each with a peak at some point, and the change is largest in the vertical direction. By monitoring those peaks and applying an acceleration-threshold decision, the user's motion state data can be computed in real time.
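The peak monitoring and threshold decision described above can be sketched as a simple step detector over vertical-axis samples. The window shape and threshold value here are assumptions for illustration:

```python
# Minimal step detector in the spirit of the text: count local maxima of the
# vertical acceleration that exceed a threshold. The threshold is an assumed
# illustrative value, not one given in the patent.

def count_steps(z_accel, threshold=1.5):
    """Count local maxima above `threshold` in a vertical-acceleration trace."""
    steps = 0
    for i in range(1, len(z_accel) - 1):
        is_peak = z_accel[i - 1] < z_accel[i] > z_accel[i + 1]
        if is_peak and z_accel[i] > threshold:
            steps += 1
    return steps

# Roughly sinusoidal trace with two clear above-threshold peaks (two steps).
trace = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, 0.0, 1.8, 0.5]
```

A real implementation would also debounce closely spaced peaks; this sketch shows only the peak-plus-threshold decision the text mentions.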
In at least one embodiment of the present invention, the first communication device 12 may include a first pre-processing module and a first processing module; the first preprocessing module is used for preprocessing the received first data part to obtain first preprocessed data; the first processing module is used for inputting the first preprocessing data into a pre-trained edge processing model matched with the current target environment to obtain a third data part.
Optionally, the preprocessing performed on the first data portion may include motion state recognition, denoising, time-domain and frequency-domain feature extraction, fusion recognition, and the like; frames whose visual retention time exceeds 20 ms may be compensated automatically using a machine-learning feedforward model, yielding data suitable for model input.
Optionally, the edge processing model is trained on training sample data of a single target object in the target environment. The edge processing model is trained in advance using first training sample data. The first training sample data is for a single target object, namely the target object corresponding to the respective first communication device, and includes: feature data of the environment where the target object is located, and static and dynamic feature data of the target object.
In at least one embodiment of the present invention, the training process of the edge processing model may include: first, selecting, from a plurality of general models in a database, a general model matching the current target environment (such as the crowd size); and then training the selected general model on the first training sample data to obtain the corresponding edge processing model.
Optionally, the general model may be built on a neural network comprising an input layer, an output layer, and a hidden layer, each containing a plurality of neurons, with connection weights between neurons of adjacent layers. As the sample size grows, the model training parameters may be further optimized through selection, crossover, and mutation operations. The training process of the model may be carried out on the edge server side.
In at least one embodiment of the present invention, the training sample input factors can be divided into three levels:
the first level: feature data of the environment where the target object is located, such as the time, amplitude, attenuation ratio, and sound-source angle (relative to the target object) of indoor echoes produced at the target object and returned to the user equipment end;
the second level: personal static feature data, such as personal demographic data (age, gender, height, etc.) and a personal historical training data model (e.g., a personal historical best viewing model);
the third level: personal dynamic feature data, such as acceleration, the ROI view, etc.
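The three input levels above can be grouped into a single training-sample record. How the record is organized and flattened is an assumption of this sketch, not something the patent specifies:

```python
# Sketch: one training sample combining the three factor levels above.
# Field names and the flattening order are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class TrainingSample:
    # Level 1: environment features (echo time, amplitude, attenuation, angle)
    environment: dict
    # Level 2: static personal features (age, gender, height, history model)
    static_features: dict
    # Level 3: dynamic personal features (acceleration, ROI view)
    dynamic_features: dict

    def as_input_vector(self):
        """Flatten all three levels into one ordered feature list."""
        merged = {**self.environment, **self.static_features,
                  **self.dynamic_features}
        return [merged[k] for k in sorted(merged)]

sample = TrainingSample(
    environment={"echo_delay_ms": 12.0},
    static_features={"age": 30},
    dynamic_features={"accel_z": 1.2},
)
```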
Optionally, the process of training the selected general model by using the training sample data may include:
feeding the preprocessed training sample data into the input layer of the neural network, and, after processing by the hidden layer, outputting a result from the output layer;
checking whether the result output by the output layer matches the expected result; if not, computing an error signal from the output and expected results and entering a back-propagation stage;
in the back-propagation stage, propagating the error signal back from the output layer to the input layer, and modifying the connection weights of the neurons between the input, hidden, and output layers along the way, so that the final output error signal gradually decreases.
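The forward-pass / error-signal / weight-update loop above can be sketched for a single linear neuron with gradient-descent updates. The network size, learning rate, and training data are illustrative assumptions, far smaller than the multi-layer network the text describes:

```python
# Minimal one-neuron sketch of the forward/backward training loop above.
# Learning rate, epoch count, and data are illustrative assumptions.

def train(samples, targets, lr=0.1, epochs=200):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            out = w * x + b    # forward pass: input layer -> output
            err = out - y      # error signal: output vs. expected result
            w -= lr * err * x  # back-propagate: modify connection weight
            b -= lr * err      # so the output error gradually decreases
    return w, b

# Learn y = 2x + 1 from a few points.
w, b = train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

A real hidden layer adds a nonlinearity and the chain rule across layers, but the stopping logic (repeat until the error signal is small) is the same as in the steps above.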
In at least one embodiment of the present invention, the second communication device 13 may include a second preprocessing module and a second processing module; the second preprocessing module is used for preprocessing the received second data part to obtain second preprocessed data; and the second processing module is used for inputting the second preprocessing data and the third data part into a pre-trained viewing model matched with the current target environment to obtain the target audio/video data.
Optionally, the preprocessing performed on the second data portion may include motion state recognition, denoising, time-domain and frequency-domain feature extraction, fusion recognition, and the like; frames whose visual retention time exceeds 20 ms may be compensated automatically using a machine-learning feedforward model, yielding data suitable for model input.
Optionally, the viewing model is trained on training sample data of all target objects in the target environment. The viewing model is trained in advance using second training sample data. The second training sample data is for all target objects in the current target environment and includes: feature data of the current target environment, and static and dynamic feature data of each target object in the current target environment.
In at least one embodiment of the present invention, optionally, the multimedia processing system may further include: a resource equalizer; the resource equalizer is connected to the plurality of acquisition components 11, the first communication device 12, and the second communication device 13, respectively, and configured to perform resource coordination.
Thus, by means of the resource balancer, the time efficiency of data processing can be further improved.
For example, with a plurality of acquisition components and a plurality of edge servers, if the load rate of one side exceeds 50%, the computation task (i.e., the preprocessing task) can be coordinated to other nodes below 50%; if both sides exceed 50%, the computation can be performed by the cloud server.
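The 50% rule above can be sketched as a small placement function. Representing nodes as a list of load rates is an assumption of this sketch:

```python
# Sketch of the resource balancer's 50% rule described above.
# The node model (a load rate per node) is an illustrative assumption.

LOAD_LIMIT = 0.5  # coordinate work away from any node above 50% load

def place_task(collector_load, edge_loads):
    """Decide where a preprocessing task should run.

    Returns 'local' if the acquisition side has headroom, an edge-node
    index if some edge server is under the limit, else 'cloud'.
    """
    if collector_load < LOAD_LIMIT:
        return "local"
    for i, load in enumerate(edge_loads):
        if load < LOAD_LIMIT:
            return i
    return "cloud"
```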
Referring to fig. 2, fig. 2 is a flowchart of a multimedia processing method according to an embodiment of the present invention. The method is applied to the second communication device and, as shown in fig. 2, includes the following steps:
step 201: the second data portion is obtained from the plurality of acquisition components and the third data portion is obtained from the at least one first communication device.
Optionally, the third data portion is obtained by processing a first data portion, and the first data portion and the second data portion are obtained by dividing audio/video data and motion state data of the target object acquired by the plurality of acquisition components.
Step 202: and determining target audio and video data to be played according to the second data part and the third data part.
Optionally, after the target audio/video data to be played is obtained, the target audio/video data may be output through the user equipment terminal.
In the embodiment of the invention, the data of the target object can be processed simultaneously by the first communication device and the second communication device, thereby improving timeliness when processing multimedia data, preventing phenomena such as pictures and sounds falling out of sync, enhancing the viewing effect, and bringing a better virtual reality experience to users.
Optionally, the first communication device may be an edge computing device, such as an edge server; the second communication device may be a cloud computing device, such as a cloud server.
Optionally, the third data portion is obtained by the first communication device inputting first preprocessing data into a pre-trained edge processing model matched with the current target environment, and the first preprocessing data is obtained by preprocessing the first data portion.
Optionally, the edge processing model is trained on training sample data of the target object in the target environment. The edge processing model is trained in advance using first training sample data. The first training sample data is for a single target object, namely the target object corresponding to the respective first communication device, and includes: feature data of the environment where the target object is located, and static and dynamic feature data of the target object.
Optionally, step 202 may include:
preprocessing the second data part to obtain second preprocessed data;
and inputting the second preprocessing data and the third data part into a pre-trained viewing model matched with the current target environment to obtain the target audio/video data.
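The two sub-steps of step 202 can be sketched end to end on the second communication device. Here `preprocess` and `viewing_model` are stand-ins for the patent's preprocessing module and pre-trained viewing model; their internals are assumptions:

```python
# Sketch of step 202 on the second communication device. The bodies of
# `preprocess` and `viewing_model` are illustrative stand-ins only.

def preprocess(second_part):
    """Denoise / feature-extract the second data part (stand-in)."""
    return {"features": second_part}

def viewing_model(second_pre, third_part):
    """Pre-trained viewing model (stand-in): fuse inputs into target AV data."""
    return {"target_av": (second_pre["features"], third_part)}

def determine_target_av(second_part, third_part):
    second_pre = preprocess(second_part)          # step 202, first sub-step
    return viewing_model(second_pre, third_part)  # step 202, second sub-step

out = determine_target_av("motion+audio", "edge-processed")
```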
Optionally, the viewing model is trained on training sample data of all target objects in the target environment. The viewing model is trained in advance using second training sample data. The second training sample data is for all target objects in the current target environment and includes: feature data of the current target environment, and static and dynamic feature data of each target object in the current target environment.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a multimedia processing apparatus according to an embodiment of the present invention, as shown in fig. 3, the apparatus includes:
an obtaining module 31, used for obtaining the second data portion from the plurality of acquisition components and the third data portion from the at least one first communication device.
Optionally, the third data portion is obtained by processing a first data portion, and the first data portion and the second data portion are obtained by dividing audio/video data and motion state data of the target object acquired by the plurality of acquisition components;
and a determining module 32, configured to determine target audio/video data to be played according to the second data portion and the third data portion.
Optionally, the third data portion is obtained by the first communication device inputting first preprocessing data into a pre-trained edge processing model matched with the current target environment, and the first preprocessing data is obtained by preprocessing the first data portion.
Optionally, the edge processing model is trained on training sample data of the target object in the target environment. The edge processing model is trained in advance using first training sample data. The first training sample data is for a single target object, namely the target object corresponding to the respective first communication device, and includes: feature data of the environment where the target object is located, and static and dynamic feature data of the target object.
Optionally, the determining module 32 is specifically configured to:
preprocessing the second data part to obtain second preprocessed data;
and inputting the second preprocessing data and the third data part into a pre-trained viewing model matched with the current target environment to obtain the target audio/video data.
Optionally, the viewing model is trained on training sample data of all target objects in the target environment. The viewing model is trained in advance using second training sample data. The second training sample data is for all target objects in the current target environment and includes: feature data of the current target environment, and static and dynamic feature data of each target object in the current target environment.
In the embodiment of the present invention, each process of the method embodiment shown in fig. 2 can be implemented, and the same technical effect can be achieved, and in order to avoid repetition, the details are not described here.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a communication device according to an embodiment of the present invention, and as shown in fig. 4, the communication device 40 includes: a processor 41, a memory 42, and a computer program stored in the memory 42 and capable of running on the processor 41, where the components in the communication device 40 are coupled together through a bus interface 43, and when the computer program is executed by the processor 41, the processes of the above-mentioned multimedia processing method embodiment can be implemented, and the same technical effect can be achieved, and in order to avoid repetition, details are not described here again.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements each process of the foregoing multimedia processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
Computer-readable media, which include both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The foregoing is only a preferred embodiment of the present invention. It should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also fall within the protection scope of the present invention.
Claims (10)
1. A multimedia processing method applied to a second communication device, comprising:
acquiring second data parts from a plurality of acquisition components and acquiring third data parts from at least one first communication device, wherein the third data parts are obtained by processing first data parts, and the first data parts and the second data parts are obtained by dividing audio/video data and motion state data of a target object acquired by the plurality of acquisition components;
and determining target audio and video data to be played according to the second data part and the third data part.
2. The method of claim 1, wherein the first communication device is an edge computing device and the second communication device is a cloud computing device.
3. The method of claim 1, wherein the third data portion is obtained by the first communication device inputting first pre-processed data into a pre-trained edge processing model matching a current target environment, and wherein the first pre-processed data is obtained by pre-processing the first data portion.
4. The method of claim 3, wherein the edge processing model is trained based on training sample data of the target object in the target environment.
5. The method according to claim 1, wherein the determining target audio-video data to be played according to the second data portion and the third data portion comprises:
preprocessing the second data part to obtain second preprocessed data;
and inputting the second preprocessed data and the third data part into a pre-trained viewing model matched with the current target environment to obtain the target audio/video data.
6. The method of claim 5, wherein the viewing model is trained based on training sample data of all target objects in the target environment.
7. A multimedia processing apparatus, applied to a second communication device, comprising:
an acquisition module, configured to acquire second data parts from a plurality of acquisition components and acquire third data parts from at least one first communication device, wherein the third data parts are obtained by processing first data parts, and the first data parts and the second data parts are obtained by dividing audio and video data and motion state data of a target object acquired by the plurality of acquisition components; and
a determining module, configured to determine target audio and video data to be played according to the second data part and the third data part.
8. A multimedia processing system, comprising: the system comprises a plurality of acquisition components, at least one first communication device, a second communication device and a plurality of user equipment terminals;
the plurality of acquisition components are respectively connected with the first communication device and the second communication device, and are configured to acquire audio and video data and motion state data of a target object, send a first data part to the first communication device, and send a second data part to the second communication device, wherein the first data part and the second data part are obtained by dividing the audio and video data and the motion state data of the target object;
the at least one first communication device is respectively connected with the second communication device, and is configured to process the received first data part to obtain a third data part and send the third data part to the second communication device;
the second communication device is respectively connected with the plurality of user equipment terminals, and is configured to process the received second data part and third data part to obtain target audio and video data to be played, and send the target audio and video data to each user equipment terminal; and
the user equipment terminals are configured to output the received target audio and video data.
9. A communication device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program, when executed by the processor, implements the steps of the multimedia processing method according to any of claims 1 to 6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the multimedia processing method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910428614.2A CN111984111A (en) | 2019-05-22 | 2019-05-22 | Multimedia processing method, device and communication equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910428614.2A CN111984111A (en) | 2019-05-22 | 2019-05-22 | Multimedia processing method, device and communication equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111984111A true CN111984111A (en) | 2020-11-24 |
Family
ID=73435949
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910428614.2A Pending CN111984111A (en) | 2019-05-22 | 2019-05-22 | Multimedia processing method, device and communication equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111984111A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113012497A (en) * | 2021-03-24 | 2021-06-22 | 东莞市臻兴电子科技有限公司 | Chinese and English evaluation system for paper book |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106792246A (en) * | 2016-12-09 | 2017-05-31 | 福建星网视易信息系统有限公司 | A kind of interactive method and system of fusion type virtual scene |
CN107102728A (en) * | 2017-03-28 | 2017-08-29 | 北京犀牛数字互动科技有限公司 | Display methods and system based on virtual reality technology |
CN107995503A (en) * | 2017-11-07 | 2018-05-04 | 西安万像电子科技有限公司 | Audio and video playing method and apparatus |
WO2018095400A1 (en) * | 2016-11-24 | 2018-05-31 | 深圳市道通智能航空技术有限公司 | Audio signal processing method and related device |
CN109474648A (en) * | 2017-09-07 | 2019-03-15 | 中国移动通信有限公司研究院 | A kind of compensation method and server device of virtual reality interaction |
CN109640125A (en) * | 2018-12-21 | 2019-04-16 | 广州酷狗计算机科技有限公司 | Video content processing method, device, server and storage medium |
US10277813B1 (en) * | 2015-06-25 | 2019-04-30 | Amazon Technologies, Inc. | Remote immersive user experience from panoramic video |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3815398B1 (en) | Multi-sync ensemble model for device localization | |
CN102270275B (en) | The method of selecting object and multimedia terminal in virtual environment | |
WO2022105519A1 (en) | Sound effect adjusting method and apparatus, device, storage medium, and computer program product | |
KR20210123399A (en) | Animated image driving method based on artificial intelligence, and related devices | |
CN111274910B (en) | Scene interaction method and device and electronic equipment | |
CN109640224B (en) | Pickup method and device | |
US10762663B2 (en) | Apparatus, a method and a computer program for video coding and decoding | |
CN112598780B (en) | Instance object model construction method and device, readable medium and electronic equipment | |
CN113709543A (en) | Video processing method and device based on virtual reality, electronic equipment and medium | |
US11516296B2 (en) | Location-based application stream activation | |
CN111984111A (en) | Multimedia processing method, device and communication equipment | |
CN116778058B (en) | Intelligent interaction system of intelligent exhibition hall | |
CN111192305B (en) | Method and apparatus for generating three-dimensional image | |
CN116095353A (en) | Live broadcast method and device based on volume video, electronic equipment and storage medium | |
Zhang et al. | Automatic generation of spatial tactile effects by analyzing cross-modality features of a video | |
CN115546408A (en) | Model simplifying method and device, storage medium, electronic equipment and product | |
CN111738087B (en) | Method and device for generating face model of game character | |
CN115442519A (en) | Video processing method, device and computer readable storage medium | |
Wu et al. | Acuity: Creating realistic digital twins through multi-resolution pointcloud processing and audiovisual sensor fusion | |
CN111652831A (en) | Object fusion method and device, computer-readable storage medium and electronic equipment | |
CN112991542B (en) | House three-dimensional reconstruction method and device and electronic equipment | |
KR20240005727A (en) | Panoptic segmentation prediction for augmented reality | |
CN115272571A (en) | Method for constructing game scene model | |
CN116841391A (en) | Digital human interaction control method, device, electronic equipment and storage medium | |
CN117291954A (en) | Method for generating optical flow data set, related method and related product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |