CN117065357A - Media data processing method, device, computer equipment and storage medium - Google Patents

Media data processing method, device, computer equipment and storage medium

Info

Publication number
CN117065357A
Authority
CN
China
Prior art keywords
data
media data
processed
target
service
Prior art date
Legal status
Pending
Application number
CN202210504909.5A
Other languages
Chinese (zh)
Inventor
沈咸飞
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210504909.5A
Publication of CN117065357A
Legal status: Pending


Classifications

    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F 13/00 - Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F 13/70 - Game security or game management aspects
    • A63F 13/77 - Game security or game management aspects involving data related to game devices or game servers, e.g. configuration data, software version or amount of memory
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/04 - Speech or audio signals analysis-synthesis techniques for redundancy reduction using predictive techniques
    • G10L 19/16 - Vocoder architecture
    • G10L 19/173 - Transcoding, i.e. converting between two coded representations avoiding cascaded coding-decoding
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/40 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F 2300/00 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F 2300/50 - Features of games using an electronically generated display having two or more dimensions characterized by details of game servers
    • A63F 2300/55 - Details of game data or player data management
    • A63F 2300/5526 - Game data structure

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present application relates to a media data processing method, apparatus, computer device, storage medium and computer program product. The method comprises the following steps: calling a media data acquisition device to acquire data according to target calling logic matched with the operating system on which the Unreal Engine runs, so as to obtain real scene data; acquiring a media data file; recording a virtual interaction scene presented in a virtual interactive application running on the Unreal Engine to obtain virtual scene data; determining media data to be processed corresponding to a target service, the media data to be processed comprising at least one of the real scene data, the media data file and the virtual scene data; and performing corresponding service processing on the media data to be processed to obtain a service processing result corresponding to the target service. With this method, different platforms can be adapted and diversified service requirements can be met.

Description

Media data processing method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer processing technology, and in particular, to a media data processing method, apparatus, computer device, storage medium, and computer program product.
Background
With the technological development of the game development field, use of the Unreal Engine has become increasingly common. To implement different functionality, game developers often carry out targeted development for the specific platform on which the Unreal Engine runs.
Most existing Unreal Engine plug-ins target a single platform, so a plug-in's use is tied to one operating system and it may not work properly on another platform; for example, it may be unable to collect data there. If game developers want to acquire data from different sources, they must develop and install a corresponding plug-in for each platform so that it is compatible with that system, and considerable manpower and material resources are consumed in maintenance. As a result, the application scenarios of Unreal Engine plug-ins are limited and diversified service requirements cannot be met.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a media data processing method, apparatus, computer device, computer readable storage medium, and computer program product.
In one aspect, the present application provides a media data processing method. The method comprises the following steps:
when a data acquisition instruction occurs, calling a media data acquisition device to acquire data according to target calling logic matched with the operating system on which the Unreal Engine runs, so as to obtain real scene data;
when a data reading instruction occurs, acquiring a media data file;
when a data recording instruction occurs, recording a virtual interaction scene presented in a virtual interactive application running on the Unreal Engine, to obtain virtual scene data;
determining media data to be processed corresponding to a target service, wherein the media data to be processed comprises at least one of the following: the real scene data, the media data file, and the virtual scene data;
and carrying out corresponding service processing on the media data to be processed to obtain a service processing result corresponding to the target service.
In another aspect, the application also provides a media data processing device. The device comprises:
the acquisition module, used for calling the media data acquisition device to acquire data according to target calling logic matched with the operating system on which the Unreal Engine runs when a data acquisition instruction occurs, so as to obtain real scene data;
the reading module, used for acquiring the media data file when a data reading instruction occurs;
the recording module, used for recording the virtual interaction scene presented in a virtual interactive application running on the Unreal Engine when a data recording instruction occurs, to obtain virtual scene data;
The determining module is used for determining to-be-processed media data corresponding to the target service, wherein the to-be-processed media data comprises at least one of the following: the real scene data, the media data file, and the virtual scene data;
and the processing module is used for carrying out corresponding service processing on the media data to be processed to obtain a service processing result corresponding to the target service.
In another aspect, the application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which, when executing the computer program, implements the steps of the above media data processing method.
In another aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the above-described media data processing method.
In another aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of the above-described media data processing method.
According to the media data processing method, apparatus, computer device, storage medium and computer program product, the media data acquisition device is called to acquire data according to target calling logic matched with the operating system, yielding real scene data; different operating systems and platforms can therefore be adapted without developing plug-ins for a specific platform, giving wider applicability. Meanwhile, a media data file is acquired in response to a data reading instruction, and the virtual interaction scene presented in a virtual interactive application running on the Unreal Engine is recorded in response to a data recording instruction to obtain virtual scene data, so that media data from different sources are obtained. At least one of these is selected as the media data to be processed corresponding to a target service, and corresponding service processing is performed on it according to the service requirements of that target service to obtain the service processing result. Media data from various sources can thus be processed flexibly according to actual service requirements, diversified service requirements can be met, and the applicable scenarios are wider.
Drawings
FIG. 1 is an application environment diagram of a media data processing method in one embodiment;
FIG. 2 is a flow chart of a media data processing method according to an embodiment;
FIG. 2A is a schematic diagram of acquiring real scene data in one embodiment;
FIG. 3 is a schematic diagram of an overall framework in one embodiment;
FIG. 4A is a schematic diagram of a detection service in one embodiment;
FIG. 4B is a diagram of format conversion in one embodiment;
FIG. 4C is a schematic diagram showing the specific steps of a detection service in one embodiment;
FIG. 5A is a schematic diagram of a video encoding service in one embodiment;
FIG. 5B is a schematic diagram showing the specific steps of a video encoding service in one embodiment;
FIG. 5C is a schematic block diagram of the PBO principle in one embodiment;
FIG. 5D is a schematic block diagram of the PBO principle in another embodiment;
FIG. 6 is a schematic diagram of a play service in one embodiment;
FIG. 7 is a schematic diagram of an overall framework of a media data processing method in one embodiment;
FIG. 8 is a block diagram of a media data processing device in one embodiment;
fig. 9 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The media data processing method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on the cloud or other servers. The terminal 102 invokes the media data acquisition device through the plug-in to acquire real scene data, acquires a media data file through the plug-in, captures a virtual interaction scene presented in the virtual interaction application through the plug-in to acquire virtual scene data, determines media data to be processed corresponding to the target service according to at least one of the real scene data, the media data file and the virtual scene data, and performs corresponding service processing on the media data to be processed to acquire a service processing result corresponding to the target service. The media data file may be downloaded and transmitted by the terminal 102 through the server 104.
The terminal 102 may be, but not limited to, various desktop computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like.
The server 104 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Network, content delivery networks), basic cloud computing services such as big data and artificial intelligence platforms, and the like.
In some embodiments, an APP (application) developed on the Unreal Engine is loaded on the terminal, and when running the application the terminal drives the Unreal Engine to display pictures and play audio.
The Unreal Engine is a game engine with multiple versions; the fourth-generation version, UE4, is currently in wide use. A game engine is a set of pre-written core components of an editable game system or interactive real-time graphics application, with which game developers can quickly write game programs using the tools it provides. An editable game system includes, for example, a renderer, a physics engine, a collision detection system, sound effects, a script engine, computer animation, and a network engine. The Unreal Engine can run on different operating systems. An operating system is a computer program configured on the terminal to manage computer hardware and software resources, including but not limited to the Windows, macOS, Linux, iOS, Android, and iPadOS operating systems.
The Unreal Engine is preset with a modular system, custom plug-ins, and source control. Game developers develop on the basis of the code framework the engine provides. To implement different functions, the applicable services are typically provided in the form of plug-ins. A plug-in is a collection of code and data that a game developer can enable or disable per project in the Editor. Plug-ins can add functionality, modify built-in engine functionality, create file types, and extend the capabilities of the editor with new menus, toolbar commands, and sub-modes.
Since the code framework provided by the Unreal Engine can be applied to different platforms (including mobile phones, computers, consoles and the like), and different platforms have their own operating systems and running logic, game developers develop specifically for the platform in use, with the result that a given plug-in is applicable to only one operating system. For example, the Media Framework plug-in of the Unreal Engine can only be applied to the Windows system, and on the macOS operating system it cannot obtain the system permission needed to use the camera.
For cross-platform game projects, on the one hand, most plug-ins do not support secondary development. For example, ARKit (a plug-in that can be used to create augmented reality applications) provides the ability to acquire the underlying camera stream on the iOS operating system, but does not allow secondary development of the related plug-in and is not suitable for other operating systems. On the other hand, extensive modification of plug-in code logic is required, and considerable effort and time must be spent on development and testing of plug-in compatibility, plug-in splicing and the like. Furthermore, maintaining the code of different plug-in combinations on different platforms is highly error-prone and poorly maintainable.
Accordingly, embodiments of the present application provide a media data processing method to solve the above problems. The method can be integrated into one audio/video suite, and the terminal runs the audio/video suite to interface with various platforms and execute diversified service functions.
In some embodiments, the audio/video suite may be a single plug-in, with the steps of the media data processing method integrated into the plug-in's code. In other embodiments, the audio/video suite may include a plurality of plug-ins that can perform data transmission, plug-in multiplexing and the like among themselves, and that together perform the steps of the media data processing method.
In one embodiment, as shown in fig. 2, a media data processing method is provided. The method may be applied to the terminal or the server in fig. 1, or may be performed cooperatively by the terminal and the server. In the following, application of the method to the terminal in fig. 1 is taken as an example: the terminal executes the media data processing method through the audio/video suite it runs, so as to realize cross-platform service functions. Specifically, the method comprises the following steps:
Step S202: when a data acquisition instruction occurs, a media data acquisition device is called to acquire data according to target calling logic matched with the operating system on which the Unreal Engine runs, and real scene data is obtained.
The data acquisition instruction instructs a media data acquisition device configured on the terminal to acquire one or more types of media data. A media data acquisition device is a device provided by the terminal hardware for acquiring media data, including but not limited to one or more of a microphone, a camera, and the like. Media data includes, but is not limited to, image data, audio data, and the like. The media data may also be video data, which may consist of image data, audio data, or both.
Calling logic refers to a series of operation flows matched with a corresponding function. For example, calling logic may include calling a system interface, applying for system permission through the system interface, and calling the corresponding hardware device after the permission is obtained.
For media data acquisition, the terminal can initiate a system permission application through a system interface of the audio/video suite, so that the corresponding use permission of the media data acquisition device is acquired.
In the embodiment of the application, the calling logic for the media data acquisition devices of various operating systems is encapsulated in the system interface of the audio/video suite, so that various operating systems can be adapted, and by running the audio/video suite the terminal can call the corresponding hardware to acquire local media data.
For convenience of distinction, media data acquired by a media data acquisition device provided through terminal hardware is referred to as real scene data.
Specifically, when a data acquisition instruction occurs, the terminal responds by determining the currently running operating system, selecting from the multiple sets of calling logic for different operating systems encapsulated in the audio/video suite the target calling logic matched with that operating system, and initiating a call through the system interface of the audio/video suite. The audio/video suite executes the target calling logic, thereby calling the media data acquisition device provided by the terminal hardware to acquire data and obtain real scene data.
For example, the terminal determines from the data acquisition instruction that the currently running operating system is Windows and initiates permission acquisition through the system interface of the audio/video suite; the audio/video suite executes the target calling logic matched with the Windows operating system to obtain the system permission, and then calls the camera and the microphone to acquire data.
As shown in fig. 2A, the terminal calls the camera and the microphone through the audio/video suite's call request to obtain camera images and audio data, i.e., the real scene data, which is subsequently loaded into the audio/video suite to complete the processing of the target service.
In this way, calling logic adapted to various operating systems is provided through the audio/video suite, and multiple sets of calling logic are encapsulated behind the system interface; when running the audio/video suite, the underlying operation flow is transparent to the caller, permission acquisition can be completed simply by calling the system interface, operation is convenient, and the threshold for use is lowered.
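As a concrete illustration, the per-system calling logic described above can be hidden behind one common interface and selected at runtime. The following C++ sketch is illustrative only; the names OsType, ICaptureLogic and MakeCaptureLogic are assumptions for this sketch, not identifiers from the patent or the Unreal Engine.

```cpp
#include <memory>
#include <stdexcept>

enum class OsType { Windows, MacOS, Linux, IOS, Android };

// One set of calling logic per operating system, behind a common interface.
struct ICaptureLogic {
    virtual ~ICaptureLogic() = default;
    virtual bool RequestPermission() = 0;  // apply for system authorization
    virtual void StartCapture() = 0;       // invoke camera/microphone
};

struct WindowsCaptureLogic : ICaptureLogic {
    bool RequestPermission() override { /* Windows permission flow */ return true; }
    void StartCapture() override { /* call Windows capture APIs */ }
};

struct MacCaptureLogic : ICaptureLogic {
    bool RequestPermission() override { /* macOS permission flow */ return true; }
    void StartCapture() override { /* call macOS capture APIs */ }
};

// Select the target calling logic matched with the OS the engine runs on.
std::unique_ptr<ICaptureLogic> MakeCaptureLogic(OsType os) {
    switch (os) {
        case OsType::Windows: return std::make_unique<WindowsCaptureLogic>();
        case OsType::MacOS:   return std::make_unique<MacCaptureLogic>();
        default: throw std::runtime_error("unsupported platform");
    }
}
```

Call sites then depend only on the common interface; supporting one more operating system adds a subclass and a switch branch without changing the callers.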
Step S204: when a data reading instruction occurs, a media data file is acquired.
The data reading instruction instructs the terminal to read a media data file. A media data file is media data encapsulated in a specific file format; encapsulating media data is typically a process of packaging and combining video compression data and audio compression data into one file. File formats include, but are not limited to, one or more of the AVI (Audio Video Interleave) format, the WMV (Windows Media Video) format, the MPEG (Moving Picture Experts Group) format, and the like.
Specifically, in the case where the data reading instruction occurs, the terminal reads a pre-stored media data file from the local storage space in response to the data reading instruction, or reads a media data file transmitted by another terminal or server received in real time. For example, the media data file may be downloaded and stored in the local storage space by the terminal in advance, and for example, the terminal responds to the data reading instruction, establishes a communication connection with other terminals or servers, and reads the media data file transmitted by the other terminals or servers.
Step S206: when a data recording instruction occurs, the virtual interaction scene presented in a virtual interactive application running on the Unreal Engine is recorded, and virtual scene data is obtained.
The data recording instruction instructs the terminal to record the pictures and sound of the currently running virtual interactive application. A virtual interactive application is an application that runs on the Unreal Engine, such as a game program. While running, the virtual interactive application presents a virtual interaction scene composed of pictures and sound, which the operator uses to watch the scenario, control characters to perform various actions, and so on.
The data recorded by the virtual interactive scene comprises image data and audio data, and in order to distinguish the data from the real scene data acquired by the terminal through calling the media data acquisition device, the data provided by the virtual interactive application is called virtual scene data.
Specifically, under the condition that a data recording instruction occurs, the terminal responds to the data recording instruction to record a virtual interaction scene presented in a virtual interaction application running in an operating system, so as to obtain virtual scene data. For example, when the terminal runs the virtual interactive application, the terminal displays a picture through the display screen and plays sound through the loudspeaker, and then the terminal acquires the image displayed by the display screen and the audio played by the loudspeaker to obtain virtual scene data.
The terminal may perform the recording by calling the media acquisition device, thereby obtaining virtual scene data. Alternatively, the terminal may call the audio/video suite to perform the recording. In some embodiments, the terminal may obtain the virtual scene data through the Media Framework plug-in provided by the Unreal Engine.
In step S208, the to-be-processed media data corresponding to the target service is determined, where the to-be-processed media data includes at least one of real scene data, a media data file, and virtual scene data.
According to different service demands, media data from different sources need to be processed, so that corresponding service functions are realized. Therefore, specifically, before the media data is specifically processed, the terminal determines the current target service and determines the media data to be processed corresponding to the target service. The media data to be processed may be single media data of real scene data, media data file and virtual scene data, or may be a combination of two media data, or may be a combination of three media data, etc.
For example, when the function of recording a game scene is required to be executed, the media data to be processed at least comprises virtual scene data, and on the basis, the media data can also comprise virtual scene data and real scene data according to actual requirements, and can also comprise virtual scene data, real scene data and media data files. For another example, when the function of playing audio and video in the game needs to be executed, the media data to be processed at least comprises a media data file, and at least one of the media data of other two sources can be further included according to actual requirements on the basis. For another example, when the function of customizing and modifying the data is required to be executed, the required media data to be processed at least comprises real scene data, and on the basis, the media data can also comprise real scene data and media data files according to actual requirements, and can also comprise media data of all three sources, and the like.
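To make the "at least one of three sources" relationship above concrete, one minimal way to model it is a bit flag per source. This is an illustrative assumption, not a structure defined by the patent.

```cpp
#include <cstdint>

// One flag per source of media data to be processed.
enum MediaSource : uint8_t {
    RealScene    = 1 << 0,  // data captured from camera/microphone
    MediaFile    = 1 << 1,  // pre-packaged media data file
    VirtualScene = 1 << 2   // recorded Unreal Engine scene
};

// Example: game-scene recording needs at least the virtual scene,
// here optionally combined with live camera data.
constexpr uint8_t kRecordingSources = VirtualScene | RealScene;
```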
Step S210, corresponding service processing is carried out on the media data to be processed, and a service processing result corresponding to the target service is obtained.
The processing flow for specific service processing of the media data to be processed differs with the target service. Specifically, the terminal runs the audio/video suite to perform corresponding service processing on the media data to be processed according to the determined target service and the preset processing flow matched with it, so as to obtain the service processing result corresponding to the target service. Still referring to the above example, for the target service of game scene recording, the service processing result obtained by the terminal's service processing of the media data to be processed is an encoded recording file.
It should be noted that, the service processing result is a final processing result of the target service, and the final processing result may be used as input of other target services. Such as the recorded file in the above example, which is stored, played, transmitted, etc. by the subsequent terminal.
For example, as shown in fig. 3, the terminal loads the acquired media data to be processed into the audio/video suite, and completes the service processing of the media data to be processed by running the audio/video suite according to different target services. For example, for a target service for algorithm detection, the terminal performs format conversion on media data to be processed by running the audio/video suite, and performs corresponding algorithm detection, such as image detection and voice detection, on the data after format conversion, so as to obtain an algorithm detection result as a service processing result, where the service processing result can be used for other subsequent target services and used as an input, such as a training sample of a neural network to be trained, and the like.
For another example, for a target service of playing audio and video in a game, the terminal acquires media data to be processed that includes a media data file, converts it into a data format supported by the Unreal Engine by running the audio/video suite, and, combined with a visual control provided by the Unreal Engine, plays the media data in the virtual interactive application the terminal is running; for example, the image data of the media data to be processed is displayed in the virtual interaction scene provided by the virtual interactive application.
For another example, for a target service of game scene recording, after obtaining to-be-processed media data including virtual scene data and media data files, a terminal performs service processing on the to-be-processed media data by running an audio/video suite, obtains a coding state, and codes video data and audio data in the to-be-processed media data, thereby obtaining coded recording files.
Through the media data processing method above, the media data acquisition device is called to acquire data according to target calling logic matched with the operating system, yielding real scene data, so different operating systems and platforms can be adapted without developing plug-ins for a specific platform; meanwhile, a media data file is acquired in response to a data reading instruction, and the virtual interaction scene presented in a virtual interactive application running on the Unreal Engine is recorded in response to a data recording instruction, so that media data from different sources are obtained. At least one of these is selected as the media data to be processed for a target service and processed according to that service's requirements to obtain the corresponding service processing result. Media data from various sources can thus be processed flexibly according to actual service requirements, diversified service requirements can be met, and the applicable scenarios are wider.
A plurality of interfaces can be provided in the audio/video suite to realize different functions, such as data transmission through the interfaces. In order to conveniently and quickly interface with different platforms and adapt to various operating systems, the audio/video suite provides a data acquisition universal interface, so that terminal hardware can be called quickly for data acquisition by calling this interface. To this end, in some embodiments, calling the media data acquisition device to acquire data according to target calling logic matched with the operating system on which the Unreal Engine runs, to obtain real scene data, includes: determining a data acquisition universal interface, in which calling logic matched with the media data acquisition devices of a plurality of different operating systems is encapsulated; determining the target calling logic encapsulated in the data acquisition universal interface that matches the system type of the operating system; and calling the media data acquisition device to acquire data through the data acquisition universal interface based on the target calling logic, so as to obtain real scene data.
Specifically, the data acquisition universal interface provided in the audio/video suite encapsulates the calling logic of the various operating systems it supports; when executed, this calling logic calls terminal hardware, such as the camera or microphone. By calling the data acquisition universal interface of the audio/video suite, the terminal selects from the multiple sets of calling logic the set matched with the current operating system as the target calling logic. The audio/video suite run by the terminal then executes the operation flow indicated by the target calling logic and calls the media data acquisition device through the data acquisition universal interface to acquire data, thereby obtaining the real scene data.
In the above embodiment, by encapsulating multiple sets of calling logic in the data acquisition universal interface, the target calling logic to be used is determined at runtime according to the currently configured operating system, so that different platforms can be interfaced and various operating systems adapted, giving wide applicability.
For the task of data acquisition, the main purpose is to obtain the calling authority of the terminal hardware. In some embodiments, invoking the media data collection device to perform data collection based on the target invocation logic to obtain real scene data comprises: based on the target calling logic, initiating a permission acquisition request to an operating system; under the condition that the calling authorization is obtained according to the permission acquisition request, calling a media data acquisition device to acquire data; and receiving the real scene data acquired by the media data acquisition device.
Specifically, the terminal executes the target calling logic through the audio/video suite, and the audio/video suite initiates a permission acquisition request to the current operating system, wherein the permission acquisition request is used for acquiring the calling permission of the media data acquisition device configured by the terminal. After the operating system is authorized, the audio and video suite obtains the calling authority and calls the media data acquisition device to acquire data.
For example, the terminal requests the use permission of the microphone from the operating system through the audio/video suite and, after obtaining authorization, calls the microphone to collect audio data. Alternatively, the terminal requests the use permission of the camera through the audio/video suite and, after obtaining authorization, calls the camera to acquire image data or video data.
Meanwhile, since system permission must be acquired before the media data acquisition device is called, the scheme also provides a degree of security.
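Continuing the dispatch sketch above (reusing its OsType and MakeCaptureLogic), the permission-then-capture flow could look as follows. CollectRealSceneData and RealSceneData are hypothetical names; this is a sketch of the described flow, not the suite's actual implementation.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Placeholder container for captured frames and audio samples.
struct RealSceneData {
    std::vector<uint8_t> frames;
    std::vector<uint8_t> samples;
};

std::optional<RealSceneData> CollectRealSceneData(OsType os) {
    auto logic = MakeCaptureLogic(os);  // target calling logic for this OS
    if (!logic->RequestPermission())    // permission request must succeed first
        return std::nullopt;            // no authorization, no capture
    logic->StartCapture();              // invoke camera and microphone
    return RealSceneData{};             // to be filled by device callbacks
}
```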
In order to improve overall processing efficiency and optimize the performance of the audio/video suite, in some embodiments acceleration is applied while the audio/video suite loads data, performs service processing for different target services (such as customized data processing, decoding and encoding), and so on, further improving the service processing efficiency of the suite.
As further shown in fig. 3, for example, when the terminal loads media data to be processed into the audio/video suite, the loading process may be accelerated. For this purpose, besides the data acquisition universal interface for adapting to different platforms mentioned above, the audio/video suite also provides an acceleration interface, which is used to increase speed during data loading and thereby accelerate data transmission.
Accordingly, in some embodiments, before performing corresponding service processing on the media data to be processed to obtain a service processing result corresponding to the target service, the method further includes: determining an acceleration interface matched with an operating system; and performing corresponding decoding processing on the media data to be processed through an acceleration interface matched with the operating system, and performing corresponding business processing on the media data to be processed based on the decoded media data to be processed.
The acceleration interface is preconfigured with interface usage specifications adapted to different operating systems, so that after the audio/video suite receives the media data to be processed, no further identification, conversion or similar processing is needed, which speeds up data loading. An interface usage specification includes, for example, the interface parameters, data formats and communication protocols used by the operating system.
Specifically, the terminal loads media data to be processed into the audio/video suite by calling an acceleration interface of the audio/video suite, and decodes the media data to be processed by the audio/video suite, so that corresponding business processing is performed based on the decoded media data to be processed to obtain a business processing result. Therefore, the process of loading the media data to be processed into the audio/video suite is accelerated, so that the efficiency of data transmission is improved, and the processing efficiency of the target service is improved.
In order to better understand the technical concept of the present application, the processing method of media data provided by the embodiment of the present application is described below in a plurality of different application scenarios.
Taking fig. 4A as an example, in some embodiments the target service includes a detection service that algorithmically detects the media data to be processed loaded into the audio/video suite. In the detection service, the terminal runs the audio/video suite to perform algorithm detection, such as image detection and voice detection, on the media data to be processed, and the algorithm detection result serves as the service processing result; this result can be used as input for other subsequent target services, for example as a training sample for a neural network to be trained. In other words, the media data to be processed corresponding to this target service may include at least one of real scene data and a media data file.
Correspondingly, the corresponding service processing is carried out on the media data to be processed to obtain a service processing result corresponding to the target service, which comprises the following steps: determining a target format adapted to a target detection algorithm; converting the data format of the media data to be processed into a target format; image detection is carried out on the video data to be processed in the media data to be processed based on a target detection algorithm, so that a video detection result is obtained, and audio detection is carried out on the audio data to be processed in the media data to be processed, so that an audio detection result is obtained; and determining a service processing result corresponding to the target service based on the video detection result and the audio detection result.
For real scene data, the encoding modes used by different operating systems when acquiring image data through the camera are not uniform: some operating systems encode in the YUV format (a color encoding method), while some platforms encode in the RGB format (a color model). The data format used as input to a detection algorithm, however, tends to be uniform. Therefore, the format of the input media data must be converted into the format supported by the detection algorithm, i.e., the target format.
Specifically, according to the current detection service, the terminal performs data preprocessing on the media data to be processed loaded into the audio/video suite, and firstly converts the data format of the media data to be processed into a target format which is suitable for a target detection algorithm. For example, as shown in fig. 4B, the terminal performs format conversion on the media data to be processed, thereby converting the media data to be processed from the original format to a target format adapted to the target detection algorithm.
After format conversion, the terminal runs the target detection algorithm through the audio/video suite and performs image detection on the video data to be processed, obtaining a video detection result, which includes but is not limited to detection results such as a target object image and a background image. Depending on the purpose of the algorithm, the detection algorithms related to image detection include, but are not limited to, one or more of an image recognition algorithm, a target detection algorithm, a threshold segmentation algorithm, an action detection algorithm and the like. Meanwhile, audio detection is performed on the audio data to be processed, obtaining an audio detection result, which includes but is not limited to the human voice, ambient sound, music and the like extracted from the audio data. Depending on the purpose of the algorithm, the detection algorithms related to audio detection include one or more of a speech emotion recognition algorithm, an audio rhythm detection algorithm, a semantic detection algorithm, a noise detection algorithm and the like. From the video detection result and the audio detection result, the terminal can determine the service processing result corresponding to the detection service.
Illustratively, as shown in fig. 4C, taking an example that the media data to be processed includes real scene data and a media data file, the terminal separates audio and video in the real scene data and the media data file respectively to obtain native audio data and native image data of the real scene data, and obtains file video data and file audio data in the media data file.
In the detection service, the audio/video suite preprocesses the data; the preprocessing includes format conversion so that the data input to the detection algorithm conforms to the format the algorithm specifies. Specifically, the audio/video suite converts the input data into the target format specified by the algorithm protocol, obtaining video algorithm data and audio algorithm data. Algorithm detection is performed on the video algorithm data with an image detection algorithm, obtaining a video detection result; audio detection is performed on the audio algorithm data with a voice detection algorithm, obtaining an audio detection result; together these form the algorithm detection result, which can be used by other subsequent target services.
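The preprocess-then-detect flow of fig. 4C can be summarized in a few lines. The detector signatures below are placeholders standing in for whatever image and voice detection algorithms the service plugs in; the names are assumptions for the sketch.

```cpp
// Placeholder types for one video frame and one audio chunk.
struct Frame {};
struct AudioChunk {};
struct DetectionResult { bool ok; };

// Preprocessing: convert raw input to the algorithm's target format.
Frame ToTargetFormat(const Frame& raw) { return raw; }  // e.g. YUV -> RGB

// Pluggable detectors; real services would swap in concrete algorithms.
DetectionResult DetectImage(const Frame& f) { return {true}; }
DetectionResult DetectVoice(const AudioChunk& a) { return {true}; }

DetectionResult RunDetection(const Frame& rawFrame, const AudioChunk& audio) {
    Frame algoFrame = ToTargetFormat(rawFrame);   // match algorithm protocol
    DetectionResult video = DetectImage(algoFrame);
    DetectionResult sound = DetectVoice(audio);
    return {video.ok && sound.ok};                // combined service result
}
```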
In the above embodiment, the native audio/video input is converted into the data matched with the audio/video algorithm protocol, and then the data is used as the algorithm input to perform algorithm detection, so that the data can be further processed later, and the customization transformation of the data is realized. Meanwhile, different algorithms can be adopted for detection according to service requirements, the algorithms can be replaced and updated in real time, the use is more flexible, and different service requirements can be met.
In order to further improve processing efficiency, as shown in fig. 4B, acceleration is also applied during data format conversion. To this end, in some embodiments, converting the data format of the media data to be processed into the target format includes: starting a sub-thread; determining, in the sub-thread, the original format of the media data to be processed, and converting the media data from the original format to the target format through a logical conversion function; and, after the sub-thread completes the conversion, notifying the main thread that the format conversion has finished.
Specifically, the terminal creates a sub-thread and starts the sub-thread to perform the process of format conversion. The sub-thread firstly accesses the media data to be processed to determine the original format of the media data to be processed, and then the logical conversion function is loaded to convert the media data to be processed from the original format to the target format. For example, for image data, a sub-thread accesses the media data to be processed and reads it pixel by pixel, and converts the media data to be processed from the original format to the target format by a logical conversion function, thereby outputting the pixels in the target format.
Illustratively, the conversion process may be accelerated with NEON instructions, and a sub-thread is started when format conversion is triggered, to avoid blocking the main thread during conversion. NEON is a 128-bit SIMD (Single Instruction, Multiple Data) extended instruction set for ARM Cortex-A series processors.
After the sub-thread completes the conversion operation, it sends a message to the main thread indicating that the format conversion has finished, so that the main thread can end the format conversion flow. While the sub-thread is performing the conversion, the main thread can execute other logic simultaneously to improve efficiency.
In this embodiment, the sub-thread performs the format conversion independently, so the sub-thread and the main thread do not interfere with each other; the problem of the media data blocking the main thread during format conversion is effectively avoided, the format conversion speed is increased, and processing efficiency is further improved.
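A minimal sketch of the off-main-thread conversion, using a standard C++ future as the completion notification back to the main thread; the function names are illustrative, and the NEON-vectorized inner loop is elided.

```cpp
#include <cstdint>
#include <cstdio>
#include <future>
#include <vector>

std::vector<uint8_t> ConvertYuvToRgb(const std::vector<uint8_t>& yuv) {
    std::vector<uint8_t> rgb(yuv.size());  // placeholder output buffer
    // ... per-pixel logical conversion; vectorizable with NEON intrinsics ...
    return rgb;
}

int main() {
    std::vector<uint8_t> yuv(1024);
    // Launch the conversion on a sub-thread; the future's readiness acts as
    // the "format conversion completed" notification to the main thread.
    std::future<std::vector<uint8_t>> done =
        std::async(std::launch::async, ConvertYuvToRgb, yuv);
    // ... the main thread keeps executing other logic here ...
    std::vector<uint8_t> rgb = done.get();  // conversion finished
    std::printf("converted %zu bytes\n", rgb.size());
}
```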
In addition to the manner in which format conversion is performed by creating sub-threads for acceleration, in other embodiments, converting the data format of the media data to be processed to the target format includes: calling a drawing function matched with an operating system, and inputting media data to be processed into a graphic processor; performing batch format conversion processing by a graphic processor to convert the data format of the media data to be processed into a target format; and transmitting the media data to be processed belonging to the target format in the graphic processor back to the central processing unit.
In other words, in this embodiment the terminal performs the format conversion on a graphics processor (Graphics Processing Unit, GPU). Specifically, the terminal calls the graphics interface (API) matched with the current operating system to invoke a drawing function and inputs the media data to be processed into the graphics processor; the graphics processor performs batch format conversion to convert the data format of the media data into the target format, and after conversion it returns the converted media data, now in the target format, to the central processing unit (Central Processing Unit, CPU). The batch format conversion on the graphics processor can be realized by running a shader program.
For example, the terminal calls the graphics interface to invoke a drawing function, inputs the media data to be processed into the GPU through that function, and a shader program in the GPU performs the batch format conversion so as to convert the data format into the target format; after conversion, the CPU reads back the GPU texture to retrieve the converted data.
In this embodiment, performing the format conversion on the graphics processor allows large-scale parallel computation; the computation speed far exceeds the processing speed of the central processing unit, CPU computing resources are not occupied during conversion, and format conversion efficiency is greatly improved.
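As an illustration of the shader-based path, a fragment shader of the following shape performs the YUV-to-RGB conversion for all pixels in parallel. This is a generic OpenGL sketch, assuming BT.601 full-range constants and an NV12-style plane layout; it is not shader code from the patent.

```cpp
// GLSL fragment shader source kept as a C++ string constant; one luma (Y)
// texture and one interleaved chroma (UV) texture are sampled per pixel.
static const char* kYuvToRgbFrag = R"(
#version 330 core
uniform sampler2D texY;   // Y plane
uniform sampler2D texUV;  // interleaved U/V plane (NV12-style, assumed)
in  vec2 uv;
out vec4 fragColor;
void main() {
    float y = texture(texY, uv).r;
    vec2  c = texture(texUV, uv).rg - 0.5;       // center chroma around 0
    fragColor = vec4(y + 1.402    * c.y,          // R (BT.601 full range)
                     y - 0.344136 * c.x - 0.714136 * c.y,  // G
                     y + 1.772    * c.x,          // B
                     1.0);
}
)";
```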
As stated above, the service processing result obtained from the target service may be used for further processing. In some application scenarios, the service processing result may include driving-related data, i.e., limb driving data and expression driving data for the virtual character in the virtual interaction scene, and the like. For example, the driving-related data may be a three-dimensional reconstruction of the human skeleton and facial expression, which can be combined with the image data in the acquired real scene data to drive expressions, limb actions and the like in a three-dimensionally displayed scene. In this application scenario, the method further includes: controlling the virtual character in the virtual interaction scene according to the driving-related data, so that it moves according to the actions of the object in the media data to be processed.
Specifically, based on the driving-related data obtained by performing service processing on the media data to be processed, the terminal loads the driving-related data into the virtual interactive application; within the application, the virtual character in the displayed virtual interaction scene is controlled according to the driving-related data, so that it moves according to the actions of the object in the media data to be processed. For example, the terminal controls the limbs of the virtual character in the virtual interaction scene to move according to the actions of the object in the acquired real scene data, or controls the facial expression of the virtual character to change according to the facial expression of the object in the acquired real scene data.
Thus, by performing algorithm detection on the media data to be processed and using the resulting service processing result to control the virtual character, the function of interaction between the virtual interactive application and a real object is provided without any additional plug-in, while different platforms can be interfaced, giving good adaptability.
Taking fig. 5A as an example, in some embodiments the target service includes a video encoding service for encoding media data that includes at least virtual scene data. In the video encoding service, virtual scene data such as the pictures and sounds of a virtual interactive application (such as a game) can be loaded into the audio/video suite and encoded into a target encoded file. Alternatively, the virtual scene data together with real scene data captured by the terminal through the media data acquisition device, such as the native camera and native microphone streams, can be loaded into the audio/video suite and encoded to obtain the target encoded file. Alternatively, the virtual scene data, the real scene data and a media data file may be encoded together to obtain the corresponding target encoded file.
In other words, the media data to be processed corresponding to the target service includes at least virtual scene data, and may further include one or more of real scene data and media data files on the basis of the virtual scene data.
Correspondingly, performing corresponding service processing on the media data to be processed to obtain a service processing result corresponding to the target service includes: acquiring a preset encoding state; and starting a sub-thread, and encoding the media data to be processed through the sub-thread according to the encoding state, so as to obtain a target encoded file adapted to the operating system.
The encoding state is encoding-related parameter information, including but not limited to encoding time, encoding format, code rate, resolution, and the like. The audio/video suite provided by the embodiments of the application provides an interface for configuring these parameters, and the terminal can flexibly set the encoding state by calling the interface.
In particular, the terminal may provide the operator with a visual editing interface that allows the operator to set the encoding status. Therefore, the terminal can acquire the set coding state according to the input of the editing interface. And further, the terminal carries out corresponding coding processing on the media data to be processed according to the coding state, so as to obtain a target coding file which is adapted to the local operating system.
In the process of encoding the media data to be processed, the terminal can encode the media data to be processed into target encoding files in various formats supported by a local operating system based on file formats (such as AVI or mp4 formats) adapted by the local operating system.
In some embodiments, before the terminal performs the encoding process, the terminal may start the sub-thread and execute the encoding step through the sub-thread, so that the process of the main thread is not affected, and the processing efficiency may be improved.
In the above embodiment, through the preset encoding state and encoding the media data to be processed on a sub-thread, rapid encoding of multiple types of data can be realized within the Unreal Engine.
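A sketch of the configurable encoding state and the sub-thread that consumes it; the field and function names are illustrative assumptions, chosen to match the parameters the text lists (encoding time, format, code rate, resolution).

```cpp
#include <string>
#include <thread>

struct EncodingState {
    std::string format      = "mp4";   // container adapted to the local OS
    int         bitrateKbps = 8000;    // code rate
    int         width = 1920, height = 1080;  // resolution
    double      durationSec = 0;       // encoding time window, 0 = unbounded
};

void EncodeToFile(const EncodingState& state) {
    // ... feed the video/audio of the media data to be processed into an
    //     encoder configured from `state`, writing the target encoded file ...
}

int main() {
    EncodingState state;  // set via the parameter-configuration interface/UI
    std::thread worker(EncodeToFile, state);  // encode off the main thread
    // ... the main thread continues rendering the virtual interaction scene ...
    worker.join();
}
```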
Illustratively, as shown in fig. 5B, taking the case where the media data to be processed includes both real scene data and virtual scene data, the terminal separates the audio and the video of the real scene data and of the virtual scene data respectively, obtaining native audio data and native image data from the real scene data, and game screen data and game audio data from the virtual scene data.
In the video encoding service, the process of collecting the data to be encoded is accelerated. In general, capturing the screen data of a running game requires that the picture data rendered in the GPU be returned to the CPU, and the CPU then performs the collection of the data to be encoded. However, the performance of this GPU-to-CPU transfer is severely limited by graphics card bandwidth. One option is to call a system interface to read the image data in the GPU, for example the ReadPixels (read function) interface. However, this approach blocks the main thread and does not release it until the complete rendering data has been read into the CPU, which is extremely time-consuming.
For this reason, in order to improve processing efficiency, in some embodiments, recording the virtual interaction scene presented in a virtual interactive application running on the Unreal Engine to obtain virtual scene data includes: rendering, by a graphics processor, the virtual interaction scene in the virtual interactive application running on the Unreal Engine to obtain image data; loading the image data into a preset buffer through the graphics processor; and asynchronously reading the image data in the preset buffer through a central processing unit, and collecting based on the image data read by the central processing unit to obtain the virtual scene data.
Specifically, while the virtual interactive application is running, the terminal renders the virtual interaction scene through the graphics processor to obtain image data. The terminal then loads the rendered image data into the preset buffer through the graphics processor, and asynchronously reads the image data in the preset buffer through the central processing unit, thereby completing the collection operation and obtaining the virtual scene data.
The preset buffer is, for example, a high-speed memory area set up based on PBO (Pixel Buffer Object) technology. As shown in fig. 5C, pixel data can be read from the OpenGL frame buffer by the glReadPixels (read function) function and written into the PBO, which can be regarded as a packing process. The pixel data can then be read from the PBO by the glDrawPixels function and copied back to the OpenGL frame buffer, which can be regarded as an unpacking process. The same applies to texture objects. Therefore, by using a PBO to implement asynchronous DMA (direct memory access) transfer, pixel data can be uploaded and downloaded rapidly with higher efficiency and without consuming CPU cycles. As shown in fig. 5D, the GPU loads data such as image textures directly into the PBO through DMA transfer, and the CPU can asynchronously read and collect the data in the PBO, thereby obtaining the image data.
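As a rough illustration of the pack/unpack flow of figs. 5C and 5D, the following C++ sketch shows the classic double-PBO asynchronous readback: with a PBO bound, glReadPixels returns immediately and the copy proceeds by DMA, while the CPU maps the PBO filled on the previous frame. A current OpenGL context, a function loader such as GLAD, and RGBA8 frames of a fixed illustrative size are assumed; none of these choices are dictated by the embodiment itself.

#include <glad/glad.h>
#include <cstddef>
#include <cstring>

GLuint pbos[2];
const int W = 1920, H = 1080;
const size_t FRAME_BYTES = size_t(W) * H * 4;  // RGBA8

void InitPbos() {
    glGenBuffers(2, pbos);
    for (int i = 0; i < 2; ++i) {
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[i]);
        glBufferData(GL_PIXEL_PACK_BUFFER, FRAME_BYTES, nullptr, GL_STREAM_READ);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

// Called once per rendered frame: kick off a DMA pack into one PBO and map
// the PBO filled last frame, so glReadPixels never stalls the main thread.
void CaptureFrame(int frameIndex, unsigned char* cpuDst) {
    int writeIdx = frameIndex % 2;
    int readIdx  = (frameIndex + 1) % 2;

    // "Pack": with a PBO bound, glReadPixels returns immediately and the
    // GPU copies the framebuffer into the PBO via DMA in the background.
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[writeIdx]);
    glReadPixels(0, 0, W, H, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

    // Map last frame's PBO; its DMA transfer has normally finished by now,
    // so this CPU copy does not wait on the GPU.
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbos[readIdx]);
    if (void* src = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY)) {
        std::memcpy(cpuDst, src, FRAME_BYTES);  // hand the pixels to the collector
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

The one-frame latency introduced by the ping-pong pair of PBOs is the usual price of keeping the readback fully asynchronous.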
In this embodiment, the rendered image data can be transferred rapidly through asynchronous transfer, saving time in image collection and improving processing efficiency.
Taking fig. 6 as an example, in some embodiments, the target service further includes a play service. The play service is used for, in combination with a UI (User Interface) control in the Unreal Engine, preprocessing the media data to be processed that has been loaded into the audio/video suite to obtain target video data in a data format supported by the Unreal Engine, and playing the target video data in the running virtual interactive application.
In the play service, a read media data file may be played in the virtual interactive application (for example, playing music or an animation), or the collected real scene data may be played in the virtual interactive application.
In other words, the media data to be processed corresponding to the target service includes at least virtual scene data, and may further include one or more of real scene data and media data files on the basis of the virtual scene data.
Correspondingly, performing corresponding service processing on the media data to be processed to obtain a service processing result corresponding to the target service includes the following steps: performing format conversion on at least one of the real scene data and the media data file to obtain target video data adapted to the Unreal Engine; and superimposing the target video data for playing during the playing of the virtual interaction scene through the virtual interactive application.
Specifically, the terminal may perform format conversion on at least one of the real scene data and the media data file to obtain target video data adapted to the Unreal Engine. That is, based on the actual service requirement, the terminal may convert only the real scene data, only the media data file, or both, to obtain the target video data adapted to the Unreal Engine.
Further, while playing the virtual interaction scene corresponding to the virtual scene data through the virtual interactive application, the terminal may superimpose the target video data for playing; that is, the target video data is superimposed on the virtual interaction scene and played there. Taking a game scene as an example of the virtual interaction scene, the target video data can be superimposed on the game scene and played within it.
In some embodiments, the terminal may perform format conversion on media data from various sources to convert it into a unified data format supported by the Unreal Engine, and then play the media frames in the engine in combination with interactive controls. The audio/video suite provided by the embodiment of the application is integrated with the native UI controls in the Unreal Engine, so that the target video data can be displayed directly on the canvas of a UI control.
In the above embodiment, at least one of the real scene data and the media data file is format-converted to obtain target video data adapted to the Unreal Engine, so that the target video can be superimposed on the virtual interaction scene for playing. This widens the presentation modes and content of video, and allows the real scene to be fused with the virtual scene for display, greatly improving interactivity.
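For illustration only, here is a UE4-style sketch of the final overlay step: it assumes the conversion stage has already produced a BGRA frame (ConvertedFrame is a hypothetical name) and uses the engine's stock transient-texture and UMG Image APIs rather than the suite's own, unspecified, interface; the direct PlatformData access shown is the UE4 pattern.

#include "CoreMinimal.h"
#include "Engine/Texture2D.h"
#include "Components/Image.h"

// Wrap one converted BGRA frame in a transient texture.
UTexture2D* MakeFrameTexture(const uint8* ConvertedFrame, int32 Width, int32 Height)
{
    UTexture2D* Tex = UTexture2D::CreateTransient(Width, Height, PF_B8G8R8A8);

    // Copy the converted pixels into the texture's top mip level.
    void* Dst = Tex->PlatformData->Mips[0].BulkData.Lock(LOCK_READ_WRITE);
    FMemory::Memcpy(Dst, ConvertedFrame, Width * Height * 4);
    Tex->PlatformData->Mips[0].BulkData.Unlock();

    Tex->UpdateResource();  // push the new pixels to the rendering thread
    return Tex;
}

// Draw the frame on a UMG Image widget's canvas, on top of the game scene.
void ShowFrameOnWidget(UImage* OverlayImage, UTexture2D* FrameTex)
{
    OverlayImage->SetBrushFromTexture(FrameTex, /*bMatchSize=*/true);
}

Creating a new texture every frame is the simplest scheme; a production path would reuse one texture and update it in place, but that detail is beyond this sketch.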
In order to further improve processing speed, the embodiment of the application also performs acceleration processing during data preprocessing. For the steps of the acceleration processing, reference may be made to the description of the foregoing embodiments; details are not repeated here.
The application also provides an application scenario to which the above media data processing method is applied. Specifically, the method is applied in this scenario as follows: the terminal acquires media data from different sources through the plug-in; for example, the plug-in invokes the microphone and camera of a computer or mobile phone to collect real scene data, obtains media data files through network download, and obtains virtual scene data by recording a game through screen-recording software. According to the actual service requirement, the terminal performs service processing on one or more pieces of media data through the various service processing flows preset in the plug-in to obtain a service processing result. The service processing result can then be used as input for subsequent processing; for example, an acquired media data file is played in a game, or a character in the game is driven to perform corresponding actions according to the real scene data.
Of course, the media data processing method provided by the application is not limited thereto, and can also be applied to other application scenarios, such as game live-streaming, VR (Virtual Reality) video or games, and avatar rendering.
In a specific embodiment, the terminal determines the current target service and obtains media data from different sources according to the service requirements of the target service. When the target service needs real scene data, the terminal generates a data acquisition instruction, runs the audio/video suite according to the instruction, and the suite calls the media data acquisition device to acquire data according to the target calling logic matched with the operating system on which the Unreal Engine runs, obtaining the real scene data. When the target service needs a media data file, the terminal generates a data reading instruction and reads the media data file accordingly. When the target service needs virtual scene data, the terminal generates a data recording instruction, runs the audio/video suite or another plug-in according to the instruction, and records the virtual interaction scene presented in the virtual interactive application running on the Unreal Engine to obtain the virtual scene data. Of course, the target service is not limited to a single source and may use media data from two or more different sources. According to the target service, the terminal determines the corresponding media data to be processed and performs corresponding service processing on it to obtain a service processing result. The service processing includes, for example, a detection service, a video encoding service, and a play service.
The following description uses a specific example in an actual application scenario. Taking Unreal Engine 4 (UE4) as an example, as shown in fig. 7, the overall application framework of the media data processing method provided in the embodiment of the application obtains the corresponding audio data and image/video/screen data from media data of different sources (including real scene data, media data files, and virtual scene data), and performs the processing of different target services based on these data, for example the detection service, video encoding service, and play service of the above embodiments. Deep optimization and acceleration are also performed during related operations such as data loading, encoding, and decoding, improving processing efficiency and performance.
Therefore, the embodiment of the application provides a cross-platform media data processing method for the Unreal Engine, which solves the high development cost, high maintenance cost, and high stability risk caused by having to assemble separate plug-ins for each individual platform. Meanwhile, the audio/video suite can perform corresponding service processing based on actual service requirements, has excellent extensibility, and is convenient for secondary development. In addition, deep performance acceleration is applied to data loading and to the processing of the target service, improving processing efficiency.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in those flowcharts may include multiple steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; nor is their order of execution necessarily sequential, as they may be performed in turn or alternately with at least part of the other steps or stages.
Based on the same inventive concept, the embodiment of the application further provides a media data processing device for implementing the media data processing method described above. The implementation of the solution provided by the device is similar to that described for the method, so for the specific limitations of the one or more embodiments of the media data processing device provided below, reference may be made to the limitations of the media data processing method above; details are not repeated here.
In one embodiment, as shown in fig. 8, a media data processing device 800 is provided, including: an acquisition module 801, a reading module 802, a recording module 803, a determining module 804, and a processing module 805, wherein:
an acquisition module 801, configured to, in the case of a data acquisition instruction, call a media data acquisition device to acquire data according to target calling logic matched with the operating system on which the Unreal Engine runs, to obtain real scene data;
a reading module 802, configured to obtain a media data file in the case of a data reading instruction;
a recording module 803, configured to, in the case of a data recording instruction, record the virtual interaction scene presented in the virtual interactive application running on the Unreal Engine, to obtain virtual scene data;
a determining module 804, configured to determine the media data to be processed corresponding to the target service, where the media data to be processed includes at least one of the following: real scene data, a media data file, and virtual scene data; and
a processing module 805, configured to perform corresponding service processing on the media data to be processed to obtain a service processing result corresponding to the target service.
In some embodiments, the acquisition module is further configured to determine a data acquisition universal interface, in which calling logic adapted to the media data acquisition devices respectively corresponding to a plurality of different operating systems is encapsulated; determine the target calling logic, encapsulated in the data acquisition universal interface, that matches the system type of the operating system; and call the media data acquisition device to acquire data through the data acquisition universal interface based on the target calling logic, so as to obtain the real scene data.
In some embodiments, the acquisition module is further configured to initiate a permission acquisition request to the operating system based on the target calling logic; call the media data acquisition device to acquire data in the case that calling authorization is obtained in response to the permission acquisition request; and receive the real scene data acquired by the media data acquisition device.
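One plausible shape for the data acquisition universal interface described above is an abstract capture class whose concrete call logic is selected by the system type of the operating system, sketched below in C++. Every class and function name here is invented for illustration; the platform logic named in the comments is only an example of what each implementation might wrap.

#include <memory>
#include <stdexcept>
#include <vector>

struct Frame { std::vector<unsigned char> pixels; };  // one captured frame

// The universal interface: one API, many per-OS implementations behind it.
class ICaptureDevice {
public:
    virtual ~ICaptureDevice() = default;
    virtual bool requestPermission() = 0;  // permission request to the OS
    virtual Frame captureFrame() = 0;      // real scene data
};

class WindowsCapture : public ICaptureDevice {  // e.g. Media Foundation logic
public:
    bool requestPermission() override { return true; }  // stubbed for the sketch
    Frame captureFrame() override { return {}; }
};

class AndroidCapture : public ICaptureDevice {  // e.g. Camera2/NDK logic
public:
    bool requestPermission() override { return true; }  // stubbed for the sketch
    Frame captureFrame() override { return {}; }
};

// Target calling logic matched to the system type of the running OS.
enum class OsType { Windows, Android /*, iOS, macOS, ... */ };

std::unique_ptr<ICaptureDevice> MakeCaptureDevice(OsType os) {
    switch (os) {
        case OsType::Windows: return std::make_unique<WindowsCapture>();
        case OsType::Android: return std::make_unique<AndroidCapture>();
    }
    throw std::runtime_error("unsupported operating system");
}

A caller would first invoke requestPermission and only capture frames once calling authorization has been granted, matching the flow described above.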
In some embodiments, the recording module is further configured to render, by the graphics processor, the virtual interaction scene in the virtual interactive application running on the Unreal Engine to obtain image data; load the image data into the preset buffer through the graphics processor; and asynchronously read the image data in the preset buffer through the central processing unit, collecting based on the read image data to obtain the virtual scene data.
In some embodiments, the apparatus further includes an acceleration module, configured to determine an acceleration interface matched with the operating system, perform corresponding decoding processing on the media data to be processed through that acceleration interface, and perform the corresponding service processing on the media data to be processed based on the decoded media data.
In some embodiments, the target service includes a detection service, and the media data to be processed corresponding to the target service includes at least one of real scene data and a media data file. The processing module is further configured to determine a target format adapted to a target detection algorithm; convert the data format of the media data to be processed into the target format; perform image detection on the video data to be processed in the media data to be processed based on the target detection algorithm to obtain a video detection result, and perform audio detection on the audio data to be processed in the media data to be processed to obtain an audio detection result; and determine the service processing result corresponding to the target service based on the video detection result and the audio detection result.
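A compact C++ sketch of this detection flow, with placeholder detectors standing in for whatever target detection algorithm is actually plugged in (all names here are invented for illustration):

#include <cstdint>
#include <vector>

struct DetectionResult { bool hit = false; float score = 0.f; };

// Stand-ins for the real image- and audio-detection algorithms.
DetectionResult DetectVideo(const std::vector<uint8_t>& frames)  { (void)frames;  return {}; }
DetectionResult DetectAudio(const std::vector<uint8_t>& samples) { (void)samples; return {}; }

struct ServiceResult { DetectionResult video, audio; bool flagged = false; };

// Assumes the media has already been converted to the detector's target format.
ServiceResult RunDetectionService(const std::vector<uint8_t>& videoData,
                                  const std::vector<uint8_t>& audioData) {
    ServiceResult r;
    r.video   = DetectVideo(videoData);
    r.audio   = DetectAudio(audioData);
    r.flagged = r.video.hit || r.audio.hit;  // fuse both detections into one result
    return r;
}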
In some embodiments, the processing module is further configured to start a sub-thread, determine the original format of the media data to be processed through the sub-thread, and convert the media data to be processed from the original format to the target format through a logic conversion function; after the sub-thread completes the conversion operation, the main thread is notified that the format conversion has been completed.
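The sub-thread conversion with main-thread notification might look roughly like this in C++; ConvertFormat and the format tags are placeholders for the suite's logic conversion function, and the atomic flag polled by the main thread is just one way to deliver the completion notice.

#include <atomic>
#include <cstdint>
#include <memory>
#include <thread>
#include <vector>

enum class PixelFormat { YUV420, RGBA };  // illustrative original/target formats

struct ConversionJob {
    std::vector<uint8_t> output;
    std::atomic<bool>    done{false};  // the main thread polls this each tick
};

std::vector<uint8_t> ConvertFormat(const std::vector<uint8_t>& in,
                                   PixelFormat from, PixelFormat to) {
    (void)from; (void)to;
    // ... the real per-format conversion would go here; identity for the sketch ...
    return in;
}

// Start the sub-thread; the shared job outlives both threads.
std::shared_ptr<ConversionJob> StartConversion(std::vector<uint8_t> media,
                                               PixelFormat original,
                                               PixelFormat target) {
    auto job = std::make_shared<ConversionJob>();
    std::thread([job, media = std::move(media), original, target] {
        job->output = ConvertFormat(media, original, target);
        job->done.store(true, std::memory_order_release);  // notify the main thread
    }).detach();
    return job;
}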
In some embodiments, the processing module is further configured to call a drawing function matched with the operating system to input the media data to be processed into the graphics processor; perform batch format conversion through the graphics processor to convert the data format of the media data to be processed into the target format; and transfer the media data to be processed, now in the target format, from the graphics processor back to the central processing unit.
In some embodiments, the service processing result includes driving-related data, and the apparatus further includes an interaction module, configured to control a virtual character in the virtual interaction scene according to the driving-related data so that the virtual character moves in accordance with the actions of an object in the media data to be processed.
In some embodiments, the target service includes a play service, and the media data to be processed corresponding to the target service includes virtual scene data and at least one of real scene data and a media data file. The processing module is further configured to perform format conversion on at least one of the real scene data and the media data file to obtain target video data adapted to the Unreal Engine, and to superimpose the target video data for playing while the virtual interaction scene corresponding to the virtual scene data is played through the virtual interactive application.
In some embodiments, the target service includes a video encoding service, and the media data to be processed corresponding to the target service includes at least virtual scene data, and may further include at least one of real scene data and a media data file. The processing module is further configured to acquire a preset encoding state, start a sub-thread, and encode the media data to be processed through the sub-thread according to the encoding state to obtain a target encoded file adapted to the operating system.
The various modules in the media data processing device described above may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded, in hardware form, in or independent of a processor in the computer device, or stored, in software form, in a memory in the computer device, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal or a server. Taking a terminal as an example, its internal structure diagram may be as shown in fig. 9. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface, the display unit, and the input means are connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running them. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode can be implemented through WIFI, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a media data processing method. The display unit of the computer device is used to form a visual picture and may be a display screen, a projection device, or a virtual reality imaging device; the display screen may be a liquid crystal display screen or an electronic ink display screen. The input means of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad, mouse, or the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 9 is merely a block diagram of part of the structure related to the solution of the application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the media data (including, but not limited to, data for analysis, stored data, displayed data, etc.) related to the present application are all information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that all or part of the flows in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope of this specification.
The above examples represent only several embodiments of the application, and their description is specific and detailed, but they should not be construed as limiting the scope of the application. It should be noted that several variations and improvements can be made by those of ordinary skill in the art without departing from the concept of the application, and these all fall within the protection scope of the application. Accordingly, the protection scope of the application shall be subject to the appended claims.

Claims (15)

1. A method of media data processing, the method comprising:
in the case of a data acquisition instruction, calling a media data acquisition device to acquire data according to target calling logic matched with an operating system on which an Unreal Engine runs, so as to obtain real scene data;
in the case of a data reading instruction, acquiring a media data file;
in the case of a data recording instruction, recording a virtual interaction scene presented in a virtual interactive application running on the Unreal Engine, so as to obtain virtual scene data;
determining media data to be processed corresponding to a target service, wherein the media data to be processed comprises at least one of the following: the real scene data, the media data file, and the virtual scene data;
and carrying out corresponding service processing on the media data to be processed to obtain a service processing result corresponding to the target service.
2. The method of claim 1, wherein the calling a media data acquisition device to acquire data according to the target calling logic matched with the operating system on which the Unreal Engine runs, so as to obtain real scene data, comprises:
determining a data acquisition universal interface, wherein calling logic adapted to media data acquisition devices respectively corresponding to a plurality of different operating systems is encapsulated in the data acquisition universal interface;
determining target calling logic which is encapsulated in the data acquisition universal interface and is matched with the system type of the operating system;
and calling the media data acquisition device to acquire data through the data acquisition universal interface based on the target calling logic, so as to obtain real scene data.
3. The method according to claim 2, wherein the calling the media data acquisition device to acquire data through the data acquisition universal interface based on the target calling logic, so as to obtain real scene data, comprises:
based on the target calling logic, initiating a permission acquisition request to the operating system;
in the case that calling authorization is obtained in response to the permission acquisition request, calling the media data acquisition device to acquire data;
and receiving the real scene data acquired by the media data acquisition device.
4. The method of claim 1, wherein the recording a virtual interaction scene presented in a virtual interactive application running on the Unreal Engine to obtain virtual scene data comprises:
rendering, by a graphics processor, the virtual interaction scene in the virtual interactive application running on the Unreal Engine to obtain image data;
loading the image data into a preset buffer through the graphics processor;
and asynchronously reading the image data in the preset buffer through a central processing unit, and collecting based on the image data read by the central processing unit, so as to obtain virtual scene data.
5. The method according to claim 1, wherein before the performing corresponding service processing on the media data to be processed to obtain a service processing result corresponding to the target service, the method further includes:
determining an acceleration interface matched with the operating system;
and performing corresponding decoding processing on the media data to be processed through the acceleration interface matched with the operating system, and performing the corresponding service processing on the media data to be processed based on the decoded media data to be processed.
6. The method according to any one of claims 1 to 5, wherein the target service comprises a detection service, and the media data to be processed corresponding to the target service comprises at least one of the real scene data and the media data file;
wherein the performing corresponding service processing on the media data to be processed to obtain a service processing result corresponding to the target service comprises:
determining a target format adapted to a target detection algorithm;
converting the data format of the media data to be processed into the target format;
performing image detection on the to-be-processed video data in the to-be-processed media data based on the target detection algorithm to obtain a video detection result, and performing audio detection on the to-be-processed audio data in the to-be-processed media data to obtain an audio detection result;
And determining a service processing result corresponding to the target service based on the video detection result and the audio detection result.
7. The method of claim 6, wherein the converting the data format of the media data to be processed to the target format comprises:
starting a sub-thread, determining an original format of the media data to be processed through the sub-thread, and converting the media data to be processed from the original format to the target format through a logic conversion function;
and after the sub-thread completes the conversion operation, notifying a main thread that the format conversion has been completed.
8. The method of claim 6, wherein the converting the data format of the media data to be processed to the target format comprises:
calling a drawing function matched with the operating system, and inputting the media data to be processed into a graphics processor;
performing batch format conversion processing by the graphics processor so as to convert the data format of the media data to be processed into the target format;
and transmitting the media data to be processed, which belongs to the target format, in the graphics processor back to a central processing unit.
9. The method of claim 6, wherein the service processing result comprises driving-related data, the method further comprising:
and controlling a virtual character in the virtual interaction scene according to the driving-related data, so that the virtual character moves in accordance with the actions of an object in the media data to be processed.
10. The method according to any one of claims 1 to 5, wherein the target service comprises a play service, the media data to be processed corresponding to the target service comprises the virtual scene data, and further comprises at least one of the real scene data and the media data file;
wherein the performing corresponding service processing on the media data to be processed to obtain a service processing result corresponding to the target service comprises:
performing format conversion on at least one of the real scene data and the media data file to obtain target video data adapted to the Unreal Engine;
and superimposing the target video data for playing in the process of playing the virtual interaction scene corresponding to the virtual scene data through the virtual interactive application.
11. The method according to any one of claims 1 to 5, wherein the target service comprises a video encoding service, and the media data to be processed corresponding to the target service comprises the virtual scene data, or comprises the virtual scene data and at least one of the real scene data and the media data file;
wherein the performing corresponding service processing on the media data to be processed to obtain a service processing result corresponding to the target service comprises:
acquiring a preset encoding state;
and starting a sub-thread, and encoding the media data to be processed through the sub-thread according to the encoding state, so as to obtain a target encoded file adapted to the operating system.
12. A media data processing device, the device comprising:
the acquisition module is used for calling a media data acquisition device in the case of a data acquisition instruction to acquire data according to target calling logic matched with an operating system on which an Unreal Engine runs, so as to obtain real scene data;
the reading module is used for acquiring a media data file in the case of a data reading instruction;
the recording module is used for recording, in the case of a data recording instruction, a virtual interaction scene presented in a virtual interactive application running on the Unreal Engine, so as to obtain virtual scene data;
the determining module is used for determining to-be-processed media data corresponding to the target service, wherein the to-be-processed media data comprises at least one of the following: the real scene data, the media data file, and the virtual scene data;
And the processing module is used for carrying out corresponding service processing on the media data to be processed to obtain a service processing result corresponding to the target service.
13. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 11 when the computer program is executed.
14. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 11.
15. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 11.
CN202210504909.5A 2022-05-10 2022-05-10 Media data processing method, device, computer equipment and storage medium Pending CN117065357A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210504909.5A 2022-05-10 2022-05-10 Media data processing method, device, computer equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117065357A 2023-11-17

Family

ID=88712124


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117478726A (en) * 2023-12-26 2024-01-30 中国电建集团西北勘测设计研究院有限公司 Internet of things data transmission method and system for butting illusion engines



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40097752)