WO2021248988A1 - Cross-terminal screen recording method, terminal device, and storage medium

Info

Publication number: WO2021248988A1
Application number: PCT/CN2021/084338
Authority: WO
Inventors: 熊彬; 冯鹏
Original assignee: 华为技术有限公司
Priority date: 2020-06-12
Filing date: 2021-03-31
Publication date: 2021-12-16
Also published as: CN113873187B; CN113873187A

Abstract

The present application relates to the technical field of terminals, and in particular to a cross-terminal screen recording method, a terminal device, and a computer-readable storage medium. According to the method, original audio data and original video data encoded by an encoder in a second terminal are converted according to a target audio structure and a target video structure corresponding to a stream mixer in a first terminal, yielding the target audio data and target video data that the stream mixer in the first terminal requires for stream mixing. The stream mixer can then mix the streams to obtain screen recording data that plays normally. This achieves compatibility between different types of encoders and stream mixers, solves the problem that cross-terminal screen recording cannot be applied between terminals having different types of encoders and stream mixers, expands the application range of cross-terminal screen recording, and offers strong usability and practicality.

Description

Cross-terminal screen recording method, terminal equipment and storage medium

This application claims priority to Chinese patent application No. 202010534337.6, filed with the State Intellectual Property Office on June 12, 2020 and entitled "cross-terminal screen recording method, terminal equipment and storage medium", the entire content of which is incorporated into this application by reference.

Technical field

This application belongs to the field of terminal technology, and in particular relates to a cross-terminal screen recording method, terminal equipment, and computer-readable storage medium.

Background art

Cross-terminal screen recording refers to the process of using a first terminal to record the screen being presented by a second terminal and saving the recording in the first terminal. In current cross-terminal screen recording, the second terminal generally captures the audio and video of the screen it is presenting in real time, encodes the captured audio and video through an encoder in the second terminal, and sends the result to the first terminal. After receiving the encoded audio and video, the first terminal can mix them through the mixer in the first terminal to synthesize screen recording data and save the screen recording data in the first terminal, so that the first terminal can share the screen being presented by the second terminal in real time. However, existing cross-terminal screen recording generally requires that the encoder in the second terminal and the mixer in the first terminal be developed based on the same framework. When the encoder in the second terminal and the mixer in the first terminal are developed based on different frameworks, the screen recording data synthesized by stream mixing in the first terminal cannot be played normally. In other words, existing cross-terminal screen recording can only be applied between terminals with the same type of encoder and mixer, and cannot be applied between terminals with different types of encoders and mixers.

Summary of the invention

The embodiments of the present application provide a cross-terminal screen recording method, terminal equipment, and computer-readable storage medium, which can achieve compatibility between different types of encoders and mixers.

In the first aspect, an embodiment of the present application provides a cross-terminal screen recording method, which is applied to a first terminal, and the method may include:

Sending screen recording request information to the second terminal, where the screen recording request information is used to instruct the second terminal to send original audio data and original video data corresponding to the currently displayed content to the first terminal;

Receiving original audio data and original video data corresponding to the content currently displayed by the second terminal sent by the second terminal;

Determining the target audio structure and the target video structure corresponding to the mixer in the first terminal;

Obtaining target audio data corresponding to the original audio data according to the target audio structure, and obtaining target video data corresponding to the original video data according to the target video structure;

Performing stream mixing on the target audio data and the target video data through the stream mixer to obtain screen recording data.

In this embodiment, the first terminal may convert the original audio data and original video data obtained after encoding by the encoder in the second terminal according to the target audio structure and the target video structure corresponding to the mixer in the first terminal, to obtain the target audio data and target video data that the mixer in the first terminal requires for mixing. The mixer can then mix them to obtain screen recording data that plays normally. This achieves compatibility between different types of encoders and mixers, solves the problem that cross-terminal screen recording cannot be applied between terminals with different types of encoders and mixers, expands the application range of cross-terminal screen recording, and offers strong ease of use and practicality.
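The first-aspect flow can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation described in the application: `Packet`, `Mixer`, `to_structure`, and `record_on_first_terminal` are hypothetical names, a "structure" is reduced to a plain label, and byte concatenation stands in for real container writing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    structure: str  # hypothetical label for the data structure wrapping the bytes
    payload: bytes  # encoded audio or video bytes (left untouched here)


class Mixer:
    """Stand-in for a stream mixer that only accepts its own structures."""

    def __init__(self, audio_structure: str, video_structure: str) -> None:
        self.audio_structure = audio_structure
        self.video_structure = video_structure

    def mix(self, audio: List[Packet], video: List[Packet]) -> bytes:
        for p in audio:
            if p.structure != self.audio_structure:
                raise ValueError(f"unsupported audio structure: {p.structure}")
        for p in video:
            if p.structure != self.video_structure:
                raise ValueError(f"unsupported video structure: {p.structure}")
        # Placeholder only: a real mixer would interleave and write a container.
        return b"".join(p.payload for p in video + audio)


def to_structure(packets: List[Packet], target: str) -> List[Packet]:
    """Re-wrap each packet in the target structure (placeholder conversion)."""
    return [Packet(target, p.payload) for p in packets]


def record_on_first_terminal(
    original_audio: List[Packet], original_video: List[Packet], mixer: Mixer
) -> bytes:
    # Determine the structures that the local mixer expects.
    target_audio_structure = mixer.audio_structure
    target_video_structure = mixer.video_structure
    # Obtain target data from the original data received from the second terminal.
    target_audio = to_structure(original_audio, target_audio_structure)
    target_video = to_structure(original_video, target_video_structure)
    # Mix the converted streams into screen recording data.
    return mixer.mix(target_audio, target_video)
```

In this sketch, passing the encoder's packets to `Mixer.mix` directly raises `ValueError`, which mirrors the incompatibility the method is meant to remove; converting first lets the same mixer accept them.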

In a possible implementation manner of the first aspect, obtaining the target audio data corresponding to the original audio data according to the target audio structure, and obtaining the target video data corresponding to the original video data according to the target video structure, may include:

Obtaining candidate audio data corresponding to the original audio data according to a preset audio structure, and obtaining candidate video data corresponding to the original video data according to the preset video structure;

Converting the candidate audio data into the target audio data according to the pre-established correspondence between the preset audio structure and the target audio structure, and converting the candidate video data into the target video data according to the pre-established correspondence between the preset video structure and the target video structure.

Exemplarily, the obtaining candidate audio data corresponding to the original audio data according to a preset audio structure, and obtaining candidate video data corresponding to the original video data according to the preset video structure may include:

Determining the original audio structure corresponding to the original audio data, and the original video structure corresponding to the original video data;

Converting the original audio data into the candidate audio data according to the pre-established correspondence between the original audio structure and the preset audio structure, and converting the original video data into the candidate video data according to the pre-established correspondence between the original video structure and the preset video structure.
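The two-stage conversion (original structure to preset structure, then preset structure to target structure) can be sketched as two table lookups. The sketch assumes, purely for illustration, that a structure is a set of named fields and that a correspondence is a field-name mapping; the field names and the helpers `apply_correspondence` and `original_to_target` are hypothetical and do not come from the application.

```python
from typing import Any, Dict

# Hypothetical pre-established correspondences, expressed as field-name maps.
ORIGINAL_TO_PRESET_AUDIO = {
    "pts": "timestamp_us",
    "buf": "data",
    "rate": "sample_rate",
}
PRESET_TO_TARGET_AUDIO = {
    "timestamp_us": "presentation_time_us",
    "data": "bytes",
    "sample_rate": "sample_rate_hz",
}


def apply_correspondence(
    record: Dict[str, Any], correspondence: Dict[str, str]
) -> Dict[str, Any]:
    """Rename every field of `record` according to `correspondence`."""
    missing = set(record) - set(correspondence)
    if missing:
        raise KeyError(f"no correspondence for fields: {sorted(missing)}")
    return {correspondence[name]: value for name, value in record.items()}


def original_to_target(
    record: Dict[str, Any],
    original_to_preset: Dict[str, str],
    preset_to_target: Dict[str, str],
) -> Dict[str, Any]:
    # Stage 1: original structure -> preset structure (candidate data).
    candidate = apply_correspondence(record, original_to_preset)
    # Stage 2: preset structure -> target structure (target data).
    return apply_correspondence(candidate, preset_to_target)
```

For example, calling `original_to_target({"pts": 40000, "buf": b"\x01", "rate": 44100}, ORIGINAL_TO_PRESET_AUDIO, PRESET_TO_TARGET_AUDIO)` yields a record keyed by the target field names with the values unchanged. Video data would follow the same pattern with its own pair of tables.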

Specifically, the screen recording data is data in MP4 format.

In a possible implementation of the first aspect, after receiving the original audio data and original video data corresponding to the content currently displayed by the second terminal sent by the second terminal, the method may further include:

Decoding the original video data through the video decoder in the first terminal, and rendering the video data obtained by decoding on the display interface of the first terminal.

In the method provided by this possible implementation manner, the first terminal can simultaneously display the recorded screen content during the process of recording the screen of the content being presented by the second terminal, thereby improving user experience.

In another possible implementation manner of the first aspect, after receiving the original audio data and original video data corresponding to the content currently displayed by the second terminal sent by the second terminal, the method may further include:

Decoding the original audio data through the audio decoder in the first terminal, and playing the audio data obtained by decoding through the sound playing device of the first terminal.

Exemplarily, the method may further include:

If a screen recording stop instruction is detected on the first terminal, instruct the second terminal to stop sending original audio data and original video data, and save the screen recording data in the first terminal.

In the second aspect, an embodiment of the present application provides a cross-terminal screen recording method, which is applied to a second terminal, and the method may include:

After receiving the screen recording request information of the first terminal, obtain the original audio data and the original video data corresponding to the content currently displayed on the second terminal;

Sending the target audio data and the target video data to the first terminal, to instruct the first terminal to perform stream mixing on the target audio data and the target video data through the mixer in the first terminal to obtain screen recording data.

In this embodiment, the second terminal can convert the original audio data and original video data obtained after encoding by the encoder in the second terminal according to the target audio structure and the target video structure corresponding to the mixer in the first terminal, to obtain the target audio data and target video data that the mixer in the first terminal requires for mixing, and send the target audio data and target video data to the first terminal. The mixer of the first terminal can then perform stream mixing on the target audio data and the target video data to obtain screen recording data that plays normally. This achieves compatibility between different types of encoders and mixers, solves the problem that cross-terminal screen recording cannot be applied between terminals with different types of encoders and mixers, expands the application range of cross-terminal screen recording, and offers strong ease of use and practicality.

In a possible implementation of the second aspect, obtaining the target audio data corresponding to the original audio data according to the target audio structure, and obtaining the target video data corresponding to the original video data according to the target video structure, includes:

It should be understood that, after receiving the screen recording request information of the first terminal, acquiring the original audio data and original video data corresponding to the content currently displayed on the second terminal may include:

After detecting the touch operation of the first terminal on the second terminal, the original audio data and the original video data corresponding to the content currently displayed by the second terminal are acquired.

Exemplarily, the method may further include:

If an instruction to stop screen recording is detected on the second terminal, stop sending original audio data and original video data to the first terminal.

In the third aspect, an embodiment of the present application provides a cross-terminal screen recording method, which may include:

The first terminal sends screen recording request information to the second terminal;

After receiving the screen recording request information of the first terminal, the second terminal acquires original audio data and original video data corresponding to the content currently displayed on the second terminal;

The second terminal obtains the candidate audio data corresponding to the original audio data according to the preset audio structure, obtains the candidate video data corresponding to the original video data according to the preset video structure, and sends the candidate audio data and the candidate video data to the first terminal;

The first terminal determines the target audio structure and the target video structure corresponding to the mixer in the first terminal, obtains the target audio data corresponding to the candidate audio data according to the target audio structure, and obtains the target video data corresponding to the candidate video data according to the target video structure;

The first terminal performs stream mixing processing on the target audio data and the target video data through the mixer in the first terminal to obtain screen recording data.
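The division of work between the two terminals in the third aspect can be sketched as below. This is only an illustrative outline under assumptions: `SecondTerminal`, `FirstTerminal`, the structure labels, and the tuple-based packet are hypothetical, the network transfer is replaced by a direct method call, and concatenation stands in for real stream mixing.

```python
from typing import List, Tuple

Packet = Tuple[str, bytes]  # (structure label, encoded payload), hypothetical

PRESET_AUDIO = "preset_audio"  # hypothetical label for the preset audio structure
PRESET_VIDEO = "preset_video"  # hypothetical label for the preset video structure


class SecondTerminal:
    """Terminal whose screen is recorded; converts original -> preset."""

    def __init__(self, encoded_audio: List[bytes], encoded_video: List[bytes]) -> None:
        self._audio = encoded_audio
        self._video = encoded_video

    def on_record_request(self) -> Tuple[List[Packet], List[Packet]]:
        # Original data, wrapped in the local encoder's own structures.
        original_audio = [("encoder_audio", b) for b in self._audio]
        original_video = [("encoder_video", b) for b in self._video]
        # Candidate data: re-wrapped in the preset structures before sending.
        candidate_audio = [(PRESET_AUDIO, payload) for _, payload in original_audio]
        candidate_video = [(PRESET_VIDEO, payload) for _, payload in original_video]
        return candidate_audio, candidate_video


class FirstTerminal:
    """Recording terminal; converts preset -> target and mixes."""

    def __init__(self, mixer_audio_structure: str, mixer_video_structure: str) -> None:
        self.audio_structure = mixer_audio_structure
        self.video_structure = mixer_video_structure

    def record(self, peer: SecondTerminal) -> bytes:
        # Stands in for sending the request and receiving the candidate data.
        candidate_audio, candidate_video = peer.on_record_request()
        # Target data: re-wrapped in the structures the local mixer expects.
        target_audio = [(self.audio_structure, p) for _, p in candidate_audio]
        target_video = [(self.video_structure, p) for _, p in candidate_video]
        # Placeholder for stream mixing.
        return b"".join(p for _, p in target_video + target_audio)
```

With this split, neither terminal needs to know the other's native structure; each only converts between its own structure and the shared preset one.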

In this embodiment, by providing the MFSM module in the first terminal and the second terminal to perform the intermediate conversion of the target audio data and the target video data, the configuration of the correspondences in the first terminal and the second terminal can be greatly simplified. This reduces the development workload and the subsequent update workload for developers, and can effectively reduce the lookup time for the target audio structure and the target video structure, thereby effectively increasing the conversion speed of the target audio data and the target video data and improving the mixing efficiency of the mixer.
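One way to read the simplification is by counting correspondences. The text does not give figures; the sketch below assumes, as an illustration only, that without an intermediate structure every encoder structure would need its own correspondence to every mixer structure, whereas with a shared preset structure each side needs just one correspondence per structure. The function names are hypothetical.

```python
def direct_correspondences(encoder_structures: int, mixer_structures: int) -> int:
    """Correspondences needed if every encoder structure maps to every mixer structure."""
    return encoder_structures * mixer_structures


def intermediate_correspondences(encoder_structures: int, mixer_structures: int) -> int:
    """Correspondences needed if both sides map to a single shared preset structure."""
    return encoder_structures + mixer_structures
```

Under this assumption, 4 encoder structures and 5 mixer structures would need 20 direct correspondences but only 9 with an intermediate structure, and adding a new encoder type would add one correspondence rather than five.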

In a possible implementation of the third aspect, the second terminal obtaining candidate audio data corresponding to the original audio data according to a preset audio structure, and obtaining candidate video data corresponding to the original video data according to the preset video structure, may include:

Determining, by the second terminal, an original audio structure corresponding to the original audio data, and an original video structure corresponding to the original video data;

The second terminal converts the original audio data into the candidate audio data according to the pre-established correspondence between the original audio structure and the preset audio structure, and converts the original video data into the candidate video data according to the pre-established correspondence between the original video structure and the preset video structure.

Exemplarily, the first terminal obtaining target audio data corresponding to the candidate audio data according to the target audio structure, and obtaining target video data corresponding to the candidate video data according to the target video structure may include:

The first terminal converts the candidate audio data into the target audio data according to the pre-established correspondence between the preset audio structure and the target audio structure, and converts the candidate video data into the target video data according to the pre-established correspondence between the preset video structure and the target video structure.

In a possible implementation manner of the third aspect, the method may further include:

If the first terminal detects a stop screen recording instruction on the first terminal, the first terminal instructs the second terminal to stop sending original audio data and original video data, and saves the screen recording data in the first terminal.

Exemplarily, after the second terminal receives the screen recording request information of the first terminal, acquiring original audio data and original video data corresponding to the content currently displayed on the second terminal may include:

After detecting the touch operation of the first terminal on the second terminal, the second terminal acquires original audio data and original video data corresponding to the content currently displayed by the second terminal.

If the second terminal detects a stop screen recording instruction on the second terminal, it stops sending original audio data and original video data to the first terminal.

Specifically, the screen recording data is data in MP4 format.

In a fourth aspect, an embodiment of the present application provides a cross-terminal screen recording device, which is applied to a first terminal, and the device may include:

The request sending module is configured to send screen recording request information to the second terminal, where the screen recording request information is used to instruct the second terminal to send the original audio data and original video data corresponding to the currently displayed content to the first terminal;

An original audio and video receiving module, configured to receive original audio data and original video data corresponding to the content currently displayed by the second terminal sent by the second terminal;

A target structure determining module, configured to determine a target audio structure and a target video structure corresponding to the mixer in the first terminal;

A target audio and video acquisition module, configured to acquire target audio data corresponding to the original audio data according to the target audio structure, and acquire target video data corresponding to the original video data according to the target video structure;

The stream mixing module is used to perform stream mixing processing on the target audio data and the target video data through the stream mixer to obtain screen recording data.

In a possible implementation manner of the fourth aspect, the target audio and video acquisition module may include:

A candidate audio and video obtaining unit, configured to obtain candidate audio data corresponding to the original audio data according to a preset audio structure, and obtain candidate video data corresponding to the original video data according to the preset video structure;

The target audio and video acquisition unit is configured to convert the candidate audio data into the target audio data according to the pre-established correspondence between the preset audio structure and the target audio structure, and to convert the candidate video data into the target video data according to the pre-established correspondence between the preset video structure and the target video structure.

Exemplarily, the candidate audio and video acquisition unit may include:

An original structure determining subunit for determining the original audio structure corresponding to the original audio data, and the original video structure corresponding to the original video data;

The candidate audio and video acquisition subunit is configured to convert the original audio data into the candidate audio data according to the pre-established correspondence between the original audio structure and the preset audio structure, and to convert the original video data into the candidate video data according to the pre-established correspondence between the original video structure and the preset video structure.

Specifically, the screen recording data is data in MP4 format.

In a possible implementation manner of the fourth aspect, the device may further include:

The video display module is configured to decode the original video data through the video decoder in the first terminal, and render the decoded original video data on the display interface of the first terminal.

In another possible implementation manner of the fourth aspect, the apparatus may further include:

The audio playing module is used to decode the original audio data through the audio decoder in the first terminal, and to play the decoded original audio data through the sound playing device of the first terminal.

Exemplarily, the device may further include:

The screen recording saving module is configured to, if a screen recording stop instruction is detected on the first terminal, instruct the second terminal to stop sending original audio data and original video data, and save the screen recording data in the first terminal.

In a fifth aspect, an embodiment of the present application provides a cross-terminal screen recording device, which is applied to a second terminal, and the device may include:

The original audio and video acquisition module is configured to, after receiving the screen recording request information of the first terminal, acquire the original audio data and the original video data corresponding to the content currently displayed on the second terminal;

The target audio and video sending module is configured to send the target audio data and the target video data to the first terminal, to instruct the first terminal to perform stream mixing on the target audio data and the target video data through the mixer in the first terminal to obtain screen recording data.

In a possible implementation manner of the fifth aspect, the target audio and video acquisition module may include:

Exemplarily, the candidate audio and video acquisition unit may include:

It should be understood that the original audio and video acquisition module is specifically configured to acquire the original audio data and original video data corresponding to the content currently displayed by the second terminal after detecting the touch operation of the first terminal on the second terminal.

Exemplarily, the device may further include:

The screen recording stop module is configured to stop sending original audio data and original video data to the first terminal if a screen recording stop instruction is detected on the second terminal.

In a sixth aspect, an embodiment of the present application provides a cross-terminal screen recording system, including a first terminal and a second terminal. The first terminal includes a request sending module, a target structure determining module, and a stream mixing module, and the second terminal includes an original audio and video acquisition module and a candidate audio and video acquisition module, wherein:

The request sending module is configured to send screen recording request information to the second terminal;

The original audio and video obtaining module is configured to obtain original audio data and original video data corresponding to the current display content of the second terminal after receiving the screen recording request information of the first terminal;

The candidate audio and video obtaining module is configured to obtain candidate audio data corresponding to the original audio data according to a preset audio structure, obtain candidate video data corresponding to the original video data according to the preset video structure, and send the candidate audio data and the candidate video data to the first terminal;

The target structure determining module is configured to determine the target audio structure and the target video structure corresponding to the mixer in the first terminal, obtain the target audio data corresponding to the candidate audio data according to the target audio structure, and obtain the target video data corresponding to the candidate video data according to the target video structure;

The stream mixing module is configured to perform stream mixing processing on the target audio data and the target video data through a stream mixer in the first terminal to obtain screen recording data.

In a possible implementation manner of the sixth aspect, the candidate audio and video acquisition module may include:

An original structure determining unit, configured to determine the original audio structure corresponding to the original audio data, and the original video structure corresponding to the original video data;

The candidate audio and video acquisition unit is configured to convert the original audio data into the candidate audio data according to the pre-established correspondence between the original audio structure and the preset audio structure, and to convert the original video data into the candidate video data according to the pre-established correspondence between the original video structure and the preset video structure.

Exemplarily, the target structure determining module is further configured to convert the candidate audio data into the target audio data according to a pre-established correspondence between the preset audio structure and the target audio structure, and to convert the candidate video data into the target video data according to the pre-established correspondence between the preset video structure and the target video structure.

In a possible implementation manner of the sixth aspect, the first terminal may further include a screen recording saving module:

The screen recording saving module is configured to, if a screen recording stop instruction is detected on the first terminal, instruct the second terminal to stop sending original audio data and original video data, and save the screen recording data in the first terminal.

Exemplarily, the second terminal may further include a screen recording stop module;

Specifically, the screen recording data is data in MP4 format.

In a seventh aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the terminal device is enabled to implement the cross-terminal screen recording method described in any one of the foregoing first aspect or any one of the second aspect.

In an eighth aspect, an embodiment of the present application provides a computer-readable storage medium that stores a computer program. When the computer program is executed by a processor, a computer is enabled to implement the cross-terminal screen recording method described in any one of the foregoing first aspect or any one of the second aspect.

In a ninth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the cross-terminal screen recording method described in any one of the foregoing first aspect or any one of the second aspect.

Description of the drawings

FIG. 1 is a schematic diagram of a scene of cross-terminal screen recording in the prior art;

FIG. 2 is a schematic diagram of an application scenario of a cross-terminal screen recording method provided by an embodiment of the present application;

FIGS. 3a and 3b are schematic diagrams of a communication connection between a first terminal and a second terminal in an embodiment of the present application;

FIG. 4 is a schematic flowchart of a cross-terminal screen recording method provided by Embodiment 1 of the present application;

FIGS. 5a and 5b are schematic diagrams of application scenarios of the cross-terminal screen recording method provided in Embodiment 2 of the present application;

FIG. 6 is a schematic flowchart of a cross-terminal screen recording method provided in Embodiment 2 of the present application;

FIG. 7 is a schematic diagram of an application scenario of the cross-terminal screen recording method provided in Embodiment 3 of the present application;

FIG. 8 is a schematic flowchart of a cross-terminal screen recording method provided in Embodiment 3 of the present application;

FIG. 9 is a schematic structural diagram of a cross-terminal screen recording device provided by an embodiment of the present application;

FIG. 10 is a schematic structural diagram of a cross-terminal screen recording device provided by another embodiment of the present application;

FIG. 11 is a system schematic diagram of a cross-terminal screen recording system provided by an embodiment of the present application;

FIG. 12 is a schematic structural diagram of a terminal device provided by an embodiment of the present application;

FIG. 13 is a schematic structural diagram of a mobile phone to which the cross-terminal screen recording method provided by an embodiment of the present application is applicable;

FIG. 14 is a schematic diagram of a software architecture to which the cross-terminal screen recording method provided by an embodiment of the present application is applicable.

Detailed description

In the following description, for the purpose of illustration rather than limitation, specific details such as a specific system structure and technology are proposed for a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted to avoid unnecessary details from obstructing the description of this application.

It should be understood that when used in the specification and appended claims of this application, the term "comprising" indicates the existence of the described features, wholes, steps, operations, elements and/or components, but does not exclude the existence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the term "and/or" used in the specification and appended claims of this application refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.

As used in the description of this application and the appended claims, the term "if" can be construed as "when", "once", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrase "if determined" or "if [described condition or event] is detected" can be interpreted, depending on the context, as meaning "once determined", "in response to determining", "once [described condition or event] is detected", or "in response to detecting [described condition or event]".

In addition, in the description of the specification of this application and the appended claims, the terms "first", "second", "third", etc. are only used to distinguish the description, and cannot be understood as indicating or implying relative importance.

Reference to "one embodiment" or "some embodiments" in the specification of this application means that one or more embodiments of this application include a specific feature, structure, or characteristic described in combination with that embodiment. Therefore, the phrases "in one embodiment", "in some embodiments", "in other embodiments", "in some other embodiments", etc. appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "including", "comprising", "having" and their variations all mean "including but not limited to" unless otherwise specifically emphasized.

The cross-terminal screen recording method provided by the embodiments of this application can be applied to a first terminal, where the first terminal can be a terminal device with a display screen, such as a mobile phone, a tablet computer, a desktop computer, a wearable device, a vehicle-mounted device, a notebook computer, a smart TV, a smart speaker, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA). The embodiments of this application do not impose any restrictions on the specific type of terminal device.

Cross-terminal screen recording refers to the process of using the first terminal to record the screen (which may include sound) being presented by the second terminal and saving the recording in the first terminal. For example, a mobile phone is used to record the screen being presented on a computer, and the recorded content is stored in the mobile phone so that the user can view and share it through the mobile phone. At present, cross-terminal screen recording is mainly performed by filming the screen being presented by the computer with the mobile phone and saving the captured video on the mobile phone. This approach requires the user to hold the mobile phone facing the computer screen, which is inconvenient, and the quality of the recorded video is likely to be poor because of hand shake or the resolution of the mobile phone camera.

To improve the convenience of cross-terminal screen recording, as shown in Figure 1, in the prior art a computer can also capture, in real time, the audio and video of the screen it is presenting, obtaining video data in YUV format and audio data in pulse code modulation (PCM) format. The ffmpeg encoder in the computer (that is, an encoder developed on the Fast Forward Moving Picture Experts Group (ffmpeg) framework) encodes the captured YUV video data into video data in H.264 format and encodes the captured PCM audio data into audio data in Advanced Audio Coding (AAC) format. The H.264 video data and the AAC audio data are then combined into a transport stream (TS), and the TS stream is sent to the mobile phone over the Transmission Control Protocol (TCP). After receiving the TS stream, the mobile phone extracts the H.264 video data and the AAC audio data from it, and the ffmpeg mixer in the mobile phone (that is, a mixer developed on the ffmpeg framework) mixes them into data in MP4 format, which is saved on the mobile phone. Although this approach, in which the computer encodes and the mobile phone mixes, improves the convenience of cross-terminal screen recording and preserves the quality of the recorded video, it can only be applied between an encoder and a mixer developed on the same framework, that is, between terminals that have the same type of encoder and mixer. For example, it can be applied between a computer with an ffmpeg encoder and a mobile phone with an ffmpeg mixer, but not between terminals with different types of encoder and mixer. In other words, when encoding at one end and mixing at the other is used between terminals with different types of encoder and mixer, the resulting screen recording data (that is, the MP4 data synthesized by mixing the audio data and the video data) can suffer from problems such as failing to play, showing a green screen, or displaying only half of the picture.

To solve the above problems, the embodiments of the present application provide a cross-terminal screen recording method, apparatus, terminal device, and computer-readable storage medium. During cross-terminal screen recording, the original audio data and original video data encoded by the encoder in the second terminal can be converted according to the target audio structure and the target video structure corresponding to the mixer in the first terminal, to obtain the target audio data and target video data that the mixer in the first terminal requires for mixing. The mixer can then mix the streams to obtain screen recording data that plays normally. This achieves compatibility between different types of encoders and mixers, solves the problem that cross-terminal screen recording cannot be applied between terminals with different types of encoders and mixers, and broadens the application range of cross-terminal screen recording, giving the method strong ease of use and practicality.

FIG. 2 shows a schematic diagram of an application scenario of the cross-terminal screen recording method provided by an embodiment of the present application. The application scenario may include a first terminal 100 and a second terminal 200, each of which may be a terminal device with a display, such as a mobile phone, a tablet computer, a desktop computer, a wearable device, a vehicle-mounted device, a notebook computer, a smart TV, a smart speaker, an ultra-mobile personal computer, a netbook, or a personal digital assistant.

It should be noted that there is no strict distinction between the first terminal 100 and the second terminal 200. The same terminal device can serve as the first terminal 100 in some scenarios and as the second terminal 200 in others. For example, in one scenario a mobile phone can record the screen being presented by a computer; in another scenario a smart TV can record the screen being presented by the mobile phone.

In addition, during cross-terminal screen recording, the first terminal 100 may record the screen being presented by the second terminal 200, or the second terminal 200 may record the screen being presented by the first terminal 100. For example, in one scenario a mobile phone can record the screen being presented by a computer; in another scenario the computer can record the screen being presented by the mobile phone. The embodiments of the present application take as an example the case in which the first terminal 100 records the screen being presented by the second terminal 200.

In the embodiments of the present application, when performing cross-terminal screen recording for the first time, the user can establish a short-range communication connection between the first terminal 100 and the second terminal 200, so that the first terminal 100 can send a screen recording request to the second terminal 200 and obtain the audio data and video data returned by the second terminal 200 through short-range communication. The short-range communication connection may be a Bluetooth connection, a near field communication (NFC) connection, a wireless fidelity (WiFi) connection, or a ZigBee connection. The embodiments of the present application take a Bluetooth connection and a WiFi connection as an example.

To make establishing the Bluetooth connection and the WiFi connection more convenient and faster, both the first terminal 100 and the second terminal 200 may be terminal devices provided with an NFC chip, so that the two terminals can be paired quickly through the NFC chips and the Bluetooth connection and WiFi connection between them can be established conveniently and quickly. Specifically, before using the first terminal 100 to record the screen being presented by the second terminal 200 for the first time, the user can touch the first preset area, where the NFC chip of the first terminal 100 is located, to the second preset area, where the NFC chip of the second terminal 200 is located, as shown in FIG. 3a. At this time, the display interface of the first terminal 100 can pop up a connection pop-up box asking whether to establish a connection with the second terminal 200; the connection pop-up box can include "Connect" and "Ignore" buttons. When the user clicks the "Connect" button on the first terminal 100, the first terminal 100 can send a connection request to the second terminal 200. As shown in FIG. 3b, the display interface of the second terminal 200 can then pop up an authorization pop-up box asking whether to establish a connection with the first terminal 100; the authorization pop-up box can include "Authorize" and "Reject" buttons. When the user clicks the "Authorize" button on the second terminal 200, the Bluetooth connection and the WiFi connection between the first terminal 100 and the second terminal 200 are successfully established.
It should be understood that after the Bluetooth connection and the WiFi connection between the first terminal 100 and the second terminal 200 have been successfully established, both connections are disconnected when the first terminal 100 moves far away from the second terminal 200. Subsequently, when the first terminal 100 approaches the second terminal 200 again, the second terminal 200 can automatically establish a Bluetooth connection with the first terminal 100 based on the saved Media Access Control (MAC) address of the first terminal 100, and can establish a WiFi connection with the first terminal 100 at the same time.

[Embodiment One]

Please refer to FIG. 4. FIG. 4 is a schematic flowchart of a cross-terminal screen recording method provided by this embodiment. The method can be applied to the application scenario shown in FIG. 2. As shown in Figure 4, the method may include:

S401: The first terminal sends screen recording request information to the second terminal.

It should be understood that after the Bluetooth connection and the WiFi connection between the first terminal 100 and the second terminal 200 are successfully established, when the user needs to use the first terminal 100 to record the content being presented by the second terminal 200, the user can use the first terminal 100 to send screen recording request information to the second terminal 200. The first terminal 100 can send the screen recording request information over Bluetooth. The screen recording request information instructs the second terminal 200 to obtain original audio data and original video data from the content it is presenting and to send the original audio data and original video data to the first terminal 100 over WiFi. Exemplarily, the user can first shake the first terminal 100 and then, within a preset time after the shaking ends, touch the first preset area of the first terminal 100 to the second preset area of the second terminal 200 to send the screen recording request information to the second terminal 200; or the user can directly touch the first preset area of the first terminal 100 to the second preset area of the second terminal 200; or the user can directly shake the first terminal 100; or the user can click the screen recording button on the first terminal 100. This embodiment does not specifically limit the manner in which the first terminal 100 sends the screen recording request information to the second terminal 200.

It should be noted that after receiving the screen recording request information sent by the first terminal 100, the second terminal 200 can create a data transmission channel, through which it sends the acquired audio data and video data to the first terminal 100, and can feed back to the first terminal 100 a notification message indicating that the data transmission channel has been created successfully. After receiving the notification message, the first terminal 100 can connect to the data transmission channel created by the second terminal 200 and receive through it the audio data and video data sent by the second terminal 200.

S402: After receiving the screen recording request information sent by the first terminal, the second terminal obtains original audio data and original video data corresponding to the content currently displayed on the second terminal, and sends the original audio data and original video data to the first terminal.

Here, after receiving the screen recording request information sent by the first terminal 100, the second terminal 200 can capture video data in real time from the picture being presented on its screen and capture audio data in real time from the sound being played by its sound playback device (such as a sound card), to obtain initial video data and initial audio data. The encoder in the second terminal 200 can then encode the initial audio data to obtain the original audio data and encode the initial video data to obtain the original video data, and the original audio data and the original video data can each be sent to the first terminal 100 through the data transmission channel. The encoder in the second terminal 200 may be any type of encoder, for example, an ffmpeg encoder, an AMD encoder, or an Intel encoder. The original audio data may be audio data in AAC format, and the original video data may be video data in H.264 format.

S403: The first terminal determines the target audio structure and the target video structure corresponding to the mixer in the first terminal.

Exemplarily, the first terminal 100 may store a correspondence table between device types and mixer types, or a correspondence table between device types and target audio structures and target video structures. The first terminal 100 can determine its own device type and then determine the target audio structure and target video structure corresponding to its mixer from the device type and the stored correspondence table. Exemplarily, the correspondence table between device types and mixer types, or the correspondence table between device types and target audio structures and target video structures, may also be stored in a server or in the cloud, to which the first terminal 100 can connect. In that case, after determining its device type, the first terminal 100 can send the device type to the server/cloud and obtain the target audio structure and target video structure that the server/cloud returns for the mixer of that device type.
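The table lookup described above can be sketched as follows. This is a minimal illustration only: the device types, mixer type names, and the ffmpeg-side structure names are hypothetical placeholders, and the two dictionaries stand in for whatever form the correspondence tables actually take.

```python
from typing import Dict, Tuple

# Hypothetical table 1: device type -> mixer type (entries are illustrative).
DEVICE_TO_MIXER: Dict[str, str] = {
    "phone": "google",
    "tablet": "google",
    "desktop": "ffmpeg",
}

# Hypothetical table 2: mixer type -> (target audio structure, target video structure).
MIXER_TO_STRUCTURES: Dict[str, Tuple[str, str]] = {
    "google": ("GoogleMuxerAudioFrame", "GoogleMuxerVideoFrame"),
    "ffmpeg": ("FfmpegAudioPacket", "FfmpegVideoPacket"),  # placeholder names
}


def target_structures(device_type: str) -> Tuple[str, str]:
    """Return the (audio, video) target structure names for a device type."""
    try:
        mixer_type = DEVICE_TO_MIXER[device_type]
        return MIXER_TO_STRUCTURES[mixer_type]
    except KeyError as exc:
        # A terminal could fall back to asking a server or the cloud at this point.
        raise LookupError(f"no target structure known for {device_type!r}") from exc
```

Keeping the two tables separate mirrors the first option in the text (device type to mixer type); merging them into one dictionary would correspond to the second option (device type directly to target structures).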

It should be understood that the target audio structure corresponding to the mixer characterizes attributes, such as the data type and data format, of the audio data that the mixer requires for mixing, and the target video structure corresponding to the mixer characterizes attributes, such as the data type and data format, of the video data that the mixer requires for mixing. For example, the target audio structure and the target video structure corresponding to the Google mixer can be GoogleMuxerAudioFrame and GoogleMuxerVideoFrame, respectively.

In GoogleMuxerAudioFrame, flags represents the audio type and can default to 0 (that is, when the flags of a piece of data is 0, the data is audio); esds represents the sampling rate, number of channels and frame length of the audio; audioFrame represents the audio frame; audioSize represents the audio frame size; and presentationTimeUs represents the timestamp. In GoogleMuxerVideoFrame, flags represents the video frame type, which can be 1 (the video frame is an intra-coded frame, or I frame) or 0 (the video frame is an inter-frame predictive coded frame, or P frame); sps represents the sequence parameter set; pps represents the picture parameter set; videoFrame represents the video frame; videoSize represents the video frame size; and presentationTimeUs represents the timestamp. sps, pps and videoFrame each carry a Network Abstraction Layer unit (NALU) header, and videoFrame contains only one slice.
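As an illustration of the fields just listed, the two structures might be modelled as below. Only the field names and the flags values come from the description; the field types, the byte and microsecond units, and the two helper constructors are assumptions made for this sketch.

```python
from dataclasses import dataclass

FLAG_AUDIO = 0     # default flags value for audio data
FLAG_P_FRAME = 0   # video: inter-frame predictive coded frame
FLAG_I_FRAME = 1   # video: intra-coded frame


@dataclass
class GoogleMuxerAudioFrame:
    flags: int                # audio type, 0 by default
    esds: bytes               # sampling rate, channel count and frame length (type assumed)
    audioFrame: bytes         # encoded audio frame
    audioSize: int            # size of audioFrame (bytes assumed)
    presentationTimeUs: int   # timestamp (microseconds assumed from the name)


@dataclass
class GoogleMuxerVideoFrame:
    flags: int                # 1 = I frame, 0 = P frame
    sps: bytes                # sequence parameter set, with NALU header
    pps: bytes                # picture parameter set, with NALU header
    videoFrame: bytes         # single-slice video frame, with NALU header
    videoSize: int            # size of videoFrame (bytes assumed)
    presentationTimeUs: int   # timestamp (microseconds assumed from the name)


def make_audio_frame(esds: bytes, frame: bytes, pts_us: int) -> GoogleMuxerAudioFrame:
    """Hypothetical helper: fill flags and audioSize from the payload."""
    return GoogleMuxerAudioFrame(FLAG_AUDIO, esds, frame, len(frame), pts_us)


def make_video_frame(sps: bytes, pps: bytes, frame: bytes, pts_us: int,
                     is_i_frame: bool) -> GoogleMuxerVideoFrame:
    """Hypothetical helper: derive flags and videoSize from the arguments."""
    flags = FLAG_I_FRAME if is_i_frame else FLAG_P_FRAME
    return GoogleMuxerVideoFrame(flags, sps, pps, frame, len(frame), pts_us)
```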

It should be noted that the first terminal 100 may also determine the target audio structure and target video structure corresponding to its mixer while sending the screen recording request information to the second terminal 200, or while the second terminal 200 is acquiring the original audio data and original video data. In other words, there is no strict order of execution between S403 and S402: S403 can be executed before S402, after S402, or simultaneously with S402, which is not specifically limited in this embodiment.

S404: The first terminal obtains target audio data corresponding to the original audio data according to the target audio structure, and obtains target video data corresponding to the original video data according to the target video structure.

Exemplarily, the first terminal 100 may first determine the original audio structure corresponding to the original audio data and the original video structure corresponding to the original video data. It can then obtain the target audio data corresponding to the original audio data according to the correspondence between the original audio structure and the target audio structure, and obtain the target video data corresponding to the original video data according to the correspondence between the original video structure and the target video structure. The correspondence between the original audio structure and the target audio structure, and the correspondence between the original video structure and the target video structure, may be pre-established according to actual conditions. It should be understood that the original audio structure and the original video structure may depend on the type of encoder; that is, the first terminal 100 may determine the original audio structure corresponding to the original audio data and the original video structure corresponding to the original video data according to the type of the encoder in the second terminal 200.

Specifically, the first terminal 100 may extract and convert data from the original audio data according to the data type and data format corresponding to the original audio structure, and the data type and data format corresponding to the target audio structure, so as to obtain the target audio data. Similarly, the first terminal 100 may also extract and convert data from the original video data according to the data type and data format corresponding to the original video structure, and the data type and data format corresponding to the target video structure, so as to obtain the target video data.

For example, the Google mixer only accepts video frames that contain a single slice, whereas video frames encoded by ffmpeg contain multiple slices (multiSlice). Therefore, when the mixer in the first terminal 100 is the Google mixer and the encoder in the second terminal 200 is the ffmpeg encoder, the first terminal 100 may first extract a video frame containing multiSlice from the original video data, and then use mergeMultiSliceToOneSlice() to convert it into a video frame containing a single slice (singleSlice).
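The description does not give the internals of mergeMultiSliceToOneSlice(), so the sketch below does not attempt the merge itself. It only illustrates a plausible first step, namely detecting which pictures in an H.264 Annex B byte stream are split into several slices. It assumes the original video data uses Annex B start codes and that slices arrive in order, so that a slice whose first_mb_in_slice field is 0 begins a new picture; the function names are hypothetical.

```python
from typing import List

START_CODE = b"\x00\x00\x01"


def split_annexb(stream: bytes) -> List[bytes]:
    """Split an H.264 Annex B byte stream into NAL units (start codes removed)."""
    nalus = []
    i = stream.find(START_CODE)
    while i != -1:
        start = i + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # A 4-byte start code leaves a zero byte behind; a NAL unit never ends in 0x00.
        nalu = stream[start:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        i = nxt
    return nalus


def is_slice(nalu: bytes) -> bool:
    """nal_unit_type 1 is a non-IDR slice, 5 is an IDR slice."""
    return (nalu[0] & 0x1F) in (1, 5)


def starts_new_picture(nalu: bytes) -> bool:
    """first_mb_in_slice is the first ue(v) field; value 0 is coded as a single 1 bit."""
    return len(nalu) > 1 and bool(nalu[1] & 0x80)


def slices_per_picture(stream: bytes) -> List[int]:
    """Count the slice NAL units that make up each picture, in stream order."""
    counts: List[int] = []
    for nalu in split_annexb(stream):
        if not is_slice(nalu):
            continue  # SPS, PPS, SEI and other non-slice units
        if starts_new_picture(nalu) or not counts:
            counts.append(1)
        else:
            counts[-1] += 1
    return counts
```

A picture with a count greater than 1 is one that a single-slice mixer would not accept as is, and would therefore be a candidate for the merge step named in the text.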

S405: The first terminal performs stream mixing processing on the target audio data and the target video data through the stream mixer to obtain screen recording data.

Here, after obtaining the target audio data and the target video data, the first terminal 100 can input them to the mixer in the first terminal 100, so that the mixer mixes the target audio data and the target video data to synthesize the screen recording data, which is stored in the first terminal 100. The mixer in the first terminal 100 may be any type of mixer, for example, an ffmpeg mixer, a Google mixer, or an Mp4v2 mixer. The screen recording data synthesized by mixing can be video data in MP4 format.

In this embodiment, the type of the mixer in the first terminal 100 may be the same as or different from the type of the encoder in the second terminal 200. For example, the mixer in the first terminal 100 may be a Google mixer and the encoder in the second terminal 200 may be an ffmpeg encoder; or the mixer in the first terminal 100 may be an Mp4v2 mixer and the encoder in the second terminal 200 may be an Intel encoder; or the mixer in the first terminal 100 may be an ffmpeg mixer and the encoder in the second terminal 200 may be an ffmpeg encoder; or the mixer in the first terminal 100 may be an Intel mixer and the encoder in the second terminal 200 may be an Intel encoder.

It can be understood that when the type of the mixer in the first terminal 100 differs from the type of the encoder in the second terminal 200, the first terminal 100 can obtain the target audio data and the target video data from the original audio data and original video data encoded by the second terminal 200 through S403 and S404, and can then mix the target audio data and the target video data through the mixer to synthesize the screen recording data. When the type of the mixer in the first terminal 100 is the same as the type of the encoder in the second terminal 200, the first terminal 100 can directly use its mixer to mix the original audio data and original video data sent by the second terminal 200 to synthesize the screen recording data.
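The branch between converting and passing data through unchanged can be written compactly. In this sketch the type labels are plain strings and the two converters are supplied by the caller; both are assumptions made for illustration rather than part of the described method.

```python
from typing import Callable, Tuple

Convert = Callable[[bytes], bytes]


def prepare_for_mixer(encoder_type: str, mixer_type: str,
                      audio: bytes, video: bytes,
                      convert_audio: Convert,
                      convert_video: Convert) -> Tuple[bytes, bytes]:
    """Return the (audio, video) data to hand to the mixer."""
    if encoder_type == mixer_type:
        # Same type: the original data can be mixed directly.
        return audio, video
    # Different types: convert to the target structures first (S403 and S404).
    return convert_audio(audio), convert_video(video)
```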

In a possible implementation, the first terminal 100 may also display the recorded picture synchronously while recording the screen being presented by the second terminal 200. Exemplarily, the first terminal 100 may decode the screen recording data in real time through its decoder, render the decoded video data on its display interface, and at the same time play the decoded audio data through its sound playback device (for example, a sound card), so that the picture and sound being presented by the second terminal 200 are presented synchronously on the first terminal 100.

It should be noted that when the distance between the first terminal 100 and the second terminal 200 is less than a preset threshold, that is, when the two terminals are close to each other, the first terminal 100 may only render the decoded video data on its display interface, without playing the decoded audio data, to reduce overlapping sound during synchronous presentation and improve the user experience. The preset threshold can be set according to actual conditions, which is not specifically limited in this embodiment.

Exemplarily, to reduce the delay of synchronous presentation on the first terminal 100, while recording the screen being presented by the second terminal 200 the first terminal 100 may also use its video decoder to directly decode the original video data transmitted by the second terminal 200 and render the decoded video data on its display interface. At the same time, it can use its audio decoder to directly decode the original audio data transmitted by the second terminal 200 and play the decoded audio data through its sound playback device, so that the picture and sound being presented by the second terminal 200 are presented synchronously on the first terminal 100.

It should be understood that while the first terminal 100 is recording the screen being presented by the second terminal 200, the user can input a stop screen recording instruction on the first terminal 100 to instruct it to stop recording. That is, during screen recording the first terminal 100 can detect in real time whether the user has input a stop screen recording instruction. If such an instruction is detected, the first terminal 100 can instruct the second terminal 200 to stop transmitting the original audio data and the original video data, or can close the data transmission channel between the first terminal 100 and the second terminal 200, to stop the screen recording, and can save the screen recording data in the first terminal 100.

Exemplarily, the stop screen recording instruction in this embodiment may be an instruction generated when it is detected that the user clicks a specific button such as "Stop" on the first terminal 100, an instruction generated when it is detected that the user shakes the first terminal 100, an instruction generated when it is detected that the user inputs a specific voice keyword such as "stop", or an instruction generated when it is detected that the user inputs a specific gesture on the first terminal 100. This embodiment does not specifically limit how the stop screen recording instruction is generated.

It should be noted that, in this embodiment, the conversion into the target audio data and the target video data may also be performed in the second terminal 200. That is, the second terminal 200 can determine the target audio structure and the target video structure corresponding to the mixer in the first terminal 100, convert the original audio data and original video data obtained by its encoder into target audio data and target video data according to those structures, and send them to the first terminal 100. The mixer in the first terminal 100 can then directly mix the target audio data and the target video data to synthesize the screen recording data, which is saved in the first terminal 100.

The process by which the second terminal 200 determines the target audio structure and target video structure corresponding to the mixer in the first terminal 100 is similar to the process by which the first terminal 100 determines them. That is, the second terminal 200 may store a correspondence table between device types and mixer types, or a correspondence table between device types and target audio structures and target video structures. The second terminal 200 may first obtain the device type of the first terminal 100 and then determine the target audio structure and target video structure corresponding to the mixer in the first terminal 100 according to that device type and the stored correspondence table. Exemplarily, the correspondence table between device types and mixer types, or the correspondence table between device types and target audio structures and target video structures, may also be stored in a server or in the cloud, to which the second terminal 200 can connect. In that case, after determining the device type of the first terminal 100, the second terminal 200 can send the device type to the server/cloud and obtain the target audio structure and target video structure that the server/cloud returns for the mixer of that device type.

It should be understood that the process by which the second terminal 200 converts the original audio data and original video data into target audio data and target video data according to the target audio structure and the target video structure is similar to the process, described above, by which the first terminal 100 obtains the target audio data corresponding to the original audio data and the target video data corresponding to the original video data; the basic principles are the same. For the sake of brevity, details are not repeated here.

In this embodiment, the original audio data and original video data encoded by the encoder in the second terminal can be converted according to the target audio structure and the target video structure corresponding to the mixer in the first terminal, to obtain the target audio data and target video data that the mixer in the first terminal requires for mixing. The mixer can then mix the streams to obtain screen recording data that plays normally. This achieves compatibility between different types of encoders and mixers, solves the problem that cross-terminal screen recording cannot be applied between terminals with different types of encoders and mixers, and broadens the application range of cross-terminal screen recording, giving the method strong ease of use and practicality.

[Embodiment Two]

In the first embodiment, the first terminal 100 or the second terminal 200 extracts and converts the target audio data and the target video data by looking up the correspondence between the original audio structure and the target audio structure and the correspondence between the original video structure and the target video structure. That is, the correspondences between the original audio structures of different encoders and the target audio structures of different mixers, and between the original video structures of different encoders and the target video structures of different mixers, must be configured in advance in the first terminal 100 or the second terminal 200. When there are many types of encoder and/or mixer, that is, many types of original audio structure, target audio structure, original video structure, and target video structure, the configured correspondences become more numerous and more complex, which greatly increases the developers' development and/or update workload. In addition, these more numerous and more complex correspondences make looking up the target audio structure and/or target video structure take longer, which tends to reduce the conversion speed of the target audio data and/or target video data and the mixing efficiency of the mixer.

In order to simplify the configuration of the correspondences, increase the conversion speed of the target audio data and the target video data, and improve the mixing efficiency of the mixer, as shown in FIG. 5a, this embodiment provides a Multi-platform Mixed Flow Synchronization Method (MFSM) module. The MFSM module can uniformly convert original audio data of any audio structure into candidate audio data of a preset audio structure, and can uniformly convert original video data of any video structure into candidate video data of a preset video structure. It can then convert the candidate audio data into target audio data according to the correspondence between the preset audio structure and the target audio structure, and convert the candidate video data into target video data according to the correspondence between the preset video structure and the target video structure. That is, in this embodiment, it is only necessary to configure in advance the correspondence between each original audio structure and the preset audio structure and the correspondence between the preset audio structure and each target audio structure. Similarly, it is only necessary to configure the correspondence between each original video structure and the preset video structure and the correspondence between the preset video structure and each target video structure. The number of correspondences for the audio structures, and likewise for the video structures, is then M+N, compared with M*N in the first embodiment, where M is the number of types of original audio structure/original video structure and N is the number of types of target audio structure/target video structure. When there are many types of structures, M+N is far smaller than M*N, which greatly simplifies the configuration of the correspondences, reduces the developers' development workload and subsequent update workload, and effectively reduces the lookup time for the target audio structure and the target video structure, thereby effectively increasing the conversion speed of the target audio data and the target video data and improving the mixing efficiency of the mixer.
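The difference between M*N and M+N can be checked with a small count. The following is a minimal sketch; the encoder and mixer names are hypothetical placeholders and only the number of configured correspondences is of interest.

```python
# Hypothetical structure names, used only to count correspondences.
original_structures = ["encoder_a", "encoder_b", "encoder_c"]      # M = 3
target_structures = ["mixer_x", "mixer_y", "mixer_z", "mixer_w"]   # N = 4

# First embodiment: one correspondence per (original, target) pair.
direct = [(o, t) for o in original_structures for t in target_structures]

# This embodiment: original -> preset, then preset -> target.
via_preset = [(o, "preset") for o in original_structures] + [
    ("preset", t) for t in target_structures
]

print(len(direct), len(via_preset))
```

With M = 3 and N = 4 the direct scheme needs 12 correspondences and the preset-structure scheme needs 7. Adding one more encoder type adds N entries in the first case but only one entry in the second.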

Please refer to FIG. 6. FIG. 6 is a schematic flowchart of a cross-terminal screen recording method provided by this embodiment. The method can be applied to the application scenario shown in FIG. 2. As shown in Figure 6, the method may include:

S601: The first terminal sends screen recording request information to the second terminal.

It should be understood that the content of S601 is similar to that of S401 in the first embodiment, and the basic principle is the same. For the sake of brevity, it will not be repeated here.

S602. After receiving the screen recording request information sent by the first terminal, the second terminal obtains original audio data and original video data corresponding to the content currently displayed on the second terminal, and sends the original audio data and original video data to the first terminal.

It should be understood that the content of S602 is similar to that of S402 in the first embodiment, and the basic principle is the same. For the sake of brevity, it will not be repeated here.

S603: The first terminal determines the target audio structure and the target video structure corresponding to the mixer in the first terminal.

It should be understood that the content of S603 is similar to that of S403 in the first embodiment, and the basic principle is the same. For the sake of brevity, it will not be repeated here.

S604: The first terminal obtains candidate audio data corresponding to the original audio data according to the preset audio structure, and obtains candidate video data corresponding to the original video data according to the preset video structure.

It should be noted that the preset audio structure is a general audio data structure determined by analyzing the audio data required by each mixer for mixing, and the preset video structure is a general video data structure determined by analyzing the video data required by each mixer for mixing. Exemplarily, the preset audio structure AudioFrame and the preset video structure VideoFrame may be respectively:

Among them, type in AudioFrame represents the audio type, which can be 0x20 by default (that is, when the type of certain data is 0x20, the data is audio), adts represents the ADTS (Audio Data Transport Stream) header, esds represents the audio sampling rate, number of channels, frame length, and so on, sample represents the audio frame, and timeStamp represents the timestamp. In VideoFrame, type represents the video frame type, with 0x10 representing an I frame and 0x11 representing a P frame, sps represents the sequence parameter set, pps represents the picture parameter set, sei represents the supplemental enhancement information, frame represents the video frame, and timeStamp represents the timestamp. The sps, pps, sei, and frame all carry NALU headers.
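The field descriptions above can be sketched as two plain data classes. This is an illustrative sketch only: the field names follow the description, while the field types, the constant names, and the timestamp unit are assumptions that the description does not specify.

```python
from dataclasses import dataclass

AUDIO_TYPE = 0x20      # marks data as audio, per the description
VIDEO_TYPE_I = 0x10    # I frame
VIDEO_TYPE_P = 0x11    # P frame


@dataclass
class AudioFrame:
    type: int        # audio type, 0x20 by default
    adts: bytes      # ADTS header
    esds: bytes      # sampling rate, channel count, frame length, etc.
    sample: bytes    # audio frame payload
    timeStamp: int   # timestamp (unit assumed, e.g. microseconds)


@dataclass
class VideoFrame:
    type: int        # 0x10 = I frame, 0x11 = P frame
    sps: bytes       # sequence parameter set, with NALU header
    pps: bytes       # picture parameter set, with NALU header
    sei: bytes       # supplemental enhancement information, with NALU header
    frame: bytes     # video frame, with NALU header
    timeStamp: int   # timestamp (unit assumed, e.g. microseconds)
```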

Exemplarily, the first terminal 100 may, through the MFSM module, convert the original audio data into candidate audio data of the preset audio structure and convert the original video data into candidate video data of the preset video structure. That is, the first terminal 100 can input original audio data and original video data of any structure into the MFSM module, and the MFSM module can perform data extraction and conversion on the original audio data according to the data type and data format corresponding to the original audio structure, the data type and data format corresponding to the preset audio structure, and the pre-established correspondence between the original audio structure and the preset audio structure, thereby obtaining the candidate audio data. Similarly, the MFSM module can perform data extraction and conversion on the original video data according to the data type and data format corresponding to the original video structure, the data type and data format corresponding to the preset video structure, and the pre-established correspondence between the original video structure and the preset video structure, thereby obtaining the candidate video data. For example, the MFSM module can extract the video frame type from the original video data and convert it into the type in the preset video structure according to the format corresponding to the video frame type in the preset video structure. As another example, the MFSM module can extract the sps from the original video data and convert it into the sps of the preset video structure according to the format corresponding to the sps in the preset video structure, and so on.

S605: The first terminal obtains target audio data corresponding to the candidate audio data according to the target audio structure, and obtains target video data corresponding to the candidate video data according to the target video structure.

Here, after the MFSM module in the first terminal 100 obtains the candidate audio data corresponding to the original audio data and the candidate video data corresponding to the original video data, it can convert the candidate audio data into target audio data and convert the candidate video data into target video data. Specifically, the MFSM module can perform data extraction and conversion on the candidate audio data according to the data type and data format corresponding to the preset audio structure, the data type and data format corresponding to the target audio structure, and the pre-established correspondence between the preset audio structure and the target audio structure, to obtain the target audio data. Similarly, the MFSM module can perform data extraction and conversion on the candidate video data according to the data type and data format corresponding to the preset video structure, the data type and data format corresponding to the target video structure, and the pre-established correspondence between the preset video structure and the target video structure, to obtain the target video data. The correspondence includes the correspondence between data types and the correspondence between data formats. That is, after the first terminal 100 inputs original audio data and original video data of any structure into the MFSM module for processing, the MFSM module can output the target audio data and target video data required for mixing to the mixer in the first terminal 100.
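The two-stage conversion of S604 and S605 can be sketched as two lookup tables keyed by encoder type and mixer type. The following is a minimal sketch under stated assumptions: the names TO_PRESET, FROM_PRESET, encoder_a, mixer_x, and the field layouts of the hypothetical encoder and mixer are illustrative and do not come from this application.

```python
from typing import Any, Callable, Dict

Frame = Dict[str, Any]

# original structure -> preset structure (one entry per encoder type)
TO_PRESET: Dict[str, Callable[[Frame], Frame]] = {}
# preset structure -> target structure (one entry per mixer type)
FROM_PRESET: Dict[str, Callable[[Frame], Frame]] = {}


def encoder_a_to_preset(raw: Frame) -> Frame:
    # Hypothetical encoder layout: payload under "data", time under "pts".
    return {"type": 0x20, "sample": raw["data"], "timeStamp": raw["pts"]}


def preset_to_mixer_x(candidate: Frame) -> Frame:
    # Hypothetical mixer layout.
    return {
        "audioFrame": candidate["sample"],
        "audioSize": len(candidate["sample"]),
        "presentationTimeUs": candidate["timeStamp"],
    }


TO_PRESET["encoder_a"] = encoder_a_to_preset
FROM_PRESET["mixer_x"] = preset_to_mixer_x


def convert(raw: Frame, encoder: str, mixer: str) -> Frame:
    """Convert original data to target data via the preset structure."""
    candidate = TO_PRESET[encoder](raw)      # S604: original -> candidate
    return FROM_PRESET[mixer](candidate)     # S605: candidate -> target


target = convert({"data": b"\x01\x02\x03", "pts": 40000}, "encoder_a", "mixer_x")
print(target)
```

Supporting a new encoder then only requires one new entry in TO_PRESET, regardless of how many mixers exist.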

For example, the target audio structure GoogleMuxerAudioFrame and the target video structure GoogleMuxerVideoFrame corresponding to the Google Mixer are:

The MFSM module can extract and convert the type in AudioFrame to determine the flags in GoogleMuxerAudioFrame; extract and convert the esds in AudioFrame to determine the esds in GoogleMuxerAudioFrame; extract and convert the sample in AudioFrame to determine the audioFrame in GoogleMuxerAudioFrame; determine the audioSize in GoogleMuxerAudioFrame according to the array size of AudioFrame; and extract and convert the timeStamp in AudioFrame to determine the presentationTimeUs in GoogleMuxerAudioFrame. Likewise, it can extract and convert the type in VideoFrame to determine the flags in GoogleMuxerVideoFrame; extract and convert the sps in VideoFrame to determine the sps in GoogleMuxerVideoFrame; extract and convert the pps in VideoFrame to determine the pps in GoogleMuxerVideoFrame; extract and convert the frame in VideoFrame to determine the videoFrame in GoogleMuxerVideoFrame; determine the videoSize in GoogleMuxerVideoFrame according to the array size of VideoFrame; and extract and convert the timeStamp in VideoFrame to determine the presentationTimeUs in GoogleMuxerVideoFrame.
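The video half of this field mapping can be sketched as a single conversion function. The sketch below rests on several assumptions that the description does not state: the field types, the value of the key-frame flag, the rule that an I frame maps to that flag and a P frame to 0, and the choice of the frame payload length as the array size.

```python
from dataclasses import dataclass


@dataclass
class VideoFrame:
    type: int        # 0x10 = I frame, 0x11 = P frame (from the description)
    sps: bytes
    pps: bytes
    sei: bytes
    frame: bytes
    timeStamp: int   # assumed to be in microseconds


@dataclass
class GoogleMuxerVideoFrame:
    flags: int
    sps: bytes
    pps: bytes
    videoFrame: bytes
    videoSize: int
    presentationTimeUs: int


KEY_FRAME_FLAG = 1   # assumption: value used to mark a sync (key) frame


def to_google_muxer_video(v: VideoFrame) -> GoogleMuxerVideoFrame:
    """Map the preset VideoFrame onto the fields named in the description."""
    flags = KEY_FRAME_FLAG if v.type == 0x10 else 0   # assumed mapping
    return GoogleMuxerVideoFrame(
        flags=flags,
        sps=v.sps,
        pps=v.pps,
        videoFrame=v.frame,
        videoSize=len(v.frame),            # assumed: size of the frame payload
        presentationTimeUs=v.timeStamp,
    )
```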

For example, the target audio structure Mp4V2MuxerAudioFrame and the target video structure Mp4V2MuxerVideoFrame corresponding to the Mp4V2 mixer are:

The MFSM module can extract and convert the type in AudioFrame to determine the audio type isSyncSample in Mp4V2MuxerAudioFrame; calculate the audioSpecificConfig in Mp4V2MuxerAudioFrame, and the configSize representing the size of audioSpecificConfig, according to the adts in AudioFrame; determine the audio frame audioSample in Mp4V2MuxerAudioFrame according to the sample in AudioFrame; determine the audio frame size audioSize in Mp4V2MuxerAudioFrame according to the array size of AudioFrame; and determine the audio frame duration sampleDuration in Mp4V2MuxerAudioFrame according to the timeStamp in two adjacent AudioFrames, that is, sampleDuration is equal to the timeStamp in the next AudioFrame minus the timeStamp in the previous AudioFrame. Likewise, it can extract and convert the type in VideoFrame to determine the video frame type isSyncSample in Mp4V2MuxerVideoFrame; determine the avcProfileIndication in Mp4V2MuxerVideoFrame according to the second byte of the sps in VideoFrame; determine the avcProfileCompat in Mp4V2MuxerVideoFrame according to the third byte of the sps in VideoFrame; determine the avcLevelIndication in Mp4V2MuxerVideoFrame according to the fourth byte of the sps in VideoFrame; set the avcSampleLenFieldSizeMinusOne in Mp4V2MuxerVideoFrame equal to the sample length field size minus 1; extract and convert the sps in VideoFrame to determine the sps in Mp4V2MuxerVideoFrame; extract and convert the pps in VideoFrame to determine the pps in Mp4V2MuxerVideoFrame; extract and convert the frame in VideoFrame to determine the video frame videoFrame in Mp4V2MuxerVideoFrame; determine the video frame size videoSize in Mp4V2MuxerVideoFrame according to the array size of VideoFrame; and determine the video frame duration sampleDuration in Mp4V2MuxerVideoFrame according to the timeStamp in two adjacent VideoFrames, that is, sampleDuration is equal to the timeStamp in the next VideoFrame minus the timeStamp in the previous VideoFrame.
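Two of the derived fields above are simple computations and can be sketched directly. The sketch assumes that the sps passed in begins with the one-byte NAL unit header (for example 0x67), so that the second, third, and fourth bytes are the profile, compatibility, and level values; the function names are hypothetical.

```python
from typing import List, Tuple


def avc_config_from_sps(sps: bytes) -> Tuple[int, int, int]:
    """Return (avcProfileIndication, avcProfileCompat, avcLevelIndication).

    Assumes sps[0] is the NAL unit header byte, so the values of interest
    are the second, third, and fourth bytes.
    """
    if len(sps) < 4:
        raise ValueError("sps is too short")
    return sps[1], sps[2], sps[3]


def sample_durations(time_stamps: List[int]) -> List[int]:
    """sampleDuration = timeStamp of the next frame minus that of the previous one."""
    return [later - earlier for earlier, later in zip(time_stamps, time_stamps[1:])]


profile, compat, level = avc_config_from_sps(bytes([0x67, 0x42, 0x00, 0x1F]))
durations = sample_durations([0, 33, 66, 100])
print(profile, compat, level, durations)
```

Because each duration needs the following frame's timestamp, a list of n frames yields n-1 durations; how the last frame's duration is chosen is not specified in the description.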

For example, because the sps and pps required by the Mp4V2 mixer do not carry NALU headers, whereas the sps and pps in VideoFrame both carry NALU headers, the MFSM module can extract the sps and pps from VideoFrame and remove the NALU headers from the extracted sps and pps to obtain the sps and pps in Mp4V2MuxerVideoFrame.
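The header removal can be sketched as follows. This sketch makes an explicit assumption about what "NALU header" means here: it treats it as the byte-stream start code (0x00000001 or 0x000001) that precedes each NAL unit, and leaves the rest of the unit untouched. If the intended header is something else, the prefix table would need to change accordingly. The function name is hypothetical.

```python
# Assumed prefixes: 4-byte and 3-byte start codes.
START_CODES = (b"\x00\x00\x00\x01", b"\x00\x00\x01")


def strip_start_code(nalu: bytes) -> bytes:
    """Return the NAL unit without a leading start code, if one is present."""
    for code in START_CODES:
        if nalu.startswith(code):
            return nalu[len(code):]
    return nalu   # no recognised prefix: return the data unchanged


sps_with_prefix = b"\x00\x00\x00\x01\x67\x42\x00\x1f"
print(strip_start_code(sps_with_prefix).hex())
```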

In this embodiment, the MFSM module is provided with an input interface for receiving original audio data and original video data, and with output interfaces for outputting each item of target audio data and target video data to the corresponding mixer. After the MFSM module obtains the target audio data and target video data required by the mixer in the first terminal 100, it can output each item of target audio data and target video data to the mixer for mixing through the corresponding output interface. For example, the output interface outputGoogleMuxerVideoSps() is provided to output the sps required by the Google mixer for mixing to the Google mixer, the output interface outputGoogleMuxerVideoPps() is provided to output the pps required by the Google mixer for mixing to the Google mixer, the output interface outputGoogleMuxerVideoFlags() is provided to output the video frame type flags required by the Google mixer for mixing to the Google mixer, and so on.

S606. The first terminal performs stream mixing processing on the target audio data and the target video data through the stream mixer to obtain screen recording data.

It should be understood that the content of S606 is similar to that of S405 in the first embodiment, and the basic principle is the same. For the sake of brevity, it will not be repeated here.

It should be noted that, as shown in FIG. 5b, in this embodiment, the MFSM module may also be provided in the second terminal 200. That is, after the encoder in the second terminal 200 encodes the initial audio data and the initial video data, the original audio data and the original video data obtained by encoding can be transmitted to the MFSM module in the second terminal 200, respectively. The MFSM module can process the original audio data and original video data, and output the target audio data and target video data to the first terminal 100. The mixer in the first terminal 100 can directly mix the received target audio data and target video data to synthesize the screen recording data and save it in the first terminal 100.

The process by which the MFSM module in the second terminal 200 processes the original audio data and original video data and outputs the target audio data and target video data is similar to the corresponding process of the MFSM module in the first terminal 100, and the basic principle is the same. For the sake of brevity, the details are not repeated here.

In this embodiment, by providing the MFSM module in the first terminal/second terminal to perform an intermediate conversion on the way to the target audio data and the target video data, the configuration of the correspondences can be greatly simplified, which reduces the developers' development workload and subsequent update workload and effectively reduces the lookup time for the target audio structure and the target video structure, thereby effectively increasing the conversion speed of the target audio data and the target video data and improving the mixing efficiency of the mixer.

[Embodiment Three]

As shown in FIG. 7, in this embodiment, an MFSM module may be provided in the second terminal 200 to perform the intermediate conversion of the original audio data and the original video data, and an MFSM module may be provided in the first terminal 100 to convert the result of the intermediate conversion into target audio data and target video data. This simplifies the configuration of the correspondences in the first terminal 100 and the second terminal 200, reduces the developers' development workload and subsequent update workload, and effectively reduces the lookup time for the target audio structure and the target video structure, thereby effectively increasing the conversion speed of the target audio data and the target video data and improving the mixing efficiency of the mixer.

Please refer to FIG. 8. FIG. 8 is a schematic flowchart of a cross-terminal screen recording method provided by this embodiment. The method can also be applied to the application scenario shown in FIG. 2. As shown in Figure 8, the method may include:

S801: The first terminal sends screen recording request information to the second terminal.

It should be understood that when the user needs to use the first terminal 100 to record the content being presented by the second terminal 200, the user can send the screen recording request information to the second terminal 200 through the first terminal 100, to request the second terminal 200 to collect audio data and video data of the content it is presenting and send them to the first terminal 100. Exemplarily, the user can first shake the first terminal 100 and then, within a preset time after the shaking ends, touch a first preset area of the first terminal 100 to a second preset area of the second terminal 200 to send the screen recording request information to the second terminal 200; or the user can directly touch the first preset area of the first terminal 100 to the second preset area of the second terminal 200 to send the screen recording request information to the second terminal 200; or the user can directly shake the first terminal 100 to send the screen recording request information to the second terminal 200; or the user can click a screen recording button on the first terminal 100 to send the screen recording request information to the second terminal 200. This embodiment does not specifically limit the manner in which the first terminal 100 sends the screen recording request information to the second terminal 200.

S802: After receiving the screen recording request information of the first terminal, the second terminal obtains original audio data and original video data corresponding to the content currently displayed on the second terminal.

It should be understood that the content of S802 is similar to that of S402 in the first embodiment, and the basic principle is the same. For the sake of brevity, it will not be repeated here.

S803. The second terminal obtains candidate audio data corresponding to the original audio data according to the preset audio structure, and obtains candidate video data corresponding to the original video data according to the preset video structure.

Exemplarily, the second terminal 200 may, through the MFSM module, convert the original audio data into candidate audio data of the preset audio structure and convert the original video data into candidate video data of the preset video structure. That is, the second terminal 200 can input original audio data and original video data of any structure into the MFSM module, and the MFSM module can perform data extraction and conversion on the original audio data according to the data type and data format corresponding to the original audio structure, the data type and data format corresponding to the preset audio structure, and the pre-established correspondence between the original audio structure and the preset audio structure, to obtain the candidate audio data. Similarly, the MFSM module can perform data extraction and conversion on the original video data according to the data type and data format corresponding to the original video structure, the data type and data format corresponding to the preset video structure, and the pre-established correspondence between the original video structure and the preset video structure, to obtain the candidate video data. For example, the MFSM module can extract the video frame type from the original video data and convert it into the type in the preset video structure according to the format corresponding to the video frame type in the preset video structure. As another example, the MFSM module can extract the sps from the original video data and convert it into the sps of the preset video structure according to the format corresponding to the sps in the preset video structure, and so on.

S804. The second terminal sends the candidate audio data and the candidate video data to the first terminal.

Here, after the MFSM module in the second terminal 200 obtains the candidate audio data of the preset audio structure and the candidate video data of the preset video structure, the candidate audio data and the candidate video data may be sent to the first terminal 100. In this embodiment, the MFSM module in the second terminal 200 performs the intermediate conversion of the original audio data and the original video data, and the resulting candidate audio data and candidate video data are sent to the first terminal 100. This can effectively increase the conversion speed of the target audio data and target video data in the first terminal 100 and improve the mixing efficiency of the mixer, while reducing the processing performance required of the first terminal 100.

S805: The first terminal determines the target audio structure and the target video structure corresponding to the mixer in the first terminal.

It should be understood that the content of S805 is similar to that of S403 in the first embodiment, and the basic principle is the same. For the sake of brevity, it will not be repeated here.

S806: The first terminal obtains target audio data corresponding to the candidate audio data according to the target audio structure, and obtains target video data corresponding to the candidate video data according to the target video structure.

It should be understood that after receiving the candidate audio data and candidate video data sent by the second terminal 200, the first terminal 100 may pass the candidate audio data and candidate video data to the MFSM module in the first terminal 100. The MFSM module can convert the candidate audio data into target audio data according to the target audio structure, and can convert the candidate video data into target video data according to the target video structure. The process by which the MFSM module in the first terminal 100 converts the candidate audio data into target audio data according to the target audio structure and converts the candidate video data into target video data according to the target video structure is similar to the content of S605 in the second embodiment, and the basic principle is the same. For the sake of brevity, it will not be repeated here.

S807. The first terminal performs stream mixing processing on the target audio data and the target video data through the mixer in the first terminal to obtain screen recording data.

It should be understood that the content of S807 is similar to that of S405 in the first embodiment, and the basic principle is the same. For the sake of brevity, it will not be repeated here.

It should be noted that while the first terminal 100 is recording the content being presented by the second terminal 200, the user can also input a stop screen recording instruction on the second terminal 200 to instruct the first terminal 100 to stop recording. That is, while the second terminal 200 is collecting the initial audio data and initial video data, the second terminal 200 can detect in real time whether the user inputs a stop screen recording instruction on the second terminal 200. If it detects that the user has input a stop screen recording instruction on the second terminal 200, the second terminal 200 can stop collecting the initial audio data and initial video data, or can close the data transmission channel between the first terminal 100 and the second terminal 200, to instruct the first terminal 100 to stop recording. The first terminal 100 can stop the screen recording operation when it has not received original audio data and original video data from the second terminal 200 within a preset time, or after it is notified that the data transmission channel between the first terminal 100 and the second terminal 200 has been closed, and can save the previously obtained screen recording data in the first terminal 100.
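The stop condition on the first terminal side can be sketched as a receive loop with a timeout. This is only an illustrative sketch: the use of an in-process queue to stand in for the data transmission channel, the None sentinel for "channel closed", and the function name are assumptions and are not part of this application.

```python
import queue
from typing import List


def receive_until_stopped(packets: "queue.Queue", preset_time_s: float) -> List[bytes]:
    """Collect packets until the channel closes or nothing arrives in time."""
    received: List[bytes] = []
    while True:
        try:
            item = packets.get(timeout=preset_time_s)
        except queue.Empty:
            break              # no data within the preset time: stop recording
        if item is None:       # hypothetical sentinel: channel was closed
            break
        received.append(item)
    return received            # data obtained so far would then be saved


channel: "queue.Queue" = queue.Queue()
for packet in (b"frame-1", b"frame-2", None):
    channel.put(packet)
print(receive_until_stopped(channel, preset_time_s=0.05))
```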

It should be noted that the stop screen recording instruction may be an instruction generated upon detecting that the user clicks a specific button such as "Stop" on the second terminal 200, an instruction generated upon detecting that the user's voice input includes a specific keyword such as "stop", or an instruction generated upon detecting a specific gesture input by the user on the second terminal 200. This embodiment does not specifically limit the manner in which the stop screen recording instruction is generated.

It should be understood that the size of the sequence number of each step in the foregoing embodiment does not mean the order of execution. The execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present application.

Please refer to FIG. 9. FIG. 9 shows a structural block diagram of a cross-terminal screen recording device provided by an embodiment of the present application, and the device can be applied to the first terminal. As shown in Figure 9, the device may include:

The request sending module 901 is configured to send screen recording request information to the second terminal, where the screen recording request information is used to instruct the second terminal to send original audio data and original video data corresponding to the currently displayed content to the first terminal;

The original audio and video receiving module 902 is configured to receive original audio data and original video data corresponding to the content currently displayed by the second terminal sent by the second terminal;

The target structure determining module 903 is configured to determine the target audio structure and the target video structure corresponding to the mixer in the first terminal;

The target audio and video obtaining module 904 is configured to obtain target audio data corresponding to the original audio data according to the target audio structure, and obtain target video data corresponding to the original video data according to the target video structure;

The stream mixing module 905 is configured to perform stream mixing processing on the target audio data and the target video data through the stream mixer to obtain screen recording data.

In a possible implementation manner, the target audio and video acquisition module 904 may include:

Exemplarily, the candidate audio and video acquisition unit may include:

Specifically, the screen recording data is data in MP4 format.

In a possible implementation manner, the device may further include:

In another possible implementation manner, the apparatus may further include:

Exemplarily, the device may further include:

Please refer to FIG. 10. FIG. 10 shows a structural block diagram of a cross-terminal screen recording device provided by an embodiment of the present application, and the device can be applied to a second terminal. As shown in Figure 10, the device may include:

The original audio and video obtaining module 1001 is configured to obtain original audio data and original video data corresponding to the current display content of the second terminal after receiving the screen recording request information of the first terminal;

The target structure determining module 1002 is configured to determine the target audio structure and the target video structure corresponding to the mixer in the first terminal;

The target audio and video obtaining module 1003 is configured to obtain target audio data corresponding to the original audio data according to the target audio structure, and obtain target video data corresponding to the original video data according to the target video structure;

The target audio and video sending module 1004 is configured to send the target audio data and the target video data to the first terminal, so as to instruct the first terminal to perform stream mixing processing on the target audio data and the target video data through the mixer in the first terminal to obtain screen recording data.

In a possible implementation manner, the target audio and video acquisition module 1003 may include:

Exemplarily, the candidate audio and video acquisition unit may include:

An original structure determining subunit for determining the original audio structure corresponding to the original audio data and the original video structure corresponding to the original video data;

It should be understood that the original audio and video obtaining module 1001 is specifically configured to, after detecting a touch operation performed by the first terminal on the second terminal, obtain the original audio data and original video data corresponding to the content currently displayed by the second terminal.

Exemplarily, the device may further include:

Please refer to FIG. 11, which shows a schematic diagram of a cross-terminal screen recording system provided by an embodiment of the present application. As shown in FIG. 11, the system includes a first terminal 100 and a second terminal 200. The first terminal 100 includes a request sending module 101, a target structure determining module 102, and a stream mixing module 103. The second terminal 200 includes an original audio and video obtaining module 201 and a candidate audio and video obtaining module 202, wherein:

The request sending module 101 is configured to send screen recording request information to the second terminal;

The original audio and video obtaining module 201 is configured to obtain original audio data and original video data corresponding to the current display content of the second terminal after receiving the screen recording request information of the first terminal;

The candidate audio and video obtaining module 202 is configured to obtain candidate audio data corresponding to the original audio data according to a preset audio structure, obtain candidate video data corresponding to the original video data according to a preset video structure, and send the candidate audio data and the candidate video data to the first terminal;

The target structure determining module 102 is configured to determine the target audio structure and the target video structure corresponding to the mixer in the first terminal, obtain the target audio data corresponding to the candidate audio data according to the target audio structure, and obtain the target video data corresponding to the candidate video data according to the target video structure;

The stream mixing module 103 is configured to perform stream mixing processing on the target audio data and the target video data through a stream mixer in the first terminal to obtain screen recording data.

In a possible implementation manner, the candidate audio and video acquisition module 202 may include:

An original structure determining unit, configured for the second terminal to determine the original audio structure corresponding to the original audio data, and the original video structure corresponding to the original video data;

Exemplarily, the target structure determining module 102 is further configured to convert the candidate audio data into the target audio data according to a pre-established correspondence between the preset audio structure and the target audio structure, and to convert the candidate video data into the target video data according to a pre-established correspondence between the preset video structure and the target video structure.

In a possible implementation manner, the first terminal 100 may further include a screen recording saving module:

It should be understood that the original audio and video obtaining module 201 is specifically configured to, after detecting a touch operation performed by the first terminal on the second terminal, obtain the original audio data and original video data corresponding to the content currently displayed by the second terminal.

Exemplarily, the second terminal 200 may further include a screen recording stop module;

Specifically, the screen recording data is data in MP4 format.

It should be noted that the information interaction and execution processes among the above-mentioned devices/units are based on the same concept as the method embodiments of this application. For their specific functions and technical effects, refer to the method embodiment section; they are not repeated here.

Those skilled in the art can clearly understand that, for convenience and conciseness of description, only the division of the above-mentioned functional units and modules is used as an example. In practical applications, the above-mentioned functions can be allocated to different functional units and modules as required, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist alone physically, or two or more units can be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other, and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the foregoing system, reference may be made to the corresponding process in the foregoing method embodiments, which will not be repeated here.

FIG. 12 is a schematic structural diagram of a terminal device provided by an embodiment of this application. As shown in FIG. 12, the terminal device 12 of this embodiment includes: at least one processor 1200 (only one is shown in FIG. 12), a memory 1201, and a computer program 1202 that is stored in the memory 1201 and can run on the at least one processor 1200. When the processor 1200 executes the computer program 1202, the steps in any of the foregoing embodiments of the cross-terminal screen recording method are implemented.

The terminal device 12 may include, but is not limited to, a processor 1200 and a memory 1201. Those skilled in the art can understand that FIG. 12 is only an example of the terminal device 12 and does not constitute a limitation on the terminal device 12. The terminal device may include more or fewer components than shown in the figure, a combination of certain components, or different components; for example, it may also include input and output devices, network access devices, and so on.

The processor 1200 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor.

The memory 1201 may be an internal storage unit of the terminal device 12 in some embodiments, such as a hard disk or a memory of the terminal device 12. In other embodiments, the memory 1201 may also be an external storage device of the terminal device 12, for example, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the terminal device 12. Further, the memory 1201 may also include both an internal storage unit of the terminal device 12 and an external storage device. The memory 1201 is used to store an operating system, an application program, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 1201 can also be used to temporarily store data that has been output or will be output.

As described above, the terminal device 12 may be a mobile phone, a tablet computer, a desktop computer, a wearable device, a vehicle-mounted device, a notebook computer, a smart TV, a smart speaker, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or another terminal device with a display screen. Take the terminal device 12 being a mobile phone as an example. FIG. 13 shows a block diagram of a part of the structure of a mobile phone provided by an embodiment of the present application. Referring to FIG. 13, the mobile phone includes components such as a radio frequency (RF) circuit 1310, a memory 1320, an input unit 1330, a display unit 1340, a sensor 1350, an audio circuit 1360, a wireless fidelity (WiFi) module 1370, a processor 1380, and a power supply 1390. Those skilled in the art can understand that the structure of the mobile phone shown in FIG. 13 does not constitute a limitation on the mobile phone, which may include more or fewer components than those shown in the figure, a combination of some components, or a different arrangement of components.

The following describes the components of the mobile phone in detail with reference to Figure 13:

The RF circuit 1310 can be used for receiving and sending signals while information is being sent and received or during a call. In particular, after receiving downlink information from the base station, it passes the information to the processor 1380 for processing; in addition, it sends uplink data to the base station. Generally, the RF circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 1310 can also communicate with the network and other devices through wireless communication. The above-mentioned wireless communication can use any communication standard or protocol, including but not limited to Global System of Mobile Communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), Email, Short Messaging Service (SMS), etc.

The memory 1320 may be used to store software programs and modules. The processor 1380 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 1320. The memory 1320 may mainly include a storage program area and a storage data area. The storage program area may store an operating system, an application program required by at least one function (such as a sound playback function, an image playback function, etc.), etc.; the storage data area may store data created through use of the mobile phone (such as audio data, a phone book, etc.). In addition, the memory 1320 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.

The input unit 1330 can be used to receive input digital or character information, and generate key signal input related to the user settings and function control of the mobile phone. Specifically, the input unit 1330 may include a touch panel 1331 and other input devices 1332. The touch panel 1331, also known as a touch screen, can collect the user's touch operations on or near it (for example, operations performed by the user on or near the touch panel 1331 using a finger, a stylus, or any other suitable object or accessory), and drive the corresponding connection device according to a preset program. Optionally, the touch panel 1331 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch position, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 1380, and can receive and execute commands sent by the processor 1380. In addition, the touch panel 1331 can be implemented in multiple types such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 1331, the input unit 1330 may also include other input devices 1332. Specifically, the other input devices 1332 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), a trackball, a mouse, and a joystick.
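The conversion of touch information into contact coordinates by the touch controller can be sketched as follows. This is only an illustrative sketch that assumes the touch detection device reports raw integer readings per axis and that a linear calibration maps them to pixels; the `PanelCalibration` type, its fields, and the numeric ranges in the usage note are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PanelCalibration:
    """Hypothetical per-axis calibration: raw reading range and pixel count."""
    raw_min: int
    raw_max: int
    pixels: int


def raw_to_pixel(raw: int, cal: PanelCalibration) -> int:
    # Clamp the raw reading into the calibrated range, then scale linearly
    # so that raw_min maps to pixel 0 and raw_max maps to pixel (pixels - 1).
    clamped = max(cal.raw_min, min(cal.raw_max, raw))
    span = cal.raw_max - cal.raw_min
    return (clamped - cal.raw_min) * (cal.pixels - 1) // span


def to_contact_coordinates(raw_x: int, raw_y: int,
                           cal_x: PanelCalibration,
                           cal_y: PanelCalibration) -> Tuple[int, int]:
    # Convert one raw touch sample into (x, y) contact coordinates.
    return raw_to_pixel(raw_x, cal_x), raw_to_pixel(raw_y, cal_y)
```

For example, with an assumed calibration of raw readings 200 to 3900 across 1080 horizontal pixels and 2340 vertical pixels, a reading at both raw minimums maps to the contact coordinates (0, 0) and a reading at both raw maximums maps to (1079, 2339).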

The display unit 1340 may be used to display information input by the user or information provided to the user and various menus of the mobile phone. The display unit 1340 may include a display panel 1341. Optionally, the display panel 1341 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), etc. Further, the touch panel 1331 can cover the display panel 1341. When the touch panel 1331 detects a touch operation on or near it, it transmits the operation to the processor 1380 to determine the type of touch event, and the processor 1380 then provides corresponding visual output on the display panel 1341 according to the type of the touch event. Although in FIG. 13 the touch panel 1331 and the display panel 1341 are used as two independent components to realize the input and output functions of the mobile phone, in some embodiments the touch panel 1331 and the display panel 1341 can be integrated to realize the input and output functions of the mobile phone.
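How a processor might determine the type of a touch event can be sketched as below. The three event types, the function name, and the thresholds are assumptions chosen for illustration; the application does not define which touch event types exist or how they are distinguished.

```python
import math
from typing import Tuple


def classify_touch(down: Tuple[int, int], up: Tuple[int, int], duration_ms: int,
                   move_threshold_px: float = 20.0,
                   long_press_ms: int = 500) -> str:
    """Classify a touch from its press point, release point, and duration.

    Thresholds are hypothetical defaults, not values from the source.
    """
    dx = up[0] - down[0]
    dy = up[1] - down[1]
    # A touch that travelled far enough is treated as a swipe,
    # regardless of how long it lasted.
    if math.hypot(dx, dy) >= move_threshold_px:
        return "swipe"
    # Otherwise the duration separates a long press from a tap.
    return "long_press" if duration_ms >= long_press_ms else "tap"
```

The returned type could then select the visual output to draw, such as highlighting a tapped control or scrolling a list on a swipe.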

The mobile phone may also include at least one sensor 1350, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 1341 according to the brightness of the ambient light. The proximity sensor can turn off the display panel 1341 and/or the backlight when the mobile phone is moved to the ear. As a kind of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in various directions (usually three axes), and can detect the magnitude and direction of gravity when stationary. It can be used for applications that recognize the posture of the mobile phone (such as switching between landscape and portrait screens, related games, and magnetometer posture calibration) and for vibration-recognition functions (such as a pedometer and tap detection). Other sensors that can be configured in the mobile phone, such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, are not described in detail here.
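The use of the gravity vector for landscape/portrait switching can be sketched as follows. The sketch assumes a stationary device, an axis convention in which x points to the right of the screen, y points to its top, and z points out of it, and a hypothetical threshold for deciding that the phone is lying flat; none of these details are given in the application.

```python
import math


def gravity_magnitude(ax: float, ay: float, az: float) -> float:
    # Magnitude of the measured acceleration vector (gravity when stationary).
    return math.sqrt(ax * ax + ay * ay + az * az)


def screen_orientation(ax: float, ay: float, az: float,
                       flat_threshold: float = 0.5) -> str:
    """Return "portrait", "landscape", or "flat" from a three-axis reading.

    flat_threshold is a hypothetical fraction of the total magnitude below
    which the in-plane component is considered too small to decide.
    """
    g = gravity_magnitude(ax, ay, az)
    if g == 0 or math.hypot(ax, ay) < flat_threshold * g:
        return "flat"
    # Gravity mostly along the screen's x axis means the device is on its side.
    return "landscape" if abs(ax) > abs(ay) else "portrait"
```

A practical implementation would normally also filter the readings and apply hysteresis so that the screen does not flip back and forth near the 45-degree boundary.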

The audio circuit 1360, the speaker 1361, and the microphone 1362 can provide an audio interface between the user and the mobile phone. The audio circuit 1360 can transmit the electrical signal converted from received audio data to the speaker 1361, which converts it into a sound signal for output; conversely, the microphone 1362 converts the collected sound signal into an electrical signal, which the audio circuit 1360 receives and converts into audio data. The audio data is then output to the processor 1380 for processing and sent, for example, to another mobile phone via the RF circuit 1310, or output to the memory 1320 for further processing.

WiFi is a short-distance wireless transmission technology. The mobile phone can help users send and receive emails, browse web pages, and access streaming media through the WiFi module 1370. It provides users with wireless broadband Internet access. Although FIG. 13 shows the WiFi module 1370, it is understandable that it is not a necessary component of the mobile phone and can be omitted as needed without changing the essence of the invention.

The processor 1380 is the control center of the mobile phone. It connects the various parts of the entire mobile phone through various interfaces and lines, performs the various functions of the mobile phone, and processes data, thereby monitoring the mobile phone as a whole. Optionally, the processor 1380 may include one or more processing units; preferably, the processor 1380 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, etc., and the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may alternatively not be integrated into the processor 1380.

The mobile phone also includes a power supply 1390 (such as a battery) for supplying power to various components. Preferably, the power supply can be logically connected to the processor 1380 through a power management system, so that functions such as charging, discharging, and power management can be managed through the power management system.

Although not shown, the mobile phone may also include a camera. Optionally, the position of the camera on the mobile phone may be front-mounted or rear-mounted, which is not limited in the embodiment of the present application.

Optionally, the mobile phone may include a single camera, a dual camera, or a triple camera, etc., which is not limited in the embodiment of the present application.

For example, a mobile phone may include three cameras, of which one is a main camera, one is a wide-angle camera, and one is a telephoto camera.

Optionally, when the mobile phone includes multiple cameras, the multiple cameras may be all front-mounted, or all rear-mounted, or partly front-mounted and some rear-mounted, which is not limited in the embodiment of the present application.

In addition, although not shown, the mobile phone may also include an NFC chip, and the NFC chip may be arranged near the rear camera of the mobile phone.

In addition, although not shown, the mobile phone may also include a Bluetooth module, etc., which will not be repeated here.

FIG. 14 is a schematic diagram of the software structure of a mobile phone according to an embodiment of the present application. Taking the Android system as the mobile phone operating system as an example, in some embodiments, the Android system is divided into four layers, namely the application layer, the application framework layer (framework, FWK), the system layer, and the hardware abstraction layer. The layers communicate with each other through software interfaces.

As shown in FIG. 14, the application layer can be a series of application packages, and the application packages can include applications such as short message, calendar, camera, video, navigation, gallery, and call.

The application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer. The application framework layer may include some predefined functions, such as functions for receiving events sent by the application framework layer.

As shown in Figure 14, the application framework layer can include a window manager, a resource manager, and a notification manager.

The window manager is used to manage window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take a screenshot, etc. The content provider is used to store and retrieve data and make these data accessible to applications. The data may include video, image, audio, phone calls made and received, browsing history and bookmarks, phone book, etc.

The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.

The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages that disappear automatically after a short stay without user interaction; for example, the notification manager is used to notify of download completion, message reminders, and so on. Notifications can also appear in the status bar at the top of the system in the form of a chart or scroll-bar text, such as a notification of an application running in the background, or appear on the screen in the form of a dialog window. Examples include prompting text information in the status bar, sounding a prompt tone, vibrating the electronic device, and flashing an indicator light.

The application framework layer can also include:

A view system, which includes visual controls, such as controls that display text, controls that display pictures, and so on. The view system can be used to build applications. The display interface can be composed of one or more views. For example, a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.

The phone manager is used to provide the communication function of the mobile phone. For example, the management of the call status (including connecting, hanging up, etc.).

The system layer can include multiple functional modules. For example: sensor service module, physical state recognition module, 3D graphics processing library (for example: OpenGL ES), etc.

The sensor service module is used to monitor the sensor data uploaded by various sensors at the hardware layer to determine the physical state of the mobile phone;

Physical state recognition module, used to analyze and recognize user gestures, faces, etc.;

The 3D graphics processing library is used to realize 3D graphics drawing, image rendering, synthesis, and layer processing.

The system layer can also include:

The surface manager is used to manage the display subsystem and provides a combination of 2D and 3D layers for multiple applications.

The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.

The hardware abstraction layer is the layer between hardware and software. The hardware abstraction layer can include display drivers, camera drivers, sensor drivers, etc., which are used to drive related hardware at the hardware layer, such as display screens, cameras, sensors, and so on.

The embodiments of the present application also provide a computer-readable storage medium, the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps in each of the foregoing method embodiments can be implemented.

The embodiments of the present application provide a computer program product. When the computer program product runs on a terminal device, the terminal device can implement the steps in the foregoing method embodiments when executed.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above-mentioned embodiments of this application can be completed by instructing relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, and when it is executed by a processor, the steps of the foregoing method embodiments can be implemented. The computer program includes computer program code, and the computer program code may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable storage medium may at least include: any entity or device capable of carrying computer program code to the device/equipment, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example, a USB flash drive, a mobile hard disk, a floppy disk, or a CD-ROM. In some jurisdictions, in accordance with legislation and patent practices, computer-readable storage media cannot be electrical carrier signals and telecommunication signals.

In the above-mentioned embodiments, the description of each embodiment has its own focus. For parts that are not described in detail or recorded in an embodiment, reference may be made to related descriptions of other embodiments.

A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered as going beyond the scope of this application.

In the embodiments provided in this application, it should be understood that the disclosed device/terminal device and method may be implemented in other ways. For example, the device/terminal device embodiments described above are only illustrative. For example, the division of the modules or units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of the technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application, and should be included within the scope of protection of this application.

Claims

A cross-terminal screen recording method, characterized in that it is applied to a first terminal, and the method includes:

Sending screen recording request information to the second terminal, where the screen recording request information is used to instruct the second terminal to send original audio data and original video data corresponding to the currently displayed content to the first terminal;

Receiving original audio data and original video data corresponding to the content currently displayed by the second terminal sent by the second terminal;

Determining the target audio structure and the target video structure corresponding to the mixer in the first terminal;

Obtaining target audio data corresponding to the original audio data according to the target audio structure, and obtaining target video data corresponding to the original video data according to the target video structure;

The target audio data and the target video data are mixed stream processed by the stream mixer to obtain screen recording data.
The method according to claim 1, wherein the obtaining target audio data corresponding to the original audio data according to the target audio structure, and obtaining target video data corresponding to the original video data according to the target video structure comprises:

Obtaining candidate audio data corresponding to the original audio data according to a preset audio structure, and obtaining candidate video data corresponding to the original video data according to the preset video structure;

Converting the candidate audio data into the target audio data according to a pre-established correspondence between the preset audio structure and the target audio structure, and converting the candidate video data into the target video data according to a pre-established correspondence between the preset video structure and the target video structure.
The method according to claim 2, wherein the obtaining candidate audio data corresponding to the original audio data according to a preset audio structure, and obtaining candidate video data corresponding to the original video data according to a preset video structure comprises:

Determine the original audio structure corresponding to the original audio data, and the original video structure corresponding to the original video data;

Converting the original audio data into the candidate audio data according to a pre-established correspondence between the original audio structure and the preset audio structure, and converting the original video data into the candidate video data according to a pre-established correspondence between the original video structure and the preset video structure.
The method according to any one of claims 1 to 3, wherein the screen recording data is data in MP4 format.
The method according to any one of claims 1 to 4, wherein after receiving the original audio data and original video data corresponding to the content currently displayed by the second terminal sent by the second terminal, the method further comprises:

The original video data is decoded by the video decoder in the first terminal, and the original video data obtained by the decoding is rendered on the display interface of the first terminal.
The method according to claim 5, wherein after receiving the original audio data and original video data corresponding to the content currently displayed by the second terminal sent by the second terminal, the method further comprises:

The original audio data is decoded by the audio decoder in the first terminal, and the original audio data obtained by the decoding is played by the sound playing device of the first terminal.
The method according to any one of claims 1-6, wherein the method further comprises:

If a screen recording stop instruction is detected on the first terminal, instruct the second terminal to stop sending original audio data and original video data, and save the screen recording data in the first terminal.
A cross-terminal screen recording method, characterized in that it is applied to a second terminal, and the method includes:

After receiving the screen recording request information of the first terminal, obtain the original audio data and the original video data corresponding to the content currently displayed on the second terminal;

Determining the target audio structure and the target video structure corresponding to the mixer in the first terminal;

Obtaining target audio data corresponding to the original audio data according to the target audio structure, and obtaining target video data corresponding to the original video data according to the target video structure;

Sending the target audio data and the target video data to the first terminal, to instruct the first terminal to perform stream mixing processing on the target audio data and the target video data through the mixer in the first terminal to obtain screen recording data.
The method according to claim 8, wherein the obtaining target audio data corresponding to the original audio data according to the target audio structure, and obtaining target video data corresponding to the original video data according to the target video structure comprises:

Obtaining candidate audio data corresponding to the original audio data according to a preset audio structure, and obtaining candidate video data corresponding to the original video data according to the preset video structure;

Converting the candidate audio data into the target audio data according to a pre-established correspondence between the preset audio structure and the target audio structure, and converting the candidate video data into the target video data according to a pre-established correspondence between the preset video structure and the target video structure.
The method according to claim 9, wherein the obtaining candidate audio data corresponding to the original audio data according to a preset audio structure, and obtaining candidate video data corresponding to the original video data according to a preset video structure comprises:

Determine the original audio structure corresponding to the original audio data, and the original video structure corresponding to the original video data;

Converting the original audio data into the candidate audio data according to a pre-established correspondence between the original audio structure and the preset audio structure, and converting the original video data into the candidate video data according to a pre-established correspondence between the original video structure and the preset video structure.
The method according to any one of claims 8-10, wherein the obtaining, after receiving the screen recording request information of the first terminal, original audio data and original video data corresponding to the content currently displayed on the second terminal comprises:

After detecting the touch operation of the first terminal on the second terminal, the original audio data and the original video data corresponding to the content currently displayed by the second terminal are acquired.
The method according to any one of claims 8-11, wherein the method further comprises:

If an instruction to stop screen recording is detected on the second terminal, stop sending original audio data and original video data to the first terminal.
A cross-terminal screen recording method, which is characterized in that it includes:

The first terminal sends screen recording request information to the second terminal;

After receiving the screen recording request information of the first terminal, the second terminal acquires original audio data and original video data corresponding to the content currently displayed on the second terminal;

The second terminal obtains the candidate audio data corresponding to the original audio data according to the preset audio structure, obtains the candidate video data corresponding to the original video data according to the preset video structure, and sends the candidate audio data and the candidate video data to the first terminal;

The first terminal determines the target audio structure and the target video structure corresponding to the mixer in the first terminal, obtains the target audio data corresponding to the candidate audio data according to the target audio structure, and obtains the target video data corresponding to the candidate video data according to the target video structure;

The first terminal performs stream mixing processing on the target audio data and the target video data through the mixer in the first terminal to obtain screen recording data.
The method according to claim 13, wherein the second terminal obtaining the candidate audio data corresponding to the original audio data according to the preset audio structure, and obtaining the candidate video data corresponding to the original video data according to the preset video structure comprises:

Determining, by the second terminal, an original audio structure corresponding to the original audio data, and an original video structure corresponding to the original video data;

The second terminal converts the original audio data into the candidate audio data according to a pre-established correspondence between the original audio structure and the preset audio structure, and converts the original video data into the candidate video data according to a pre-established correspondence between the original video structure and the preset video structure.
The method according to claim 13 or 14, wherein the first terminal obtaining the target audio data corresponding to the candidate audio data according to the target audio structure, and obtaining the target video data corresponding to the candidate video data according to the target video structure comprises:

The first terminal converts the candidate audio data into the target audio data according to a pre-established correspondence between the preset audio structure and the target audio structure, and converts the candidate video data into the target video data according to a pre-established correspondence between the preset video structure and the target video structure.
A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein, when the processor executes the computer program, the terminal device implements the cross-terminal screen recording method according to any one of claims 1 to 8 or any one of claims 9 to 12.
A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the computer implements the cross-terminal screen recording method according to any one of claims 1 to 8 or any one of claims 9 to 12.