CN106412687B - Method and device for intercepting audio and video clips - Google Patents


Info

Publication number
CN106412687B
CN106412687B (application CN201510446683.8A)
Authority
CN
China
Prior art keywords
video
audio
target
file
format
Prior art date
Legal status
Active
Application number
CN201510446683.8A
Other languages
Chinese (zh)
Other versions
CN106412687A (en)
Inventor
陈俊峰
Current Assignee
Shenzhen Yayue Technology Co ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201510446683.8A priority Critical patent/CN106412687B/en
Publication of CN106412687A publication Critical patent/CN106412687A/en
Application granted granted Critical
Publication of CN106412687B publication Critical patent/CN106412687B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47202 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting content on demand, e.g. video on demand
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4398 Processing of audio elementary streams involving reformatting operations of audio signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4405 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving video stream decryption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiment of the invention discloses a method and a device for intercepting audio and video clips, which are used to improve the processing efficiency of intercepting audio and video clips. In the method provided by the embodiment of the invention, starting from the moment the playing time reaches the interception start time point, decoded audio data corresponding to the audio file currently being played and decoded video data corresponding to the video interception area in the video file currently being played are acquired according to an audio and video interception instruction, and the acquisition stops when the playing time reaches the interception end time point. From the interception end time point, file format coding is performed separately on the acquired decoded audio data and decoded video data according to the audio and video interception instruction to generate an audio clip and a video clip, and the audio clip and the video clip are synthesized into an audio and video clip. Finally, the audio and video clip is output according to the target purpose.

Description

Method and device for intercepting audio and video clips
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for intercepting audio and video clips.
Background
In recent years, multimedia information technology has developed rapidly, and users are increasingly accustomed to playing audio and video on a handheld terminal. While watching and listening, a user may become interested in a particular audio segment and the video segment played synchronously with it, and may therefore want to intercept and save that audio segment and video segment from the played audio and video. For example, a user watching a ball game video on a terminal may wish to save a particular piece of the video together with the audio that accompanies it. In the prior art, audio processing and video processing are usually handled separately and independently. In audio processing, for example, a user may submit an interception command to the terminal, which then has to stop the video or audio currently being played and store the audio content at the point where playback stopped; such a scheme captures audio content moment by moment, so the processing efficiency of audio capture is very low. In video processing, for example, screenshots of a number of video images need to be merged to obtain the captured video clip; this method is only suitable for short clips composed of a few video images, and if the user needs to capture a video clip with a large time span, a large number of video images have to be captured, so the capture efficiency of the video clip is very low. The prior art therefore has low audio and video processing efficiency.
Disclosure of Invention
The embodiment of the invention provides a method and a device for intercepting audio and video clips, which are used for improving the processing efficiency of intercepting the audio and video clips.
In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions:
in a first aspect, an embodiment of the present invention provides an audio and video clip intercepting method, including:
receiving an audio and video interception instruction sent by a user through a current playing terminal, wherein the audio and video interception instruction comprises: an interception start time point and an interception end time point, determined by the user, of the audio and video to be intercepted, a video interception area defined by the user in the playing interface of the current playing terminal, and a target purpose selected by the user;
starting from the moment that the playing time is the interception starting time point, acquiring decoded audio data corresponding to the currently playing audio file according to the audio and video interception instruction, acquiring decoded video data corresponding to the video interception area in the currently playing video file at the same time, stopping acquiring decoded audio data corresponding to the currently playing audio file until the playing time is the interception ending time point, and stopping acquiring decoded video data corresponding to the video interception area in the currently playing video file at the same time;
respectively carrying out file format coding on the obtained decoded audio data and the decoded video data according to the audio and video interception instruction from the interception ending time point to generate an audio fragment and a video fragment, and synthesizing the audio fragment and the video fragment to obtain an audio and video fragment;
and outputting the audio and video clips according to the target purpose.
In a second aspect, an embodiment of the present invention further provides a terminal, including:
the receiving module is configured to receive an audio and video interception instruction sent by a user through the current playing terminal, wherein the audio and video interception instruction comprises: an interception start time point and an interception end time point, determined by the user, of the audio and video to be intercepted, a video interception area defined by the user in the playing interface of the current playing terminal, and a target purpose selected by the user;
a decoded data acquisition module, configured to acquire, according to the audio/video capture instruction, decoded audio data corresponding to the currently playing audio file from the time when the playing time is the capture start time point, and acquire decoded video data corresponding to the video capture area in the currently playing video file at the same time, and stop acquiring the decoded audio data corresponding to the currently playing audio file and stop acquiring the decoded video data corresponding to the video capture area in the currently playing video file until the playing time is the capture end time point;
the file coding module is used for respectively carrying out file format coding on the obtained decoded audio data and the decoded video data according to the audio and video interception instruction from the interception ending time point to generate an audio segment and a video segment, and synthesizing the audio segment and the video segment to obtain an audio and video segment;
and the audio and video clip output module is used for outputting the audio and video clips according to the target purpose.
According to the technical scheme, the embodiment of the invention has the following advantages:
in the embodiment of the invention, when a user sends an audio and video interception instruction through the current playing terminal, the instruction is first received, and it comprises: an interception start time point, an interception end time point, a video interception area defined by the user and a target purpose selected by the user. After the playing interface of the terminal starts to play an audio file and a video file, once the playing time reaches the interception start time point, decoded audio data corresponding to the audio file currently being played and decoded video data corresponding to the video interception area in the video file currently being played are acquired; as long as the interception end time point has not been reached, the acquisition continues, so that a plurality of decoded audio data and a plurality of decoded video data are obtained according to the audio and video interception instruction. After the interception end time point is reached, file format coding is performed separately on the acquired decoded audio data and decoded video data according to the audio and video interception instruction, so that an audio clip and a video clip are generated; the audio clip and the video clip are synthesized into an audio and video clip, and finally the audio and video clip is output according to the target purpose. In the embodiment of the invention, the audio clip to be intercepted is obtained by acquiring the decoded audio data corresponding to the audio file being played and then performing file format coding on that data, rather than by capturing and combining a number of audio snippets; similarly, the video clip to be intercepted is obtained by acquiring the decoded video data corresponding to the video file being played and then performing file format coding on that data, rather than by capturing and combining a number of video images. Therefore, even if an audio and video clip with a large time span needs to be intercepted, the user only needs to set the interception start time point and the interception end time point, and the interception processing efficiency of the audio and video clip remains very high.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings described below illustrate only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flow block diagram of an audio/video clip interception method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating an obtaining manner of a video capture area according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an interception flow of an audio/video clip according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a process for decoding audio data according to an embodiment of the present invention;
fig. 5-a is a schematic structural diagram of an intercepting apparatus for audio and video clips according to an embodiment of the present invention;
fig. 5-b is a schematic structural diagram of another component of an apparatus for capturing audio/video clips according to an embodiment of the present invention;
fig. 5-c is a schematic structural diagram of another audio/video clip capture device according to an embodiment of the present invention;
fig. 5-d is a schematic structural diagram of another audio/video clip capture device according to an embodiment of the present invention;
fig. 5-e is a schematic structural diagram of another audio/video clip capture device provided in the embodiment of the present invention;
fig. 5-f is a schematic structural diagram of another audio/video clip capture device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a terminal to which the method for capturing audio and video clips provided by the embodiment of the present invention is applied.
Detailed Description
The embodiment of the invention provides a method and a device for intercepting audio and video clips, which are used for improving the processing efficiency of intercepting the audio and video clips.
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one skilled in the art from the embodiments given herein are intended to be within the scope of the invention.
The terms "comprises" and "comprising," and any variations thereof, in the description and claims of this invention and the above-described drawings are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The following are detailed below.
Referring to fig. 1, an embodiment of the method for capturing an audio/video clip according to the present invention can be specifically applied to a scene where an audio/video clip needs to be captured in an audio/video playing terminal, and the method for capturing an audio/video clip according to an embodiment of the present invention may include the following steps:
101. and receiving an audio and video interception instruction sent by a user through the current playing terminal.
Wherein, the audio and video interception instruction includes: an interception start time point and an interception end time point, determined by the user, of the audio and video to be intercepted, a video interception area defined by the user in the playing interface of the current playing terminal, and a target purpose selected by the user.
In the embodiment of the invention, when a user plays audio and video on the terminal and hears an interesting piece of audio together with the video that is synchronized with it, the user can operate an audio and video capture button on the terminal, thereby triggering the terminal to intercept the audio and video clip. For example, an audio and video interception button is displayed on the touch screen of the terminal; when the user needs to intercept audio and video, the user clicks this button and thereby sends an audio and video interception instruction to the terminal, which includes the interception start time point required by the user. When the user no longer needs to continue the interception, the user clicks the button on the touch screen again, and a further audio and video interception instruction is sent to the terminal, which includes the interception end time point required by the user. Without limitation, in the embodiment of the present invention the user may also directly determine the audio and video duration to be intercepted and then send an audio and video interception instruction that includes both the interception start time point and the interception end time point, so that the terminal can determine from which time point to begin intercepting the audio and video and for how long the interception should last.
In addition, in the embodiment of the present invention, when the user needs to intercept only a partial picture area of the playing interface of the current playing terminal rather than the video picture of the whole playing interface, the user may define a video interception area in the playing interface, and the picture outside the video interception area is then not intercepted; in this case the video interception area defined by the user in the playing interface is carried in the audio and video interception instruction. The user can also select a target purpose through the audio and video interception instruction to indicate how the terminal should output the intercepted audio and video clip, for example archiving the intercepted clip, or sharing it in a QQ space or on WeChat after archiving. The target purpose indicates the specific use the user has for the output audio and video clip, so that the clip obtained by the interception in the invention can meet the user's requirement on the target purpose.
In some embodiments of the present invention, the audio and video interception instruction sent to the terminal by the user may include, in addition to the interception start time point and the interception end time point, other information with which the user instructs the terminal. For example, the user may indicate which audio parameter requirements the output audio and video clip should meet, that is, the intercepted and output clip can further be produced according to the audio parameters required by the user, so as to satisfy more of the user's requirements on the intercepted audio. Likewise, the user can instruct the terminal to output an audio and video clip that meets specified video parameter requirements, so that the clip is produced according to the video parameters required by the user and more of the user's requirements on the intercepted video are met.
Specifically, in some embodiments of the present invention, the audio/video interception instruction may further specifically include a target audio file format selected by the user, that is, the user may instruct the terminal to output an audio/video clip whose audio parameter is the target audio file format, where the file format refers to a file format of the audio file itself, such as MP3, wma, ape, and the like, and the target audio file format indicates a specific file format that the user needs to output, and then the audio/video clip obtained by audio interception in the present invention may meet a requirement of the user on the target audio file format.
In some embodiments of the present invention, the audio/video capture instruction may further specifically include a target sampling frequency selected by the user, that is, the user may instruct the terminal to output an audio/video clip whose audio parameter is the target sampling frequency, where the sampling frequency refers to the number of samples extracted from the continuous signal per second in the audio file and may be represented by hertz (Hz), for example, the sampling frequency may be divided into multiple levels of 22.05KHz, 44.1KHz, 48KHz, and the target sampling frequency indicates a specific sampling frequency that the user needs to output, and then the audio/video clip obtained by audio capture in the present invention may meet the requirement of the user on the target sampling frequency.
In some embodiments of the present invention, the audio/video interception instruction may further specifically include a target audio format selected by the user, that is, the user may instruct the terminal to output an audio/video segment whose audio parameter is in the target audio format, where the audio format refers to an audio content encoding format of an audio file, such as PCM encoding, OGG encoding, and the like, and the target audio format indicates a specific audio format that the user needs to output, and then the audio/video segment obtained by audio interception in the present invention may meet a requirement of the user on the target audio format.
In some embodiments of the present invention, the audio/video capture instruction may further specifically include a target channel number selected by the user, that is, the user may instruct the terminal to output an audio/video segment whose audio parameter is the target channel number, where the channel number refers to the number of the sound systems capable of generating different sounds, for example, the channel number may be 2.0, 2.1, 4.1, 5.1, 7.1, and the like, for example, the channel number 5.1 indicates that the terminal has a left main channel, a right main channel, a center channel, a left surround channel, a right surround channel, and a subwoofer channel, and the user may select the target channel number as needed, and the target channel number indicates a specific channel number that the user needs to output, so that the audio/video segment obtained by audio capture in the present invention may meet the requirement of the user on the target channel number.
In some embodiments of the present invention, the audio/video intercepting instruction may further specifically include a target channel arrangement selected by the user, that is, the user may instruct the terminal to output an audio/video segment whose audio parameter is the target channel arrangement, where the channel arrangement refers to an arrangement manner of multiple channels in an audio file, for example, the channel arrangement may refer to 6 mono arrangements or one 5.1 channel arrangement, and the target channel arrangement indicates a specific channel arrangement that the user needs to output, and then the audio/video segment obtained by audio interception in the present invention may meet a requirement of the user on the target channel arrangement.
In some embodiments of the present invention, the audio/video capture instruction may further specifically include a target sampling point format selected by the user, that is, the user may instruct the terminal to output an audio/video segment whose audio parameter is the target sampling point format, where the sampling point format may include whether the audio data is in a fixed-point format or a floating-point format, and the target sampling point format indicates a specific sampling point format that the user needs to output, and then the audio/video segment obtained by audio capture in the present invention may meet the requirement of the user on the target sampling point format.
In some embodiments of the present invention, the audio/video interception instruction may further specifically include a target purpose selected by the user, that is, the user may instruct the terminal to output an audio/video clip with a specific purpose, where the target purpose refers to an output path of the intercepted audio file, for example, the output path may be an archived file or shared after archiving, and the target purpose indicates a specific purpose of the audio/video clip that the user needs to output, and then the audio/video clip obtained by audio interception in the present invention may meet a requirement of the user for the target purpose.
Specifically, in some embodiments of the present invention, the audio/video capture instruction may further specifically include a target video file format selected by the user, that is, the user may instruct the terminal to output an audio/video clip whose video parameter is in the target video file format, where the file format refers to a file format of the video file itself, for example, may be MP4, mkv, and the target file format indicates a specific file format that the user needs to output, and then the audio/video clip obtained by video capture in the present invention may meet a requirement of the user on the target video file format.
In some embodiments of the present invention, the audio/video capture instruction may further specifically include a target resolution selected by the user, that is, the user may instruct the terminal to output an audio/video clip whose video parameter is the target resolution, where the resolution refers to a setting of how much information is displayed in the video file, and both the width and the height are usually set by 16 times as a step unit, for example, 16 × n (n is 1,2, 3....) such as 176 × 144, 352 × 288, and the target resolution indicates a specific resolution that the user needs to output, and then the audio/video clip obtained by video capture in the present invention may meet a requirement of the user on the target resolution.
In some embodiments of the present invention, the audio/video capturing instruction may further specifically include a target video format selected by the user, that is, the user may instruct the terminal to output an audio/video segment whose video parameter is in the target video format, where the video format refers to a video content encoding format of a video file, for example, H264, and the target video format indicates a specific video format that the user needs to output, and then the audio/video segment obtained by video capturing in the present invention may meet the requirement of the user on the target video format.
In some embodiments of the present invention, the audio/video capture instruction may further specifically include a target video quality selected by the user, that is, the user may instruct the terminal to output an audio/video segment whose video parameter is the target video quality, where the video quality refers to a video transmission level requirement of a video file, and may represent complexity of a video format, for example, the video quality is divided into 3 levels or 5 levels, the user may select a required target video quality as level iii, and the target video quality indicates a specific video quality level that the user needs to output, and then the video segment obtained by video capture in the present invention may meet the requirement of the user on the target video quality. It should be noted that, in the present invention, the video quality may also include other parameters of the video, for example, the video quality may be used to represent the number of frames between key frames in a group of pictures (gop) of the video, the video quality may be used to represent a quantization coefficient (qp) of the video, which may determine the encoding compression rate and image precision of a quantizer, and the video quality may also be used to represent the configuration of the video, for example, the configuration includes main setting indexes such as baseline, main, high, and the like.
In some embodiments of the present invention, the audio/video capture instruction may further specifically include a target video frame rate selected by the user, that is, the user may instruct the terminal to output an audio/video segment whose video parameter is the target video frame rate, where the video frame rate refers to a video playing rate of a video file and indicates how many frames of pictures are played per second, for example, the video frame rate may be 30fps, the user may select a required target video frame rate of 20fps, and the target video frame rate indicates a specific video frame rate that the user needs to output, and then the audio/video segment obtained by video capture in the present invention may meet the requirement of the user on the target video frame rate.
It should be noted that the foregoing describes in detail the various audio and video parameters that may be included in the audio and video interception instruction received by the terminal. It is to be understood that the interception instruction in the present invention may include one or more of the parameters described above; which parameters are actually selected by the user can be determined in combination with the specific application scenario.
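For illustration only, the parameters discussed above could be grouped into a single structure in a C implementation; the field names below are hypothetical assumptions and do not come from the patent text.

```c
/* Hypothetical layout of an audio and video interception instruction.
 * All names are illustrative assumptions, not part of the original disclosure. */
typedef struct {
    double start_time_s;                      /* interception start time point (playback seconds) */
    double end_time_s;                        /* interception end time point */
    struct { int x, y, w, h; } capture_rect;  /* video interception area in the playing interface */
    int    target_purpose;                    /* e.g. archive locally, or share after archiving */

    /* optional audio parameters selected by the user */
    const char *audio_file_format;            /* e.g. "mp3", "wma", "ape" */
    int         sample_rate_hz;               /* e.g. 22050, 44100, 48000 */
    const char *audio_codec;                  /* audio format, e.g. "pcm", "ogg" */
    int         channel_count;                /* e.g. 2 for stereo, 6 for 5.1 */
    int         planar_channels;              /* channel arrangement: non-zero for N mono tracks */
    int         float_samples;                /* sampling point format: 0 fixed point, 1 floating point */

    /* optional video parameters selected by the user */
    const char *video_file_format;            /* e.g. "mp4", "mkv" */
    int         out_width, out_height;        /* target resolution, typically multiples of 16 */
    const char *video_codec;                  /* video format, e.g. "h264" */
    int         video_quality;                /* quality level (may map to GOP size / QP / profile) */
    int         frame_rate;                   /* target video frame rate, e.g. 20 or 30 fps */
} AvCaptureInstruction;
```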
102. And starting from the point that the playing time is the interception starting time, acquiring decoded audio data corresponding to the currently playing audio file according to the audio and video interception instruction, acquiring decoded video data corresponding to the video interception area in the currently playing video file at the same time, stopping acquiring decoded audio data corresponding to the currently playing audio file until the playing time is the interception ending time, and stopping acquiring decoded video data corresponding to the video interception area in the currently playing video file at the same time.
In the embodiment of the invention, after the terminal receives an audio and video interception instruction that includes an interception start time point, the terminal monitors the audio file and the video file currently being played in its playing screen to track the progress of the playing time. When the playing time reaches the interception start time point, the terminal starts to acquire, in real time, the decoded audio data corresponding to the audio file currently being played and the decoded video data corresponding to the video interception area in the video file currently being played; this acquisition must not stop before an audio and video interception instruction containing the interception end time point is received.
Taking audio playing as an example, the audio playing process decodes an audio file into raw data and then plays that data. The interception start time point serves as the marker for obtaining the audio file currently being played; this may be an independent audio file, or an audio file played synchronously with the video images currently being played (i.e. the audio accompanying the video). Since the audio file has already been decoded into decoded audio data by a software decoder or a hardware decoder, the corresponding decoded audio data can be found from the audio file currently being played according to the correspondence between the data before and after decoding; the decoded audio data is usually in a raw data format, for example the PCM encoding format. For example, if the playing-time axis shows that an audio file has been playing for 4 minutes 20 seconds and the interception start time point carried in the audio and video interception instruction received by the terminal is 4 minutes 22 seconds, then when the playing time reaches 4 minutes 22 seconds the decoded audio data corresponding to the audio file being played at that moment is obtained, and from 4 minutes 22 seconds onwards the terminal keeps obtaining the decoded audio data corresponding to the audio file currently being played. Video playing works similarly: the video playing process decodes a video file into raw data and then displays it, and the interception start time point serves as the marker for obtaining the video file currently being played. Since the video file has already been decoded into decoded video data by a software decoder or a hardware decoder, the corresponding decoded video data can be found from the video file currently being played according to the correspondence before and after decoding. The decoded video data is usually in a raw data format composed of the three components Y (luminance), U (chrominance) and V (chrominance), which is widely used in the field of video compression; the commonly used format for such decoded video data is YUV420.
In the embodiment of the present invention, after receiving an audio and video interception instruction including an interception start time point, the terminal starts the process of acquiring decoded audio data and the process of acquiring decoded video data from that start time point, and as long as the playing time has not reached the interception end time point, the terminal continues both processes. After the terminal receives the instruction containing the interception end time point, it monitors the playing-time axis, and once the playing time reaches the interception end time point it no longer acquires the decoded audio data corresponding to the audio file currently being played, nor the decoded video data corresponding to the video file currently being played. It can be understood that in the present invention the decoded audio data acquired by the terminal follows the same playing order as the audio file in the playing terminal, and the decoded video data acquired by the terminal likewise follows the same playing order as the video file in the playing terminal.
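A minimal sketch of the acquisition described above, under the assumption that the playback pipeline exposes a hook that is called for every decoded buffer together with its presentation time; all names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for decoded buffers accumulated during capture. */
typedef struct { unsigned char *data; size_t size; } Chunk;

/* Assumed to be called by the playback pipeline for every decoded audio buffer
 * and every decoded video frame it is about to render. */
static void on_decoded_buffer(double pts_s,                 /* playing time of this buffer */
                              const unsigned char *buf, size_t len,
                              double start_s, double end_s, /* capture window from the instruction */
                              Chunk **list, size_t *count)
{
    if (pts_s < start_s || pts_s > end_s)
        return;                               /* outside the capture window: ignore */

    Chunk c;
    c.data = malloc(len);
    if (!c.data)
        return;
    memcpy(c.data, buf, len);                 /* keep a copy of the already-decoded data */
    c.size = len;

    *list = realloc(*list, (*count + 1) * sizeof(Chunk));
    (*list)[(*count)++] = c;                  /* append in playback order */
}
```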
In some embodiments of the present invention, the step 102, according to the audio/video interception instruction, acquires decoded audio data corresponding to the currently playing audio file, and specifically may include the following steps:
a0, reading the decoded audio data from the memory of the current playing terminal according to the audio and video interception instruction.
Step 102, obtaining decoded video data corresponding to a video capture area in a video file currently being played according to a video capture instruction, specifically including the following steps:
a1, calculating the offset position between the video capture area and the playing interface of the current playing terminal;
a2, determining the coordinate mapping relation between the video capture area and the video image in the video file currently being played according to the calculated offset position;
and A3, reading the decoded video data corresponding to the video intercepting area from the frame buffer of the current playing terminal according to the coordinate mapping relation.
When the audio file is being played in the current playing terminal, it has already been decoded into decoded audio data by a software decoder or a hardware decoder. The terminal reads the decoded audio data from the memory and then outputs the read decoded audio data to the sound device of the playing terminal for playback.
The terminal obtains the video interception area defined by the user in the playing interface according to how the user adjusts the video capture frame, so that the terminal can determine which part, or all, of the video picture in the playing interface the user needs to intercept. Fig. 2 is a schematic diagram illustrating how a video interception area is obtained in an embodiment of the present invention: area A is the full-screen area of the terminal, area B is the playing interface within the video playing area, and area C is the video interception area defined by the user. The position and size of area C can be adjusted by the user dragging the video capture box.
After the video interception area defined by the user has been determined, step A1 is executed: the terminal calculates the offset position between the video interception area and the playing interface of the current playing terminal. Both the playing interface and the video interception area are rectangular frames, so the offsets of the four corners of the video interception area relative to the four corners of the playing interface need to be calculated, which determines the offset position between the two. As shown in fig. 2, a video file may be played full screen, as in area A, or in a non-full-screen window, as in area B; the playing area may be anything from area B up to area A. In either case the user can draw a rectangular region in the video playing area to serve as the video interception area, and the offsets of that region relative to the four corners of the video playing area can be calculated from the pixel positions.
After the offset position of the video interception area relative to the video playing interface has been obtained, step A2 is executed: the coordinate mapping relation between the video interception area and the video image in the video file currently being played is determined according to the calculated offset position. There is a scaling relation between the video playing interface and the original video image: if they are the same size the mapping is one-to-one, but if the user has enlarged or reduced the original video image for display in the current playing interface, the offset position of the video interception area relative to the playing interface must be remapped in order to obtain the coordinate mapping relation between the video interception area and the video image in the video file currently being played. For example, as shown in fig. 2, the size of the video playing area is not necessarily equal to the size of the original video image, so after the offset has been calculated, the coordinate mapping of that offset into the original video image still has to be computed.
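A minimal sketch of steps A1 and A2, assuming simple axis-aligned rectangles: the offset of the interception area inside the playing interface is computed first, then rescaled into the coordinate space of the original decoded video image.

```c
typedef struct { int x, y, w, h; } Rect;

/* Map a capture rectangle given in playing-interface pixels into the
 * coordinate space of the original decoded video image (steps A1 + A2). */
static Rect map_capture_rect(Rect capture,           /* region C drawn by the user        */
                             Rect play_area,         /* playing interface B on the screen */
                             int video_w, int video_h) /* original decoded image size      */
{
    /* A1: offset of the capture area relative to the playing interface. */
    int off_x = capture.x - play_area.x;
    int off_y = capture.y - play_area.y;

    /* A2: rescale, because the interface may show the image enlarged or reduced. */
    double sx = (double)video_w / play_area.w;
    double sy = (double)video_h / play_area.h;

    Rect r;
    r.x = (int)(off_x * sx);
    r.y = (int)(off_y * sy);
    r.w = (int)(capture.w * sx);
    r.h = (int)(capture.h * sy);
    return r;
}
```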
In some embodiments of the present invention, step A3 reads the decoded video data corresponding to the video interception area from the frame buffer of the current playing terminal according to the coordinate mapping relation. When the video file is being played in the current playing terminal, it has already been decoded into decoded video data by a software decoder or a hardware decoder; the terminal reads the decoded video data from the frame buffer and outputs it to the display screen as the playing interface. By means of the decoded video data stored in the frame buffer, the decoded video data corresponding to the video file being played at each playing time can be acquired in real time from the interception start time point onwards. After the decoded video data corresponding to the video file being played has been acquired, a scaling transformation is performed according to the coordinate mapping relation to obtain the decoded video data corresponding to the video interception area; the decoded video data outside the video interception area in the playing interface is not within the range of the acquired decoded video data.
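A minimal sketch of step A3, assuming the frame buffer holds planar YUV420 frames: only the mapped region is copied, with the chroma planes copied at half resolution (the region is assumed to have even coordinates and dimensions).

```c
#include <string.h>

/* Copy the mapped capture region out of a planar YUV420 frame (step A3).
 * src_y/src_u/src_v point at the decoded frame in the frame buffer;
 * x, y, w, h are the region returned by the coordinate mapping. */
static void crop_yuv420(const unsigned char *src_y, const unsigned char *src_u,
                        const unsigned char *src_v, int src_w, int src_h,
                        int x, int y, int w, int h,
                        unsigned char *dst_y, unsigned char *dst_u, unsigned char *dst_v)
{
    (void)src_h;                                 /* height only bounds the region; not used here */

    for (int row = 0; row < h; row++)            /* luma plane: full resolution */
        memcpy(dst_y + row * w, src_y + (y + row) * src_w + x, w);

    int cw = w / 2, ch = h / 2, csw = src_w / 2; /* chroma planes: subsampled 2x2 */
    for (int row = 0; row < ch; row++) {
        memcpy(dst_u + row * cw, src_u + (y / 2 + row) * csw + x / 2, cw);
        memcpy(dst_v + row * cw, src_v + (y / 2 + row) * csw + x / 2, cw);
    }
}
```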
It should be noted that, in some embodiments of the present invention, the terminal may further have another implementation manner for obtaining the decoded audio data corresponding to the currently playing audio file, for example, first obtaining a source file corresponding to the currently playing audio file, then re-decoding the source file, so as to generate decoded audio data, and according to this manner, also obtain the decoded audio data.
In some embodiments of the present invention, if the audio/video capture instruction further includes a target sampling frequency selected by the user and a target resolution selected by the user, before the step 103 starts from the capture end time point and respectively performs file format coding on the obtained decoded audio data and the decoded video data according to the audio/video capture instruction, the method for capturing an audio/video fragment provided by the present invention may further include the following steps:
b1, judging whether the original sampling frequency of the audio file corresponding to the acquired decoded audio data is the same as the target sampling frequency, and judging whether the original resolution and the target resolution of the video image in the video file corresponding to the acquired decoded video data are the same;
b2, if the original sampling frequency is different from the target sampling frequency, converting the sampling frequency of the audio file corresponding to the acquired decoded audio data to obtain the acquired decoded audio data containing the target sampling frequency; and if the original resolution is different from the target resolution, converting the resolution of the video image in the video file corresponding to the acquired decoded video data to obtain the acquired decoded video data containing the target resolution.
After the decoded audio data and the decoded video data have been acquired in step 102, if the audio and video interception instruction received by the terminal also includes a target sampling frequency and a target resolution selected by the user, this indicates that the user wants to specify the sampling frequency of the intercepted audio clip and the resolution of the intercepted video clip. Taking sampling frequency conversion as an example, the terminal may first obtain the original sampling frequency of the audio file from its file header information; the original sampling frequency is the one used when the audio file is played in the playing device of the terminal. If the user needs to change it, a sampling frequency adjustment menu can be displayed on the display screen of the terminal, through which the user specifies the sampling frequency of the intercepted audio clip (i.e. the target sampling frequency carried in the audio and video interception instruction). After the original sampling frequency of the audio file has been obtained, it is compared with the target sampling frequency: if they are the same, no sampling frequency conversion is required; if they differ, the sampling frequency must be converted, for example by calling a third-party library (such as ffmpeg), so as to obtain acquired decoded audio data with the target sampling frequency. It is this decoded audio data containing the target sampling frequency that is subjected to file format coding in the subsequent step 103; in other words, the acquired decoded audio data in step 103 is specifically the acquired decoded audio data containing the target sampling frequency.
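Where the text mentions calling a third-party library such as ffmpeg for the sampling frequency conversion, a sketch using ffmpeg's libswresample might look as follows; it assumes interleaved 16-bit stereo PCM on both sides and uses the older swr_alloc_set_opts() API (newer ffmpeg releases use AVChannelLayout-based variants).

```c
#include <libswresample/swresample.h>
#include <libavutil/channel_layout.h>
#include <libavutil/samplefmt.h>

/* Resample 16-bit interleaved stereo PCM from in_rate to out_rate.
 * Returns the number of output samples per channel, or a negative value on error. */
static int resample_pcm_s16(const uint8_t *in, int in_samples, int in_rate,
                            uint8_t *out, int out_capacity_samples, int out_rate)
{
    SwrContext *swr = swr_alloc_set_opts(NULL,
            AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_S16, out_rate,   /* target   */
            AV_CH_LAYOUT_STEREO, AV_SAMPLE_FMT_S16, in_rate,    /* original */
            0, NULL);
    if (!swr || swr_init(swr) < 0)
        return -1;

    int got = swr_convert(swr, &out, out_capacity_samples, &in, in_samples);
    swr_free(&swr);
    return got;
}
```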
In some embodiments of the present invention, if the audio/video capture instruction further includes a target audio format selected by the user and a target video format selected by the user, before the step 103 starts from the capture end time point and separately performs file format coding on the obtained decoded audio data and the decoded video data according to the audio/video capture instruction, the method for capturing an audio/video fragment provided by the present invention may further include the following steps:
c1, judging whether the original audio format of the audio file corresponding to the obtained decoded audio data is the same as the target audio format, and judging whether the original video format of the video file corresponding to the obtained decoded video data is the same as the target video format;
c2, if the original audio format is different from the target audio format, converting the audio format of the audio file corresponding to the obtained decoded audio data to obtain the obtained decoded audio data containing the target audio format; and if the original video format is different from the target video format, converting the video format of the video file corresponding to the obtained decoded video data to obtain the obtained decoded video data containing the target video format.
Wherein, after the decoded audio data and the decoded video data have been acquired in step 102, if the audio and video interception instruction received by the terminal also includes a target audio format and a target video format, this indicates that the user wants to specify the audio format and the video format of the intercepted clip. Taking audio format conversion as an example, the terminal may first obtain the original audio format of the audio file from its file header information; the original audio format is the one used when the audio file is played in the playing device of the terminal. If the user needs to change it, an audio format adjustment menu can be displayed on the display screen of the terminal, through which the user specifies the audio format of the intercepted audio clip (i.e. the target audio format carried in the audio and video interception instruction). After the original audio format has been obtained, it is compared with the target audio format: if they are the same, no conversion is required; if they differ, the audio format must be converted, for example by calling a third-party library (such as ffmpeg), so as to obtain acquired decoded audio data with the target audio format. It is this decoded audio data containing the target audio format that is subjected to file format coding in the subsequent step 103; in other words, the acquired decoded audio data in step 103 is specifically the acquired decoded audio data containing the target audio format.
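As a sketch of how the acquired PCM data could subsequently be encoded into a target audio format with ffmpeg's libavcodec; the choice of the "aac" encoder and the pre-5.1 channel fields are assumptions for illustration, not the patent's prescribed implementation.

```c
#include <libavcodec/avcodec.h>
#include <libavutil/channel_layout.h>

/* Open an encoder for the target audio format; "aac" is only an example,
 * the actual codec name would come from the interception instruction. */
static AVCodecContext *open_audio_encoder(int sample_rate, int channels)
{
    const AVCodec *codec = avcodec_find_encoder_by_name("aac");
    if (!codec)
        return NULL;

    AVCodecContext *ctx = avcodec_alloc_context3(codec);
    if (!ctx)
        return NULL;

    ctx->sample_rate    = sample_rate;
    ctx->channels       = channels;
    ctx->channel_layout = av_get_default_channel_layout(channels);
    ctx->sample_fmt     = codec->sample_fmts ? codec->sample_fmts[0]
                                             : AV_SAMPLE_FMT_FLTP;
    ctx->bit_rate       = 128000;

    if (avcodec_open2(ctx, codec, NULL) < 0) {
        avcodec_free_context(&ctx);
        return NULL;
    }
    /* Decoded frames are then fed with avcodec_send_frame() and the encoded
     * packets collected with avcodec_receive_packet(). */
    return ctx;
}
```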
In some embodiments of the present invention, if the audio and video interception instruction further includes a target number of channels selected by the user, then before step 103 starts from the interception end time point and respectively performs file format coding on the acquired decoded audio data and decoded video data according to the audio and video interception instruction, the method for intercepting audio and video clips provided by the present invention may further include the following steps:
d1, judging whether the original channel number of the audio file corresponding to the acquired decoded audio data is the same as the target channel number;
and D2, if the original number of channels is different from the target number of channels, adjusting the number of channels of the audio file corresponding to the acquired decoded audio data to obtain the acquired decoded audio data containing the target number of channels.
Wherein, after the decoded audio data and the decoded video data have been acquired in step 102, if the audio and video interception instruction received by the terminal also includes a target number of channels, this indicates that the user wants to specify the number of channels of the intercepted audio clip. The terminal may first obtain the original number of channels of the audio file from its file header information; the original number of channels is the one used when the audio file is played in the terminal. If the user needs to change it, a channel number adjustment menu can be displayed on the display screen of the terminal, through which the user specifies the number of channels of the intercepted audio clip (i.e. the target number of channels carried in the audio and video interception instruction). After the original number of channels has been obtained, it is compared with the target number of channels: if they are the same, no adjustment of the number of channels is necessary. If they differ, the number of channels must be adjusted, for example by converting a mono stream into a two-channel stream; specifically, a third-party library (such as ffmpeg) may be called to perform the conversion, so as to obtain acquired decoded audio data with the target number of channels. It is this decoded audio data containing the target number of channels that is subjected to file format coding in the subsequent step 103; in other words, the acquired decoded audio data in step 103 is specifically the acquired decoded audio data containing the target number of channels.
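A minimal sketch of the mono-to-stereo example mentioned above, assuming interleaved 16-bit PCM: each mono sample is simply duplicated into the left and right channels (more general channel-count changes would again typically be delegated to a library such as ffmpeg).

```c
#include <stdint.h>

/* Duplicate a mono 16-bit PCM stream into interleaved stereo (L R L R ...). */
static void mono_to_stereo_s16(const int16_t *mono, int16_t *stereo, int n_samples)
{
    for (int i = 0; i < n_samples; i++) {
        stereo[2 * i]     = mono[i];  /* left  */
        stereo[2 * i + 1] = mono[i];  /* right */
    }
}
```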
In some embodiments of the present invention, if the audio/video capture instruction further includes a target channel arrangement selected by the user and a target video quality selected by the user, before the step 103 starts from the capture end time point and separately performs file format coding on the obtained decoded audio data and the decoded video data according to the audio/video capture instruction, the method for capturing an audio/video fragment provided by the present invention may further include the following steps:
e1, judging whether the original sound track arrangement and the target sound track arrangement of the audio file corresponding to the obtained decoded audio data are the same, and judging whether the original video quality and the target video quality of the video file corresponding to the obtained decoded video data are the same;
e2, if the original sound track arrangement is different from the target sound track arrangement, converting the sound track arrangement of the audio file corresponding to the obtained decoded audio data to obtain the obtained decoded audio data containing the target sound track arrangement, and if the original video quality is different from the target video quality, adjusting the video quality of the video file corresponding to the obtained decoded video data to obtain the obtained decoded video data containing the target video quality.
Wherein, after the decoded audio data and the decoded video data have been acquired in step 102, if the audio and video interception instruction received by the terminal also includes a target channel arrangement and a target video quality selected by the user, this indicates that the user wants to specify the channel arrangement of the intercepted audio clip. The terminal may first obtain the original channel arrangement of the audio file from its file header information; the original channel arrangement is the one used when the audio file is played in the playing device of the terminal. If the user needs to change it, a channel arrangement adjustment menu can be displayed on the display screen of the terminal, through which the user specifies the channel arrangement of the intercepted audio clip (i.e. the target channel arrangement carried in the audio and video interception instruction). After the original channel arrangement has been obtained, it is compared with the target channel arrangement: if they are the same, no conversion is required; if they differ, the channel arrangement must be converted, for example converting one two-channel stream into 2 mono streams, or one 5.1-channel stream into 6 mono streams. Specifically, a third-party library (such as ffmpeg) may be called to perform the conversion, so as to obtain acquired decoded audio data with the target channel arrangement. It is this decoded audio data containing the target channel arrangement that is subjected to file format coding in the subsequent step 103; in other words, the acquired decoded audio data in step 103 is specifically the acquired decoded audio data containing the target channel arrangement.
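For the channel-arrangement example given above (one interleaved 5.1 stream rearranged as 6 mono tracks), a minimal sketch assuming interleaved 16-bit PCM with 6 samples per frame:

```c
#include <stdint.h>

/* Split one interleaved 5.1 stream (6 samples per frame) into 6 mono buffers. */
static void deinterleave_5_1_s16(const int16_t *interleaved, int n_frames,
                                 int16_t *mono[6])
{
    for (int f = 0; f < n_frames; f++)
        for (int ch = 0; ch < 6; ch++)
            mono[ch][f] = interleaved[f * 6 + ch];
}
```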
If the audio/video capture instruction received by the terminal also includes a target video quality, the user needs to specify the video quality of the captured audio/video clip. The terminal may first obtain the original video quality of the video image from the file header information of the video file; the original video quality is the video quality displayed when the video file is played on the display screen of the terminal. If the user needs to adjust the original video quality of the video image in the video file, a video quality adjustment menu may be displayed on the display screen of the terminal, on which the user specifies the video quality of the captured video segment (namely the target video quality carried in the video capture instruction). After the original video quality of the video image in the video file is obtained, it is judged whether the target video quality is the same as the original video quality; if they are the same, no further adjustment of the video quality is necessary, and if the target video quality is different from the original video quality, the video quality of the video image is adjusted as described in step e2.
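One way to realize such a video quality adjustment, assuming FFmpeg is the third-party library used for re-encoding, is to translate the user-selected quality level into encoder parameters. The quality-to-bitrate mapping below is purely an illustrative assumption, not a value prescribed by this description.

```c
/* Illustrative sketch: expressing a user-selected target video quality as
 * encoder settings on an FFmpeg AVCodecContext. The mapping from a menu
 * "quality level" to bits per pixel is an assumption for illustration. */
#include <libavcodec/avcodec.h>

static void apply_target_quality(AVCodecContext *enc, int width, int height,
                                 int quality_level /* assumed: 1 = low .. 3 = high */)
{
    if (quality_level < 1) quality_level = 1;
    if (quality_level > 3) quality_level = 3;

    /* Illustrative mapping: higher quality -> more bits per pixel. */
    const double bits_per_pixel[] = { 0.0, 0.05, 0.10, 0.20 };
    double bpp = bits_per_pixel[quality_level];

    enc->width    = width;
    enc->height   = height;
    enc->bit_rate = (int64_t)(bpp * width * height * 25 /* assumed fps */);

    /* A narrower quantiser range also nudges the encoder toward the target. */
    enc->qmin = (quality_level >= 3) ? 10 : 18;
    enc->qmax = (quality_level >= 3) ? 30 : 45;
}
```

In practice the mapping between a menu quality level and concrete encoder parameters is a design choice of the implementation and is not prescribed here.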
It should be noted that, in some embodiments of the present invention, if the currently playing audio file is an audio file that is played synchronously with the currently playing video image (i.e. audio that is played along with the video image), the currently playing audio file and the video image have a certain constraint relationship, so the original channel arrangement of the currently playing audio file may also be obtained from the file header information of the currently playing video file. Specifically, the format of the video image constrains the channel arrangement of the audio file; for example, a video image in the mkv format may accommodate audio content with more than 2 channels, while a video image in the mp4 format may accommodate audio content with 2 channels.
In some embodiments of the present invention, if the audio/video capture instruction further includes a target sampling point format selected by the user and a target video frame rate selected by the user, then before step 103 performs, from the capture end time point, file format coding on the obtained decoded audio data and the decoded video data respectively according to the audio/video capture instruction, the method for capturing an audio/video clip provided by the present invention may further include the following steps:
f1, judging whether the original sampling point format of the audio file corresponding to the obtained decoded audio data is the same as the target sampling point format, and judging whether the original video frame rate of the video file corresponding to the obtained decoded video data is the same as the target video frame rate;
f2, if the original sampling point format is different from the target sampling point format, converting the sampling point format of the audio file corresponding to the acquired decoded audio data to obtain decoded audio data containing the target sampling point format; and if the original video frame rate is different from the target video frame rate, converting the video frame rate of the video file corresponding to the acquired decoded video data to obtain decoded video data containing the target video frame rate.
Wherein, after the decoded audio data and the decoded video data are obtained in step 102, if the audio/video interception instruction received by the terminal further includes a target sampling point format and a target video frame rate selected by the user, the user needs to specify the sampling point format and the video frame rate of the intercepted clip. Taking the conversion of the sampling point format as an example, the terminal may first obtain the original sampling point format of the audio file from the audio file header information; the original sampling point format is the sampling point format adopted when the audio file is played back on the playing device of the terminal. If the user needs to adjust the original sampling point format of the audio file, a sampling point format adjustment menu may be displayed on the display screen of the terminal, on which the user specifies the sampling point format of the intercepted audio segment (i.e. the target sampling point format carried in the audio/video interception instruction). After the original sampling point format of the audio file is obtained, it is judged whether the target sampling point format is the same as the original sampling point format; if they are the same, no conversion of the sampling point format is needed. If the target sampling point format is not the same as the original sampling point format, the sampling point format needs to be converted; specifically, a third-party library (for example, ffmpeg) may be called to implement the conversion, so as to obtain decoded audio data containing the target sampling point format. The file format coding in the subsequent step 103 is then performed on this data, that is, the obtained decoded audio data in step 103 is specifically the obtained decoded audio data containing the target sampling point format.
If the audio/video capturing instruction received by the terminal further includes a target video frame rate, the user needs to specify the video frame rate of the captured audio/video segment. The terminal may first obtain the original video frame rate of the video image from the file header information of the video file; the original video frame rate is the frame rate displayed when the video file is played on the display screen of the terminal. If the user needs to adjust the original video frame rate of the video image in the video file, a video frame rate adjustment menu may be displayed on the display screen of the terminal, on which the user specifies the video frame rate of the captured video segment (i.e., the target video frame rate carried in the video capturing instruction). After the original video frame rate of the video image in the video file is obtained, it is determined whether the target video frame rate is the same as the original video frame rate; if they are the same, the video frame rate does not need to be converted, and if the target video frame rate is different from the original video frame rate, the video frame rate needs to be converted.
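A minimal sketch of one way such a frame rate conversion could be carried out on the decoded frames, by dropping or repeating frames against a target timeline. The structure and function names are illustrative assumptions; a production implementation would normally repeat the previous frame rather than the current one when raising the frame rate.

```c
/* Illustrative sketch: converting the frame rate of the captured segment by
 * dropping or duplicating decoded frames. Assumes frames arrive with
 * presentation timestamps expressed in seconds. */
typedef struct {
    double target_interval;   /* 1.0 / target_fps */
    double next_output_pts;   /* pts at which the next output frame is due */
    int    started;
} FrameRateConv;

/* Returns how many times the incoming frame should be written to the output
 * segment: 0 = drop, 1 = keep, >1 = duplicate (simplified: the current frame
 * stands in for every output slot that has elapsed). */
static int frames_to_emit(FrameRateConv *c, double pts)
{
    int n = 0;
    if (!c->started) {
        c->started = 1;
        c->next_output_pts = pts;
    }
    while (pts >= c->next_output_pts - 1e-9) {
        n++;
        c->next_output_pts += c->target_interval;
    }
    return n;
}
```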
103. Starting from the interception end time point, respectively carrying out file format coding on the obtained decoded audio data and the decoded video data according to the audio and video interception instruction to generate an audio segment and a video segment, and synthesizing the audio segment and the video segment to obtain the audio and video clip.
In the embodiment of the present invention, in step 102, a plurality of decoded audio data and a plurality of decoded video data are obtained between the interception start time point and the interception end time point. When the interception end time point arrives, the terminal stops obtaining the decoded audio data and at the same time stops obtaining the decoded video data. Starting from the interception end time point, the terminal has already obtained the decoded audio data and the decoded video data corresponding to the clip to be intercepted; it then separately packages and encapsulates the obtained decoded audio data and decoded video data to obtain an audio segment and a video segment, and then synthesizes the audio segment and the video segment to obtain the audio and video clip. Packaging the decoded audio data and the decoded video data obtained in step 102 into file form, that is, respectively performing file format coding on them, yields the audio and video clip that the user needs to intercept, and the generated audio and video clip comes from the audio file and the video file played in the playing interface of the terminal.
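For illustration, assuming FFmpeg's libavcodec and libavformat are used as the third-party library, the file format coding of step 103 can be pictured as an encode-and-write loop: each decoded frame is handed to an encoder, and the resulting packets are written by the muxer. The helper name and the reduced error handling are assumptions of this sketch.

```c
/* Illustrative sketch of the file-format coding in step 103: decoded frames
 * collected between the start and end time points are pushed through an
 * encoder and the resulting packets are written by the muxer. */
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>

static int encode_and_write(AVFormatContext *ofmt, AVStream *st,
                            AVCodecContext *enc, AVFrame *frame /* NULL = flush */)
{
    AVPacket *pkt = av_packet_alloc();
    int ret;
    if (!pkt)
        return AVERROR(ENOMEM);

    ret = avcodec_send_frame(enc, frame);
    if (ret < 0)
        goto done;

    while ((ret = avcodec_receive_packet(enc, pkt)) >= 0) {
        /* Rescale packet timestamps from the encoder time base to the
         * stream time base before handing the packet to the muxer. */
        av_packet_rescale_ts(pkt, enc->time_base, st->time_base);
        pkt->stream_index = st->index;
        ret = av_interleaved_write_frame(ofmt, pkt);
        if (ret < 0)
            break;
    }
    if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
        ret = 0;                      /* encoder simply needs more input */
done:
    av_packet_free(&pkt);
    return ret;
}
```

Calling such a helper once per collected frame for the audio stream and for the video stream, then once with a NULL frame to flush the encoder and finally av_write_trailer(), would produce the audio segment and video segment described above.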
104. And outputting the audio and video clips according to the target purpose.
That is to say, after the audio and video clip is acquired, it can be output to a specific application according to the needs of the user; for example, the user archives the intercepted audio and video clip, or the archived audio and video clip is shared to the QQ space or WeChat. The target use indicates the specific purpose for which the user needs to output the clip, so that the audio and video clip acquired by the audio and video interception in the present invention can meet the user's requirement on the target use.
In some embodiments of the present invention, if the audio/video capture instruction further includes a target audio file format selected by the user and a target video file format selected by the user, step 103 respectively performs file format encoding on the obtained decoded audio data and the decoded video data according to the audio/video capture instruction from the capture end time point, which may specifically include the following steps:
g1, using a file synthesizer to encode the obtained decoded audio data into an audio clip meeting the target audio file format, and carrying audio file header information in the audio clip, wherein the audio file header information comprises: attribute information of the audio clip;
g2, encoding the obtained decoded video data into a video clip meeting the target video file format by using a file synthesizer, and carrying video file header information in the video clip, wherein the video file header information comprises: attribute information of the video clip.
Wherein, after the decoded audio data and the decoded video data are acquired in step 102, if the audio/video interception instruction received by the terminal further includes a target audio file format and a target video file format, the user needs to specify the file format of the captured audio/video clip. In that case, a file synthesizer may be used to encode the obtained decoded audio data into an audio segment satisfying the target audio file format; specifically, a third-party library (e.g., ffmpeg) may be called to implement the file format conversion, so as to obtain an audio segment satisfying the target audio file format. When the file synthesizer is used, audio file header information is carried in the generated audio segment; the audio file header information carries basic feature information of the audio segment and includes the attribute information of the audio segment. Similarly, a file synthesizer may be used to encode the obtained decoded video data into a video segment satisfying the target video file format, again by calling a third-party library (e.g., ffmpeg) to implement the file format conversion. When the file synthesizer is used, video file header information is carried in the generated video segment; the video file header information carries basic feature information of the video segment and includes the attribute information of the video segment.
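As one way to picture the file synthesizer described here, again assuming FFmpeg as the third-party library: the muxer can be selected from the target file name, so that "clip.mp3" and "clip.mp4" yield different target file formats, and avformat_write_header() writes the file header information that carries the segment's attribute information. The helper below is an illustrative sketch under those assumptions, not a prescribed implementation.

```c
/* Illustrative sketch: opening the "file synthesizer" (muxer) according to
 * the user-selected target file format, guessed from the file extension. */
#include <libavformat/avformat.h>

static AVFormatContext *open_file_synthesizer(const char *filename,
                                              AVCodecContext *enc,
                                              AVStream **out_stream)
{
    AVFormatContext *ofmt = NULL;
    if (avformat_alloc_output_context2(&ofmt, NULL, NULL, filename) < 0)
        return NULL;

    AVStream *st = avformat_new_stream(ofmt, NULL);
    if (!st || avcodec_parameters_from_context(st->codecpar, enc) < 0)
        goto fail;

    if (!(ofmt->oformat->flags & AVFMT_NOFILE) &&
        avio_open(&ofmt->pb, filename, AVIO_FLAG_WRITE) < 0)
        goto fail;

    /* Writes the container header (the clip's attribute information). */
    if (avformat_write_header(ofmt, NULL) < 0)
        goto fail;

    *out_stream = st;
    return ofmt;
fail:
    if (ofmt && ofmt->pb)
        avio_closep(&ofmt->pb);
    avformat_free_context(ofmt);
    return NULL;
}
```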
As can be seen from the above description of the embodiments of the present invention, when a user sends an audio/video capture command through the current playing terminal, the terminal first receives the audio/video capture instruction, where the audio/video capture instruction includes: the capture start time point, the capture end time point, the video capture area defined by the user and the target use selected by the user. After the playing interface in the terminal starts to play the audio file and the video file, when the playing time reaches the capture start time point, the terminal acquires decoded audio data corresponding to the currently playing audio file and acquires decoded video data corresponding to the video capture area in the currently playing video file; before the capture end time point is reached, it continues to acquire decoded audio data corresponding to the currently playing audio file and decoded video data corresponding to the video capture area in the currently playing video file, so that a plurality of decoded audio data and a plurality of decoded video data are acquired according to the audio/video capture instruction. After the capture end time point is reached, file format coding is respectively carried out on the obtained decoded audio data and decoded video data according to the audio/video capture instruction, so that an audio segment and a video segment are generated; the audio and video clip is obtained by synthesizing the audio segment and the video segment, and finally the audio and video clip is output according to the target use. In the embodiment of the invention, the audio segment to be captured is obtained by acquiring the decoded audio data corresponding to the audio file being played and then performing file format coding on the decoded audio data, rather than by capturing and combining a number of audio recordings; similarly, the video segment to be captured is obtained by acquiring the decoded video data corresponding to the video file being played and then performing file format coding on the decoded video data, rather than by capturing and combining a number of video images. In the embodiment of the invention, even if an audio and video clip with a large time span needs to be captured, the user only needs to set the capture start time point and the capture end time point, and the capture processing efficiency of the audio and video clip is also very high.
In order to better understand and implement the above-mentioned schemes of the embodiments of the present invention, the following description specifically illustrates corresponding application scenarios.
Taking as an example a user who uses the QQ browser to play audio with synchronized video: when the user encounters favorite audio and video, the user may choose to intercept all or part of the audio in the whole audio file, and at the same time needs to intercept the video clip corresponding to the audio clip, and then store the result locally or share it with friends. Please refer to fig. 3, which is a schematic diagram of an interception process of audio/video clips according to the present invention.
S1 calculation of offset position of video capture area
The playing device of the terminal plays the audio file and the video file, for example a song played by the sound system while the screen synchronously displays the video corresponding to the song. When the video file is played on the display screen of the terminal, it may be played full screen, as shown in area A in fig. 2, or in a non-full-screen window, as shown in area B in fig. 2; area B may be any region within area A. Regardless of the area, the user can draw a square area inside the video playing area to serve as the video capture area to be captured, and the offset positions of the four corners of the drawn area relative to the video playing area first need to be calculated.
S2 coordinate mapping of original video image
Since the sizes of area B and area C are not necessarily the same, that is, the size of the video playing area is not necessarily equal to the size of the original video image, after the offset position calculation is completed, the coordinate mapping relationship of the offset position in the original video image needs to be calculated.
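A minimal sketch of S1 and S2, under the assumption that both the drawn area and the playing area are axis-aligned rectangles in screen coordinates; the structure and function names are illustrative only.

```c
/* Illustrative sketch of S1/S2: the corners of the user-drawn capture area
 * are first expressed as offsets inside the on-screen playing area (B), and
 * those offsets are then mapped into the coordinate system of the original
 * decoded video image, whose size generally differs from the playing area. */
typedef struct { int x, y, w, h; } Rect;

static Rect map_capture_area(Rect capture_on_screen,  /* user-drawn area      */
                             Rect play_area,          /* area B on the screen */
                             int video_w, int video_h /* original image size  */)
{
    /* S1: offsets of the drawn area relative to the playing area. */
    int off_x = capture_on_screen.x - play_area.x;
    int off_y = capture_on_screen.y - play_area.y;

    /* S2: scale factors between the playing area and the original image. */
    double sx = (double)video_w / play_area.w;
    double sy = (double)video_h / play_area.h;

    Rect r = {
        .x = (int)(off_x * sx),
        .y = (int)(off_y * sy),
        .w = (int)(capture_on_screen.w * sx),
        .h = (int)(capture_on_screen.h * sy),
    };
    return r;
}
```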
After S1 and S2 are completed, the menu selections P1, P2 and P3 are presented to the user; that is, a menu needs to be provided on the display screen of the terminal for the user to select from, which specifically includes the following items:
P1, use selection: determining whether the captured audio and video clip is only archived, or is shared after archiving.
P2, configuration selection: sampling frequency, audio format, number of channels, audio file format, sound channel arrangement, sampling point format, audio and video interception duration (namely interception starting time point and interception ending time point), resolution, video format, video quality, video file format and video frame rate.
P3, mode selection: and determining whether a single audio-video clip or a plurality of audio-video clips need to be intercepted.
S3, processing of decoded audio data and decoded video data
When the user clicks on the selection menu, the processing is started from the current time point by default. The audio playing process is a process of decoding an audio file into original data and playing the original data, and the original data is usually in a PCM format. The audio segment is synthesized from the original data, so that the link of re-decoding the source file can be saved, the processor resource of the terminal can be saved, and the electric quantity of the terminal can be saved.
When the user performs the area demarcating operation of S1, the processing is started from the current time point by default. The process of video playing is a process of decoding a video file into original data and then displaying the original data, and the original data is usually in YUV420 format. The video segment is synthesized from the original data, so that the link of re-decoding the source file can be saved, the processor resource of the terminal can be saved, and the electric quantity of the terminal can be saved.
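As a point of reference only, the sizes of the raw buffers mentioned above can be computed as follows, assuming 16-bit interleaved PCM for the decoded audio and 8-bit planar YUV420 for the decoded video; the helper names are illustrative.

```c
/* Illustrative sketch: sizes of the raw buffers handled in this flow. */
#include <stddef.h>
#include <stdint.h>

/* One block of decoded audio covering nb_samples sample frames. */
static size_t pcm_block_bytes(int nb_samples, int channels)
{
    return (size_t)nb_samples * channels * sizeof(int16_t);
}

/* One decoded video frame: a full-resolution Y plane plus quarter-resolution
 * U and V planes, i.e. width * height * 3 / 2 bytes. */
static size_t yuv420_frame_bytes(int width, int height)
{
    return (size_t)width * height * 3 / 2;
}
```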
As shown in fig. 4, a schematic view of a processing flow of decoded audio data and decoded video data provided in an embodiment of the present invention is provided, where the process may specifically include the following steps:
Step m1, acquiring from the audio and video interception instruction the target sampling frequency, the target audio format, the target channel number, the target audio file format, the target channel arrangement, the interception duration (i.e. the interception start time point and the interception end time point), the target sampling point format, the target resolution, the target video format, the target video quality, the target video file format and the target video frame rate selected by the user (one possible data layout for these parameters is sketched below). According to the specific configuration, the processing then follows one of two different flows, Q1 and Q2, which are described separately below.
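Purely as an illustration of the parameters gathered in step m1, they could be collected in a structure such as the following; all field names are assumptions of this sketch, and the patent does not prescribe any concrete data layout.

```c
/* Illustrative sketch: parameters carried by the audio/video capture
 * instruction, as listed in step m1. */
typedef struct {
    double start_time_s;              /* capture start time point          */
    double end_time_s;                /* capture end time point            */
    struct { int x, y, w, h; } capture_area;  /* user-drawn video area     */
    int    target_use;                /* archive only, or archive + share  */

    int    target_sample_rate;        /* target sampling frequency         */
    int    target_channels;           /* target number of channels         */
    int    target_channel_layout;     /* target channel arrangement        */
    int    target_sample_fmt;         /* target sampling point format      */
    int    target_audio_codec;        /* target audio format               */
    int    target_audio_container;    /* target audio file format          */

    int    target_width, target_height;  /* target resolution              */
    double target_frame_rate;         /* target video frame rate           */
    int    target_video_quality;      /* target video quality              */
    int    target_video_codec;        /* target video format               */
    int    target_video_container;    /* target video file format          */
} CaptureInstruction;
```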
Q1, for audio processing, when all of the following conditions are satisfied: the target sampling frequency is the same as the original sampling frequency, the target channel arrangement is the same as the original channel arrangement, the target audio format is the same as the original audio format (namely, the target encoder and the original decoder use the same compressed audio protocol), the target channel number is the same as the original channel number, and the target sampling point format is the same as the original sampling point format. When these conditions are met, the Q1 processing flow can be selected: file format coding is respectively carried out on the obtained decoded audio data and the decoded video data according to the audio and video interception instruction, and an audio segment intercepted from the audio file is generated; this flow is equivalent to a copy mode. The Q1 flow does not require decompressing the audio file, but merely repackages the decoded audio data into a new file format.
Also in Q1, for video, when all of the following conditions are satisfied: the target resolution is the same as the original resolution, the target video frame rate is the same as the original frame rate, the target video format is the same as the original video format (namely, the target encoder and the original decoder use the same compressed video protocol), and the target video quality is the same as the original video quality. When these conditions are met, the Q1 processing flow can be selected: file format coding is carried out on the obtained decoded video data according to the video interception instruction, and a video segment intercepted from the video file is generated; this flow is equivalent to a copy mode. The Q1 flow does not require decompressing the video file, but merely repackages the decoded video data into a new file format.
Specifically, the flow under the Q1 process is as follows:
Step m3, opening the file synthesizer according to the target audio file format and generating the audio file header information, which contains some basic features of the audio segment, such as the attributes of the audio segment and the audio coding format used. Likewise, a file synthesizer is opened according to the target video file format and the video file header information is generated, which contains some basic features of the video segment, such as the attributes of the video segment and the video coding format used.
Step m7, calling the file synthesizer and carrying out file format coding on the decoded audio data according to the rule to obtain the audio segment, where the rule means that if the target audio file format selected by the user is an mp3 file, the finally coded audio segment is generated according to the audio organization of an mp3 file. Likewise, the file synthesizer is called and file format coding is carried out on the decoded video data according to the rule to obtain the video segment, where the rule means that if the target video file format selected by the user is an mp4 file, the finally coded video segment is generated according to the video organization of an mp4 file.
Q2, for audio, when any one of the conditions of Q1 is not satisfied, that is, at least one of the following holds: the target sampling frequency is different from the original sampling frequency, the target channel arrangement is different from the original channel arrangement, the target audio format is different from the original audio format (namely, the target encoder and the original decoder use different compressed audio protocols), the target channel number is different from the original channel number, or the target sampling point format is different from the original sampling point format, the Q2 processing flow is executed.
For video, when any one of the conditions of Q1 is not satisfied, that is, at least one of the following holds: the target resolution is different from the original resolution, the target video frame rate is different from the original frame rate, the target video format is different from the original video format (namely, the target encoder and the original decoder use different compressed video protocols), or the target video quality is different from the original video quality, the Q2 processing flow is executed.
Specifically, the flow under the Q2 process is as follows:
Step m2, opening the encoder according to the audio format that needs to be encoded; likewise, opening the encoder according to the video format that needs to be encoded.
And step m3, opening a file synthesizer according to the audio file format and generating audio file header information. And opening a file synthesizer according to the video file format and generating video file header information.
Step m4, obtaining the decoded audio data from the decoding stage of the current playing process, and obtaining the decoded video data from the decoding stage of the current playing process.
Step m5, determining, according to the information obtained in step m1, whether sampling frequency conversion, channel number conversion, channel arrangement conversion and sampling point format conversion need to be performed; if any of the audio parameters differs (the Q2 case), the corresponding conversion is performed according to the user's requirements. Likewise, it is determined according to the information obtained in step m1 whether scaling processing needs to be performed: for example, the video capture area defined by the user is compared with the current player area to obtain a proportional relation, and a size is calculated by applying this proportional relation to the original resolution; if this size is different from the target resolution, scaling processing is required so that the resolution of the output video segment meets the requirement, otherwise no scaling is required.
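A sketch of the scaling decision in step m5, assuming FFmpeg's libswscale as the third-party library and YUV420 frames; the helper names are illustrative.

```c
/* Illustrative sketch: when the size implied by the capture area differs
 * from the target resolution, the decoded YUV420 frames are rescaled with
 * FFmpeg's libswscale; otherwise no scaler is created. */
#include <libswscale/swscale.h>
#include <libavutil/pixfmt.h>

static struct SwsContext *open_scaler_if_needed(int src_w, int src_h,
                                                int dst_w, int dst_h)
{
    if (src_w == dst_w && src_h == dst_h)
        return NULL;                       /* sizes match: no scaling needed */

    return sws_getContext(src_w, src_h, AV_PIX_FMT_YUV420P,
                          dst_w, dst_h, AV_PIX_FMT_YUV420P,
                          SWS_BILINEAR, NULL, NULL, NULL);
}

/* data / linesize describe the three YUV planes of one frame. */
static void scale_frame(struct SwsContext *sws, int src_h,
                        uint8_t *const src_data[], const int src_linesize[],
                        uint8_t *const dst_data[], const int dst_linesize[])
{
    sws_scale(sws, (const uint8_t *const *)src_data, src_linesize,
              0, src_h, dst_data, dst_linesize);
}
```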
Step m6, calling the encoder to carry out audio format encoding on the decoded audio data according to the target audio format; and calling the encoder to carry out video format encoding on the decoded video data according to the target video format.
Step m7, calling the file synthesizer and coding the decoded audio data according to the target audio file format to generate the audio segment; calling the file synthesizer and coding the decoded video data according to the target video file format to generate the video segment; and synthesizing the audio segment and the video segment to obtain the audio and video clip.
It should be noted that, in the present invention, the flow of processing the decoded audio data is synchronized with the playing process of the audio file, and if a plurality of audio segments are to be synthesized, the above Q1 or Q2 flow is repeated. Likewise, the flow of processing the decoded video data is synchronized with the playing process of the video file, and if a plurality of video segments are to be synthesized, the Q1 or Q2 flow is repeated.
S4, outputting of audio and video clips
After the audio and video clip is synthesized, the user is prompted that the operation succeeded. According to the selection made in P1, if the clip is to be archived, a third-party application is called to open the audio and video folder; if it is to be shared, a third-party application is called to perform the sharing, for example through, but not limited to, WeChat or QQ.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
To facilitate a better implementation of the above-described aspects of embodiments of the present invention, the following also provides relevant means for implementing the above-described aspects.
Referring to fig. 5-a, an apparatus 500 for intercepting an audio/video clip according to an embodiment of the present invention may include: a receiving module 501, a decoded data obtaining module 502, a file encoding module 503, and an audio/video segment output module 504, wherein,
the receiving module 501 is configured to receive an audio/video interception instruction sent by a user through a current playing terminal, where the audio/video interception instruction includes: the user determines the interception start time point and the interception end time point of the audio and video to be intercepted, the video interception area defined by the user in the playing interface of the current playing terminal and the target use selected by the user;
a decoded data obtaining module 502, configured to obtain, according to the audio/video capture instruction, decoded audio data corresponding to the currently playing audio file from the time when the playing time is the capture start time point, and obtain decoded video data corresponding to the video capture area in the currently playing video file at the same time, until the playing time is the capture end time point, stop obtaining decoded audio data corresponding to the currently playing audio file, and stop obtaining decoded video data corresponding to the video capture area in the currently playing video file at the same time;
a file encoding module 503, configured to perform file format encoding on the obtained decoded audio data and the decoded video data respectively according to the audio/video interception instruction from the interception end time point, generate an audio segment and a video segment, and synthesize the audio segment and the video segment to obtain an audio/video segment;
an audio/video clip output module 504, configured to output the audio/video clip according to the target use.
In some embodiments of the present invention, the decoded data obtaining module 502 is specifically configured to read the decoded audio data from a memory of a current playing terminal according to the audio/video capture instruction;
the decoded data obtaining module 502 is configured to calculate an offset position between the video capture area and a playing interface of the current playing terminal; determining the coordinate mapping relation between the video capture area and the video image in the video file which is currently played according to the calculated offset position; and reading decoded video data corresponding to the video intercepting area from the frame buffer of the current playing terminal according to the coordinate mapping relation.
In some embodiments of the present invention, if the audio/video capture instruction further includes a target audio file format selected by a user and a target video file format selected by the user, the file encoding module 503 is specifically configured to encode the obtained decoded audio data into an audio segment meeting the target audio file format by using a file synthesizer, and the audio segment carries audio file header information, where the audio file header information includes: attribute information of the audio clip; using a file synthesizer to encode the obtained decoded video data into a video segment meeting the target video file format, and carrying video file header information in the video segment, wherein the video file header information comprises: attribute information of the video clip.
In some embodiments of the present invention, please refer to fig. 5-b, if the audio/video capture instruction further includes a target sampling frequency selected by a user and a target resolution selected by the user, the device 500 for capturing audio/video clips further includes: a sampling frequency coordination module 505 and a resolution coordination module 506, wherein,
the sampling frequency coordination module 505 is configured to, from the interception end time point, before the file encoding module performs file format encoding on the obtained decoded audio data and the obtained decoded video data according to the audio/video interception instruction, determine whether an original sampling frequency of an audio file corresponding to the obtained decoded audio data is the same as the target sampling frequency; if the original sampling frequency is different from the target sampling frequency, converting the sampling frequency of an audio file corresponding to the obtained decoded audio data to obtain the obtained decoded audio data containing the target sampling frequency;
the resolution coordination module 506 is configured to, from the capture end time point, before the file encoding module performs file format encoding on the obtained decoded video data according to the video capture instruction, determine whether the original resolution of the video image in the video file corresponding to the obtained decoded video data is the same as the target resolution; and if the original resolution is different from the target resolution, convert the resolution of the video image in the video file corresponding to the acquired decoded video data to obtain the acquired decoded video data containing the target resolution.
In some embodiments of the present invention, please refer to fig. 5-c, if the audio/video capture instruction further includes a target audio format selected by the user and a target video format selected by the user, the device 500 for capturing audio/video clips further includes: a format coordination module 507, configured to, starting from the interception end time point, before the file coding module 503 performs file format coding on the obtained decoded audio data and the decoded video data according to the audio/video interception instruction, determine whether an original audio format of an audio file corresponding to the obtained decoded audio data is the same as the target audio format, and determine whether an original video format of a video file corresponding to the obtained decoded video data is the same as the target video format; if the original audio format is different from the target audio format, converting the audio format of an audio file corresponding to the obtained decoded audio data to obtain the obtained decoded audio data containing the target audio format; and if the original video format is different from the target video format, converting the video format of the video file corresponding to the obtained decoded video data to obtain the obtained decoded video data containing the target video format.
In some embodiments of the present invention, please refer to fig. 5-d, where the audio/video capture instruction further includes a target channel number selected by a user, the device 500 for capturing audio/video clips further includes: a channel number coordination module 508, configured to, from the interception end time point, before the file encoding module 503 respectively performs file format encoding on the obtained decoded audio data and the obtained decoded video data according to the audio/video interception instruction, judge whether the original channel number of the audio file corresponding to the obtained decoded audio data is the same as the target channel number; and if the original channel number is different from the target channel number, adjust the channel number of the audio file corresponding to the obtained decoded audio data to obtain the obtained decoded audio data containing the target channel number.
In some embodiments of the present invention, please refer to fig. 5-e, where the audio/video capture instruction further includes a target channel arrangement selected by the user and a target video quality selected by the user, the device 500 for capturing audio/video clips further includes: a channel arrangement coordination module 509 and a video quality coordination module 510, wherein,
the sound channel arrangement coordination module 509 is configured to, from the interception end time point, before the file encoding module performs file format encoding on the obtained decoded audio data and the obtained decoded video data according to the audio/video interception instruction, determine whether an original sound channel arrangement of an audio file corresponding to the obtained decoded audio data is the same as the target sound channel arrangement; if the original sound channel arrangement is different from the target sound channel arrangement, converting the sound channel arrangement of an audio file corresponding to the obtained decoded audio data to obtain the obtained decoded audio data containing the target sound channel arrangement;
the video quality coordination module 510 is configured to, starting from the capture ending time point, judge whether the original video quality of the video file corresponding to the obtained decoded video data is the same as the target video quality before the file coding module performs file format coding on the obtained decoded video data according to the video capture instruction; if the original video quality is different from the target video quality, adjusting the video quality of a video file corresponding to the obtained decoded video data to obtain the obtained decoded video data containing the target video quality.
In some embodiments of the present invention, please refer to fig. 5-f, where the audio/video capture instruction further includes a target sampling point format selected by a user and a target video frame rate selected by the user, the device 500 for capturing an audio/video clip further includes: a sample format coordination module 511 and a video frame rate coordination module 512, wherein,
the sampling point format coordination module 511 is configured to, before the file encoding module respectively performs file format encoding on the obtained decoded audio data and the obtained decoded video data according to the audio/video interception instruction from the interception end time point, determine whether an original sampling point format of an audio file corresponding to the obtained decoded audio data is the same as the target sampling point format; if the original sample point format is different from the target sample point format, converting the sample point format of the audio file corresponding to the obtained decoded audio data to obtain the obtained decoded audio data containing the target sample point format;
the video frame rate coordination module 512 is configured to, starting from the capture ending time point, determine, by the file encoding module, whether an original video frame rate of a video file corresponding to the obtained decoded video data is the same as the target video frame rate before performing file format encoding on the obtained decoded video data according to the video capture instruction; if the original video frame rate is different from the target video frame rate, converting the video frame rate of the video file corresponding to the obtained decoded video data to obtain the obtained decoded video data containing the target video frame rate.
As can be seen from the above description of the embodiment of the present invention, when a user sends an audio/video capture command through the current playing terminal, the terminal first receives the audio/video capture instruction, where the audio/video capture instruction includes: the capture start time point, the capture end time point, the video capture area defined by the user and the target use selected by the user. After the playing interface in the terminal starts to play the audio file and the video file, when the playing time reaches the capture start time point, the terminal acquires decoded audio data corresponding to the currently playing audio file and acquires decoded video data corresponding to the video capture area in the currently playing video file; before the capture end time point is reached, it continues to acquire decoded audio data corresponding to the currently playing audio file and decoded video data corresponding to the video capture area in the currently playing video file, so that a plurality of decoded audio data and a plurality of decoded video data are acquired according to the audio/video capture instruction. After the capture end time point is reached, file format coding is respectively carried out on the obtained decoded audio data and decoded video data according to the audio/video capture instruction, so that an audio segment and a video segment are generated; the audio and video clip is obtained by synthesizing the audio segment and the video segment, and finally the audio and video clip is output according to the target use. In the embodiment of the invention, the audio segment to be captured is obtained by acquiring the decoded audio data corresponding to the audio file being played and then performing file format coding on the decoded audio data, rather than by capturing and combining a number of audio recordings; similarly, the video segment to be captured is obtained by acquiring the decoded video data corresponding to the video file being played and then performing file format coding on the decoded video data, rather than by capturing and combining a number of video images. In the embodiment of the invention, even if an audio and video clip with a large time span needs to be captured, the user only needs to set the capture start time point and the capture end time point, and the capture processing efficiency of the audio and video clip is also very high.
As shown in fig. 6, for convenience of description, only the parts related to the embodiment of the present invention are shown; for specific technical details that are not disclosed, please refer to the method part of the embodiment of the present invention. The terminal may be any terminal device, including a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, a vehicle-mounted computer, and the like. Taking the terminal being a mobile phone as an example:
fig. 6 is a block diagram illustrating a partial structure of a mobile phone related to a terminal provided in an embodiment of the present invention. Referring to fig. 6, the handset includes: radio Frequency (RF) circuit 610, memory 620, input unit 630, display unit 640, sensor 650, audio circuit 660, wireless fidelity (WiFi) module 670, processor 680, and power supply 690. Those skilled in the art will appreciate that the handset configuration shown in fig. 6 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile phone in detail with reference to fig. 6:
the RF circuit 610 may be used for receiving and transmitting signals during information transmission and reception or during a call, and in particular, receives downlink information of a base station and then processes the received downlink information to the processor 680; in addition, the data for designing uplink is transmitted to the base station. In general, the RF circuit 610 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuitry 610 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to global system for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), and the like.
The memory 620 may be used to store software programs and modules, and the processor 680 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 620. The memory 620 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the mobile phone, and the like. Further, the memory 620 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The input unit 630 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the cellular phone. Specifically, the input unit 630 may include a touch panel 631 and other input devices 632. The touch panel 631, also referred to as a touch screen, may collect touch operations of a user (e.g., operations of the user on the touch panel 631 or near the touch panel 631 by using any suitable object or accessory such as a finger or a stylus) thereon or nearby, and drive the corresponding connection device according to a preset program. Alternatively, the touch panel 631 may include two parts of a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 680, and can receive and execute commands sent by the processor 680. In addition, the touch panel 631 may be implemented using various types, such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 630 may include other input devices 632 in addition to the touch panel 631. In particular, other input devices 632 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 640 may be used to display information input by the user or information provided to the user and various menus of the mobile phone. The display unit 640 may include a display panel 641, and optionally, the display panel 641 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch panel 631 can cover the display panel 641, and when the touch panel 631 detects a touch operation thereon or nearby, the touch panel is transmitted to the processor 680 to determine the type of the touch event, and then the processor 680 provides a corresponding visual output on the display panel 641 according to the type of the touch event. Although in fig. 6, the touch panel 631 and the display panel 641 are two independent components to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 631 and the display panel 641 may be integrated to implement the input and output functions of the mobile phone.
The handset may also include at least one sensor 650, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 641 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 641 and/or the backlight when the mobile phone is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.
The audio circuit 660, the speaker 661 and the microphone 662 can provide an audio interface between the user and the mobile phone. The audio circuit 660 may transmit the electrical signal converted from the received audio data to the speaker 661, where it is converted into a sound signal for output; on the other hand, the microphone 662 converts the collected sound signal into an electrical signal, which is received by the audio circuit 660 and converted into audio data; the audio data is output to the processor 680 for processing and then transmitted via the RF circuit 610 to, for example, another mobile phone, or output to the memory 620 for further processing.
WiFi belongs to short-distance wireless transmission technology, and the mobile phone can help a user to receive and send e-mails, browse webpages, access streaming media and the like through the WiFi module 670, and provides wireless broadband Internet access for the user. Although fig. 6 shows the WiFi module 670, it is understood that it does not belong to the essential constitution of the handset, and can be omitted entirely as needed within the scope not changing the essence of the invention.
The processor 680 is a control center of the mobile phone, and connects various parts of the entire mobile phone by using various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 620 and calling data stored in the memory 620, thereby performing overall monitoring of the mobile phone. Optionally, processor 680 may include one or more processing units; preferably, the processor 680 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 680.
The handset also includes a power supply 690 (e.g., a battery) for powering the various components, which may preferably be logically connected to the processor 680 via a power management system, such that the power management system may be used to manage charging, discharging, and power consumption.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which are not described herein.
In the embodiment of the present invention, the processor 680 included in the terminal further has the function of controlling the execution of the above method flow for intercepting audio and video clips performed by the terminal.
It should be noted that the above-described embodiments of the apparatus are merely schematic, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus necessary general hardware, and may also be implemented by special hardware including special integrated circuits, special CPUs, special memories, special components and the like. Generally, functions performed by computer programs can be easily implemented by corresponding hardware, and specific hardware structures for implementing the same functions may be various, such as analog circuits, digital circuits, or dedicated circuits. However, the implementation of a software program is a more preferable embodiment for the present invention. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a readable storage medium, such as a floppy disk, a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk of a computer, and includes instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
In summary, the above embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the above embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the above embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (15)

1. A method for intercepting audio and video clips is characterized by comprising the following steps:
receiving an audio and video interception instruction sent by a user through a current playing terminal, wherein the audio and video interception instruction comprises: the user determines the interception start time point and the interception end time point of the audio and video to be intercepted, the video interception area defined by the user in the playing interface of the current playing terminal and the target use selected by the user;
monitoring a currently played audio file and a currently played video file in a playing screen of the terminal, acquiring decoded audio data corresponding to the currently played audio file according to the audio and video intercepting instruction from the beginning of the playing time as the intercepting starting time point based on the target use selected by the user, and acquiring decoded video data corresponding to the video intercepting area in the currently played video file at the same time until the playing time is the intercepting ending time point, stopping acquiring the decoded audio data corresponding to the currently played audio file, and stopping acquiring the decoded video data corresponding to the video intercepting area in the currently played video file at the same time;
respectively carrying out file format coding on the obtained decoded audio data and the decoded video data according to the audio and video interception instruction from the interception ending time point to generate an audio fragment and a video fragment, and synthesizing the audio fragment and the video fragment to obtain an audio and video fragment;
outputting the audio and video clips according to the target purpose;
wherein, the acquiring of the decoded audio data corresponding to the currently played audio file according to the audio/video interception instruction includes:
reading the decoded audio data from the memory of the current playing terminal according to the audio and video interception instruction;
the acquiring the decoded video data corresponding to the video capture area in the video file currently being played according to the video capture instruction includes:
calculating the offset position between the video intercepting area and the playing interface of the current playing terminal;
determining the coordinate mapping relation between the video capture area and the video image in the video file which is currently played according to the calculated offset position;
and reading decoded video data corresponding to the video intercepting area from the frame buffer of the current playing terminal according to the coordinate mapping relation.
2. The method according to claim 1, wherein if the audio/video capture instruction further includes a target audio file format selected by a user and a target video file format selected by the user, respectively performing file format encoding on the obtained decoded audio data and the decoded video data according to the audio/video capture instruction from the capture end time point, the method includes:
using a file synthesizer to encode the obtained decoded audio data into an audio segment meeting the target audio file format, and carrying audio file header information in the audio segment, where the audio file header information includes: attribute information of the audio clip;
using a file synthesizer to encode the obtained decoded video data into a video segment meeting the target video file format, and carrying video file header information in the video segment, wherein the video file header information comprises: attribute information of the video clip.
3. The method according to claim 1 or 2, wherein if the audio/video interception instruction further comprises a target sampling frequency selected by the user and a target resolution selected by the user, before performing file format encoding on the acquired decoded audio data and the acquired decoded video data respectively according to the audio/video interception instruction starting from the interception end time point, the method further comprises:
judging whether an original sampling frequency of the audio file corresponding to the acquired decoded audio data is the same as the target sampling frequency, and judging whether an original resolution of a video image in the video file corresponding to the acquired decoded video data is the same as the target resolution;
if the original sampling frequency is different from the target sampling frequency, converting the sampling frequency of the audio file corresponding to the acquired decoded audio data to obtain decoded audio data having the target sampling frequency; and if the original resolution is different from the target resolution, converting the resolution of the video image in the video file corresponding to the acquired decoded video data to obtain decoded video data having the target resolution.
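
A toy sketch of the sampling frequency conversion in claim 3, assuming mono 16-bit samples and using linear interpolation; production systems would normally use a polyphase or sinc resampler. The helper name resample_linear is hypothetical.

    from array import array

    def resample_linear(samples: array, src_rate: int, dst_rate: int) -> array:
        """Convert mono 16-bit samples from src_rate to dst_rate.

        Returns the input unchanged when the rates already match, mirroring
        the judge-then-convert structure of the claim."""
        if src_rate == dst_rate:
            return samples
        out_len = int(len(samples) * dst_rate / src_rate)
        out = array("h", [0]) * out_len
        for i in range(out_len):
            pos = i * src_rate / dst_rate       # position in the source signal
            j = int(pos)
            frac = pos - j
            a = samples[j]
            b = samples[min(j + 1, len(samples) - 1)]
            out[i] = int(a + (b - a) * frac)    # linear interpolation
        return out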
4. The method according to claim 1 or 2, wherein if the audio/video interception instruction further comprises a target audio format selected by the user and a target video format selected by the user, before performing file format encoding on the acquired decoded audio data and the acquired decoded video data respectively according to the audio/video interception instruction starting from the interception end time point, the method further comprises:
judging whether an original audio format of the audio file corresponding to the acquired decoded audio data is the same as the target audio format, and judging whether an original video format of the video file corresponding to the acquired decoded video data is the same as the target video format;
if the original audio format is different from the target audio format, converting the audio format of the audio file corresponding to the acquired decoded audio data to obtain decoded audio data having the target audio format;
and if the original video format is different from the target video format, converting the video format of the video file corresponding to the acquired decoded video data to obtain decoded video data having the target video format.
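
One hedged way to realise the format conversion branch of claim 4 is to hand the clip to an external transcoder only when the original and target formats differ. The sketch assumes an ffmpeg executable is available on the system path; the function name convert_if_needed is hypothetical and error handling is omitted.

    import subprocess

    def convert_if_needed(src_path: str, dst_path: str,
                          original_fmt: str, target_fmt: str) -> str:
        """Transcode src_path to dst_path only when the formats differ,
        mirroring the judge-then-convert structure of the claim."""
        if original_fmt == target_fmt:
            return src_path                      # nothing to do
        subprocess.run(["ffmpeg", "-y", "-i", src_path, dst_path],
                       check=True)               # output format chosen by extension
        return dst_path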
5. The method according to claim 1 or 2, wherein if the audio/video interception instruction further comprises a target channel number selected by the user, before performing file format encoding on the acquired decoded audio data and the acquired decoded video data respectively according to the audio/video interception instruction starting from the interception end time point, the method further comprises:
judging whether an original channel number of the audio file corresponding to the acquired decoded audio data is the same as the target channel number;
if the original channel number is different from the target channel number, adjusting the channel number of the audio file corresponding to the acquired decoded audio data to obtain decoded audio data having the target channel number.
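
A minimal sketch of the channel number adjustment in claim 5 for interleaved 16-bit PCM: stereo is averaged down to mono and mono is duplicated up to stereo; other channel counts would need a proper mixing matrix. The helper name adjust_channels is hypothetical.

    from array import array

    def adjust_channels(samples: array, src_ch: int, dst_ch: int) -> array:
        """Adjust interleaved 16-bit PCM from src_ch to dst_ch channels."""
        if src_ch == dst_ch:
            return samples
        if src_ch == 2 and dst_ch == 1:          # downmix: average L and R
            return array("h", [(samples[i] + samples[i + 1]) // 2
                               for i in range(0, len(samples) - 1, 2)])
        if src_ch == 1 and dst_ch == 2:          # upmix: copy mono to both channels
            out = array("h")
            for s in samples:
                out.append(s)
                out.append(s)
            return out
        raise ValueError("unsupported channel conversion")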
6. The method according to claim 1 or 2, wherein if the audio/video interception instruction further comprises a target channel arrangement selected by the user and a target video quality selected by the user, before performing file format encoding on the acquired decoded audio data and the acquired decoded video data respectively according to the audio/video interception instruction starting from the interception end time point, the method further comprises:
judging whether an original channel arrangement of the audio file corresponding to the acquired decoded audio data is the same as the target channel arrangement, and judging whether an original video quality of the video file corresponding to the acquired decoded video data is the same as the target video quality;
if the original channel arrangement is different from the target channel arrangement, converting the channel arrangement of the audio file corresponding to the acquired decoded audio data to obtain decoded audio data having the target channel arrangement; and if the original video quality is different from the target video quality, adjusting the video quality of the video file corresponding to the acquired decoded video data to obtain decoded video data having the target video quality.
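
As an illustration of the channel arrangement conversion in claim 6, the sketch below reorders interleaved samples from a source layout to a target layout expressed as tuples of channel names (for example swapping left and right); the video quality branch would normally be handled by encoder bitrate or quantiser settings rather than in sample code. All names here are hypothetical.

    from array import array

    def rearrange_channels(samples: array, src_layout: tuple, dst_layout: tuple) -> array:
        """Reorder interleaved 16-bit PCM from src_layout to dst_layout.

        Example: src_layout=("L", "R"), dst_layout=("R", "L") swaps channels."""
        if src_layout == dst_layout:
            return samples
        n = len(src_layout)
        # For each target channel, find its position in the source layout.
        index = [src_layout.index(name) for name in dst_layout]
        out = array("h")
        for frame_start in range(0, len(samples) - n + 1, n):
            for src_idx in index:
                out.append(samples[frame_start + src_idx])
        return out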
7. The method according to claim 1 or 2, wherein if the audio/video interception instruction further comprises a target sampling point format selected by the user and a target video frame rate selected by the user, before performing file format encoding on the acquired decoded audio data and the acquired decoded video data respectively according to the audio/video interception instruction starting from the interception end time point, the method further comprises:
judging whether an original sampling point format of the audio file corresponding to the acquired decoded audio data is the same as the target sampling point format, and judging whether an original video frame rate of the video file corresponding to the acquired decoded video data is the same as the target video frame rate;
if the original sampling point format is different from the target sampling point format, converting the sampling point format of the audio file corresponding to the acquired decoded audio data to obtain decoded audio data having the target sampling point format; and if the original video frame rate is different from the target video frame rate, converting the video frame rate of the video file corresponding to the acquired decoded video data to obtain decoded video data having the target video frame rate.
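
A small sketch of the two conversions in claim 7: sampling point format conversion from 16-bit integers to 32-bit floats in the range -1.0 to 1.0, and frame rate conversion by picking the nearest source frame for each output timestamp (which drops or repeats frames as needed). Both helper names are hypothetical.

    from array import array

    def s16_to_float32(samples: array) -> array:
        """Convert 16-bit integer PCM to 32-bit float PCM in [-1.0, 1.0]."""
        return array("f", [s / 32768.0 for s in samples])

    def retime_frames(frames: list, src_fps: float, dst_fps: float) -> list:
        """Convert frame rate by selecting the nearest source frame for
        each output timestamp (drops or repeats frames as required)."""
        if src_fps == dst_fps or not frames:
            return frames
        duration = len(frames) / src_fps
        out_count = int(round(duration * dst_fps))
        return [frames[min(int(i * src_fps / dst_fps), len(frames) - 1)]
                for i in range(out_count)]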
8. An apparatus for intercepting audio/video clips, comprising:
a receiving module, configured to receive an audio/video interception instruction sent by a user through a current playing terminal, wherein the audio/video interception instruction comprises: an interception start time point and an interception end time point determined by the user for the audio and video to be intercepted, a video interception area defined by the user in the playing interface of the current playing terminal, and a target use selected by the user;
a decoded data acquisition module, configured to monitor the currently played audio file and the currently played video file in a playing screen of the terminal; based on the target use selected by the user, from the moment the playing time reaches the interception start time point, acquire decoded audio data corresponding to the currently played audio file according to the audio/video interception instruction while simultaneously acquiring decoded video data corresponding to the video interception area in the currently played video file; and when the playing time reaches the interception end time point, stop acquiring the decoded audio data corresponding to the currently played audio file and simultaneously stop acquiring the decoded video data corresponding to the video interception area in the currently played video file;
a file encoding module, configured to, starting from the interception end time point, perform file format encoding on the acquired decoded audio data and the acquired decoded video data respectively according to the audio/video interception instruction to generate an audio clip and a video clip, and synthesize the audio clip and the video clip to obtain an audio/video clip;
and an audio/video clip output module, configured to output the audio/video clip according to the target use;
wherein the decoded data acquisition module is specifically configured to read the decoded audio data from a memory of the current playing terminal according to the audio/video interception instruction;
and the decoded data acquisition module is further configured to calculate an offset position between the video interception area and the playing interface of the current playing terminal, determine a coordinate mapping relation between the video interception area and a video image in the currently played video file according to the calculated offset position, and read the decoded video data corresponding to the video interception area from a frame buffer of the current playing terminal according to the coordinate mapping relation.
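
To illustrate the frame buffer read performed by the decoded data acquisition module in claim 8, the sketch below copies the mapped interception rectangle out of a decoded frame stored as row-major RGBA bytes. Here region is assumed to be the Rect structure from the earlier mapping sketch, and read_capture_region is a hypothetical name.

    def read_capture_region(frame: bytes, frame_w: int, region) -> bytes:
        """Copy the pixels inside region (a Rect in frame coordinates)
        out of a decoded frame stored as row-major RGBA bytes."""
        bpp = 4                                   # bytes per RGBA pixel
        rows = []
        for y in range(region.y, region.y + region.h):
            start = (y * frame_w + region.x) * bpp
            rows.append(frame[start:start + region.w * bpp])
        return b"".join(rows)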
9. The apparatus according to claim 8, wherein if the audio/video interception instruction further comprises a target audio file format selected by the user and a target video file format selected by the user, the file encoding module is specifically configured to: encode the acquired decoded audio data into an audio clip that satisfies the target audio file format using a file synthesizer, and carry audio file header information in the audio clip, wherein the audio file header information comprises attribute information of the audio clip; and encode the acquired decoded video data into a video clip that satisfies the target video file format using a file synthesizer, and carry video file header information in the video clip, wherein the video file header information comprises attribute information of the video clip.
10. The apparatus according to claim 8 or 9, wherein if the audio/video interception instruction further comprises a target sampling frequency selected by the user and a target resolution selected by the user, the apparatus further comprises a sampling frequency coordination module and a resolution coordination module, wherein:
the sampling frequency coordination module is configured to judge, before the file encoding module performs file format encoding on the acquired decoded audio data and the acquired decoded video data respectively according to the audio/video interception instruction starting from the interception end time point, whether an original sampling frequency of the audio file corresponding to the acquired decoded audio data is the same as the target sampling frequency; and if the original sampling frequency is different from the target sampling frequency, convert the sampling frequency of the audio file corresponding to the acquired decoded audio data to obtain decoded audio data having the target sampling frequency;
and the resolution coordination module is configured to judge, before the file encoding module performs file format encoding on the acquired decoded video data according to the audio/video interception instruction starting from the interception end time point, whether an original resolution of a video image in the video file corresponding to the acquired decoded video data is the same as the target resolution; and if the original resolution is different from the target resolution, convert the resolution of the video image in the video file corresponding to the acquired decoded video data to obtain decoded video data having the target resolution.
11. The apparatus according to claim 8 or 9, wherein if the audio/video interception instruction further comprises a target audio format selected by the user and a target video format selected by the user, the apparatus further comprises a format coordination module, configured to: before the file encoding module performs file format encoding on the acquired decoded audio data and the acquired decoded video data respectively according to the audio/video interception instruction starting from the interception end time point, judge whether an original audio format of the audio file corresponding to the acquired decoded audio data is the same as the target audio format, and judge whether an original video format of the video file corresponding to the acquired decoded video data is the same as the target video format; if the original audio format is different from the target audio format, convert the audio format of the audio file corresponding to the acquired decoded audio data to obtain decoded audio data having the target audio format; and if the original video format is different from the target video format, convert the video format of the video file corresponding to the acquired decoded video data to obtain decoded video data having the target video format.
12. The apparatus according to claim 8 or 9, wherein if the audio/video interception instruction further comprises a target channel number selected by the user, the apparatus further comprises a channel number coordination module, configured to: before the file encoding module performs file format encoding on the acquired decoded audio data and the acquired decoded video data respectively according to the audio/video interception instruction starting from the interception end time point, judge whether an original channel number of the audio file corresponding to the acquired decoded audio data is the same as the target channel number; and if the original channel number is different from the target channel number, adjust the channel number of the audio file corresponding to the acquired decoded audio data to obtain decoded audio data having the target channel number.
13. The apparatus according to claim 8 or 9, wherein if the audio/video interception instruction further comprises a target channel arrangement selected by the user and a target video quality selected by the user, the apparatus further comprises a channel arrangement coordination module and a video quality coordination module, wherein:
the channel arrangement coordination module is configured to judge, before the file encoding module performs file format encoding on the acquired decoded audio data and the acquired decoded video data respectively according to the audio/video interception instruction starting from the interception end time point, whether an original channel arrangement of the audio file corresponding to the acquired decoded audio data is the same as the target channel arrangement; and if the original channel arrangement is different from the target channel arrangement, convert the channel arrangement of the audio file corresponding to the acquired decoded audio data to obtain decoded audio data having the target channel arrangement;
and the video quality coordination module is configured to judge, before the file encoding module performs file format encoding on the acquired decoded video data according to the audio/video interception instruction starting from the interception end time point, whether an original video quality of the video file corresponding to the acquired decoded video data is the same as the target video quality; and if the original video quality is different from the target video quality, adjust the video quality of the video file corresponding to the acquired decoded video data to obtain decoded video data having the target video quality.
14. The apparatus according to claim 8 or 9, wherein if the audio/video interception instruction further comprises a target sampling point format selected by the user and a target video frame rate selected by the user, the apparatus further comprises a sampling point format coordination module and a video frame rate coordination module, wherein:
the sampling point format coordination module is configured to judge, before the file encoding module performs file format encoding on the acquired decoded audio data and the acquired decoded video data respectively according to the audio/video interception instruction starting from the interception end time point, whether an original sampling point format of the audio file corresponding to the acquired decoded audio data is the same as the target sampling point format; and if the original sampling point format is different from the target sampling point format, convert the sampling point format of the audio file corresponding to the acquired decoded audio data to obtain decoded audio data having the target sampling point format;
and the video frame rate coordination module is configured to judge, before the file encoding module performs file format encoding on the acquired decoded video data according to the audio/video interception instruction starting from the interception end time point, whether an original video frame rate of the video file corresponding to the acquired decoded video data is the same as the target video frame rate; and if the original video frame rate is different from the target video frame rate, convert the video frame rate of the video file corresponding to the acquired decoded video data to obtain decoded video data having the target video frame rate.
15. A storage medium storing instructions which, when executed by a computer device, cause the computer device to perform the method for intercepting audio and video clips according to any one of claims 1 to 7.
CN201510446683.8A 2015-07-27 2015-07-27 Method and device for intercepting audio and video clips Active CN106412687B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510446683.8A CN106412687B (en) 2015-07-27 2015-07-27 Method and device for intercepting audio and video clips

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510446683.8A CN106412687B (en) 2015-07-27 2015-07-27 Method and device for intercepting audio and video clips

Publications (2)

Publication Number Publication Date
CN106412687A CN106412687A (en) 2017-02-15
CN106412687B true CN106412687B (en) 2020-06-05

Family

ID=58008559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510446683.8A Active CN106412687B (en) 2015-07-27 2015-07-27 Method and device for intercepting audio and video clips

Country Status (1)

Country Link
CN (1) CN106412687B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107295416B (en) * 2017-05-05 2019-11-22 中广热点云科技有限公司 The method and apparatus for intercepting video clip
CN107682744B (en) * 2017-09-29 2021-01-08 惠州Tcl移动通信有限公司 Video clip output method, storage medium and mobile terminal
CN107678822B (en) * 2017-09-30 2019-06-28 珠海市魅族科技有限公司 A kind of information processing method and device, terminal and readable storage medium storing program for executing
CN109068163B (en) * 2018-08-28 2021-01-29 青岛一舍科技有限公司 Audio and video synthesis system and synthesis method thereof
CN111131891B (en) * 2018-11-01 2023-01-24 阿里巴巴集团控股有限公司 Audio and video playing method and device, playing equipment and system
CN109451168A (en) * 2018-11-30 2019-03-08 广州酷狗计算机科技有限公司 Generate the method, apparatus and storage medium of ring signal file
CN109800877B (en) * 2019-02-20 2022-12-30 腾讯科技(深圳)有限公司 Parameter adjustment method, device and equipment of neural network
CN110572722B (en) * 2019-09-26 2021-04-16 腾讯科技(深圳)有限公司 Video clipping method, device, equipment and readable storage medium
CN112463996A (en) * 2020-11-19 2021-03-09 长城计算机软件与系统有限公司 Audio and video file playing method and device, terminal and storage medium
CN113347450B (en) * 2021-04-09 2023-04-28 中科创达软件股份有限公司 Method, device and system for sharing audio and video equipment by multiple applications
CN113342526B (en) * 2021-06-09 2023-07-07 河南工业职业技术学院 Dynamic management and control method, system, terminal and medium for cloud computing mobile network resources
CN114500883A (en) * 2022-04-18 2022-05-13 北京麦颂文化传播有限公司 Method, device and equipment for KTV to self-control MV (music video) and computer readable storage medium
CN116304176B (en) * 2023-05-19 2023-08-22 江苏苏宁银行股份有限公司 Real-time data center table-based processing method and processing system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103237259A (en) * 2013-03-29 2013-08-07 天脉聚源(北京)传媒科技有限公司 Audio-channel processing device and audio-channel processing method for video
CN103647991A (en) * 2013-12-23 2014-03-19 乐视致新电子科技(天津)有限公司 Method and system for sharing video in intelligent television
CN103747362A (en) * 2013-12-30 2014-04-23 广州华多网络科技有限公司 Method and device for cutting out video clip
CN103873888A (en) * 2012-12-12 2014-06-18 深圳市快播科技有限公司 Live broadcast method of media files and live broadcast source server
CN104079981A (en) * 2013-03-25 2014-10-01 联想(北京)有限公司 Data processing method and data processing device
CN104735381A (en) * 2013-12-19 2015-06-24 鸿合科技有限公司 Screen recording method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101217638B (en) * 2007-12-28 2012-10-24 深圳市迅雷网络技术有限公司 Downloading method, system and device of video file fragmentation


Also Published As

Publication number Publication date
CN106412687A (en) 2017-02-15

Similar Documents

Publication Publication Date Title
CN106412687B (en) Method and device for intercepting audio and video clips
CN106412691B (en) Video image intercepting method and device
CN106412702B (en) Video clip intercepting method and device
CN109218731B (en) Screen projection method, device and system of mobile equipment
CN105872253B (en) Live broadcast sound processing method and mobile terminal
CN106531177B (en) Audio processing method, mobile terminal and system
US9071792B2 (en) Adaptive media content scrubbing on a remote device
US9924205B2 (en) Video remote-commentary synchronization method and system, and terminal device
WO2017016339A1 (en) Video sharing method and device, and video playing method and device
WO2016177296A1 (en) Video generation method and apparatus
US20210225406A1 (en) Video acquisition method and device, terminal and medium
KR101852087B1 (en) Method, device, program and recording medium for compressing firmware program, method, device, program and recording medium for decompressing firmware program
CN110636375A (en) Video stream processing method and device, terminal equipment and computer readable storage medium
CN105430424A (en) Video live broadcast method, device and system
JP2021516919A (en) Video coding methods and their devices, storage media, equipment, and computer programs
CN110996117B (en) Video transcoding method and device, electronic equipment and storage medium
CN109168032B (en) Video data processing method, terminal, server and storage medium
US9781380B2 (en) Method, apparatus and terminal for playing multimedia content
CN112019929A (en) Volume adjusting method and device
KR20140092517A (en) Compressing Method of image data for camera and Electronic Device supporting the same
CN109819188B (en) Video processing method and terminal equipment
CN113873187B (en) Cross-terminal screen recording method, terminal equipment and storage medium
CN107948756B (en) Video synthesis control method and device and corresponding terminal
CN113475091B (en) Display device and image display method thereof
KR20130040350A (en) Device and method for controlling screen brightness in wireless terminal

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221202

Address after: 1402, Floor 14, Block A, Haina Baichuan Headquarters Building, No. 6, Baoxing Road, Haibin Community, Xin'an Street, Bao'an District, Shenzhen, Guangdong 518000

Patentee after: Shenzhen Yayue Technology Co.,Ltd.

Address before: 2, 518000, East 403 room, SEG science and Technology Park, Zhenxing Road, Shenzhen, Guangdong, Futian District

Patentee before: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.