CN112885369A - Audio processing method and audio processing device
- Publication number: CN112885369A (application number CN202110106764.9A)
- Authority: CN (China)
- Legal status: Granted
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
- G10L25/03—Speech or voice analysis techniques characterised by the type of extracted parameters
- G10L25/51—Speech or voice analysis techniques specially adapted for particular use, for comparison or discrimination
- G10L25/90—Pitch determination of speech signals
Abstract
The application discloses an audio processing method, an audio processing apparatus, an electronic device, and a readable storage medium, and belongs to the field of computers. The method parses first audio data to obtain at least one piece of sub-audio data; generates a graph corresponding to the at least one piece of sub-audio data according to its attribute information; displays the graph on a playing time axis corresponding to the first audio data, arranged according to where the at least one piece of sub-audio data falls within the playing time period of the first audio data; receives a first input from a user and, in response, edits the graph corresponding to the at least one piece of sub-audio data; and outputs second audio data according to the edited graph. By displaying audio data graphically, the scheme lets a user understand the audio data intuitively and clearly, edit the audio data by editing its corresponding graph, and output the edited result. This greatly lowers the threshold of audio processing; the method adapts widely and is easy to popularize.
Description
Technical Field
The present application belongs to the field of computers, and in particular relates to an audio processing method, an audio processing apparatus, an electronic device, and a readable storage medium.
Background
With the rapid development of science and technology, audio carries an increasingly large and complex amount of information, and users often need to process audio themselves to extract the information they require. In the related art, audio processing is generally done with professional audio software.
In implementing the present application, the inventors found that the related art has at least the following problems: professional audio software takes time to learn, is difficult to pick up and use immediately, and therefore makes audio processing time-consuming, high-threshold, and narrow in application range.
Summary of the Application
An object of the embodiments of the present application is to provide an audio processing method, an audio processing apparatus, an electronic device, and a readable storage medium that can solve the related-art problems of time-consuming, high-threshold, and narrowly applicable audio processing caused by the need for professional audio software.
In order to solve the technical problem, the present application is implemented as follows:
in a first aspect, an embodiment of the present application provides an audio processing method, where the method includes:
analyzing the first audio data to obtain at least one piece of sub-audio data;
generating a graph corresponding to at least one piece of sub audio data according to the attribute information of the at least one piece of sub audio data;
on a playing time axis corresponding to the first audio data, displaying the graph according to the arrangement of the at least one sub-audio data in the playing time period of the first audio data;
receiving a first input of a user, and editing a graph corresponding to the at least one sub-audio data in response to the first input;
and outputting the second audio data according to the edited graph.
In a second aspect, an embodiment of the present application provides an audio processing apparatus, including:
the acquisition module is used for analyzing the first audio data and acquiring at least one piece of sub-audio data;
the generating module is used for generating a graph corresponding to at least one piece of sub-audio data according to the attribute information of the at least one piece of sub-audio data;
the display module is used for displaying the graph according to the arrangement of the at least one sub-audio data in the playing time period of the first audio data on a playing time axis corresponding to the first audio data;
the editing module is used for receiving a first input of a user and responding to the first input to edit a graph corresponding to the at least one piece of sub-audio data;
and the output module is used for outputting the second audio data according to the edited graph.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the audio processing method according to the first aspect.
In a fourth aspect, the present application provides a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the audio processing method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the audio processing method according to the first aspect.
The embodiments of the present application provide an audio processing method, an audio processing apparatus, an electronic device, and a readable storage medium. By displaying audio data graphically, the scheme lets the user understand the audio data intuitively and clearly; the user can edit the audio data by editing its corresponding graph and then output the edited audio data. This greatly lowers the threshold of audio processing; the scheme adapts widely and is easy to popularize.
Drawings
Fig. 1 is a flowchart of the steps of an audio processing method provided by an embodiment of the present application;
Fig. 2 is the first schematic diagram of a cubic graph provided by an embodiment of the present application;
Fig. 3 is the first schematic diagram of a sector graph provided by an embodiment of the present application;
Fig. 4 is a flowchart of the steps of an audio presentation method provided by an embodiment of the present application;
Fig. 5 is the second schematic diagram of a cubic graph provided by an embodiment of the present application;
Fig. 6 is the second schematic diagram of a sector graph provided by an embodiment of the present application;
Fig. 7 is the third schematic diagram of a cubic graph provided by an embodiment of the present application;
Fig. 8 is the third schematic diagram of a sector graph provided by an embodiment of the present application;
Fig. 9 is the fourth schematic diagram of a sector graph provided by an embodiment of the present application;
Fig. 10 is the first schematic diagram of a graphical presentation provided by an embodiment of the present application;
Fig. 11 is the second schematic diagram of a graphical presentation provided by an embodiment of the present application;
Fig. 12 is the third schematic diagram of a graphical presentation provided by an embodiment of the present application;
Fig. 13 is the fourth schematic diagram of a graphical presentation provided by an embodiment of the present application;
Fig. 14 is the fifth schematic diagram of a graphical presentation provided by an embodiment of the present application;
Fig. 15 is the sixth schematic diagram of a graphical presentation provided by an embodiment of the present application;
Fig. 16 is the seventh schematic diagram of a graphical presentation provided by an embodiment of the present application;
Fig. 17 is the eighth schematic diagram of a graphical presentation provided by an embodiment of the present application;
Fig. 18 is the ninth schematic diagram of a graphical presentation provided by an embodiment of the present application;
Fig. 19 is a structural block diagram of an audio processing apparatus provided by an embodiment of the present application;
Fig. 20 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
Fig. 21 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", and the like in the description and claims of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the application can be practiced in orders other than those illustrated or described herein. In addition, "and/or" in the description and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the objects before and after it.
When an audio file is processed, professional audio editing software often has to be brought in and operated by someone skilled in its use, so the entry threshold is high and it is hard for an ordinary user to get started immediately. Meanwhile, such software must run in a particular application environment, so its environment requirements are high and its application range is narrow. The audio processing method, audio processing apparatus, electronic device, and readable storage medium of the present application display an audio file visually on a visual interface, process the audio file through operations on that interface, and obtain the processed audio file.
The audio processing method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of an audio processing method provided by an embodiment of the present application, where the method includes:
Step 101: parsing the first audio data to obtain at least one piece of sub-audio data.
In the embodiment of the present application, the first audio data is the audio file to be processed, that is, the audio file on which audio processing is to be performed. The first audio data is, in essence, data converted from sound, so when it is parsed it can be parsed according to sound characteristics such as pitch, timbre, and loudness. Parsing the first audio data by different pitches and different timbres yields at least one piece of sub-audio data.
Specifically, a pitch waveform and a timbre waveform corresponding to the first audio data are extracted and compared with a preset pitch waveform and a preset timbre waveform, respectively, to obtain the overlapping portion of the two. When the overlap exceeds a preset threshold, the overlapping portion is taken as a piece of acquired sub-audio data. By analogy, the first audio data is traversed to obtain at least one piece of sub-audio data.
Parsing the first audio data by different pitches and different timbres can yield a huge amount of information, but not all of it is needed by the user. Therefore, when parsing the first audio data to acquire sub-audio data, what is parsed and acquired can be set according to actual requirements, ensuring that the finally obtained sub-audio data is actually used and that audio processing remains convenient. Specifically, the precision of the preset pitch waveform and preset timbre waveform may differ according to actual requirements, or the preset threshold may be adjusted according to actual requirements.
It should be noted that the first audio data is parsed in order of its time information, that is, from the start time of the first audio data to its end time. Illustratively, if a piece of first audio data starts at 0 hours 0 minutes 0 seconds and ends at 3 minutes 20 seconds, parsing proceeds sequentially from 0 hours 0 minutes 0 seconds to 3 minutes 20 seconds. Meanwhile, the time information and volume information corresponding to each piece of sub-audio data are recorded during parsing. For a pitch waveform, the more oscillations within the same time period, the higher the pitch; for a timbre waveform, different waveform smoothness corresponds to different timbre. The parsing of the first audio data may also be implemented according to related technologies in the art, which are not described again or specifically limited in this embodiment of the present application.
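As a minimal sketch of the traversal-and-threshold idea above (not the patent's actual algorithm), the following Python assumes per-frame pitch/timbre feature vectors and preset templates as inputs; cosine similarity stands in for the waveform-overlap comparison, and every name and value is invented for illustration.

```python
import numpy as np

def parse_first_audio(frames, presets, threshold=0.8):
    """Scan feature frames in time order; return (start, end, label) spans."""
    spans, current = [], None
    for t, feat in enumerate(frames):
        # compare this frame against each preset pitch/timbre template
        scores = {}
        for name, tpl in presets.items():
            denom = np.linalg.norm(feat) * np.linalg.norm(tpl) + 1e-9
            scores[name] = float(np.dot(feat, tpl) / denom)
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:          # overlap exceeds the preset threshold
            if current is not None and current[2] == best:
                current = (current[0], t + 1, best)   # extend the running span
            else:
                if current is not None:
                    spans.append(current)
                current = (t, t + 1, best)
        else:
            if current is not None:
                spans.append(current)
                current = None
    if current is not None:
        spans.append(current)
    return spans

frames = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])]
presets = {"dog bark": np.array([1.0, 0.0]), "piano": np.array([0.0, 1.0])}
print(parse_first_audio(frames, presets))  # [(0, 2, 'dog bark'), (2, 3, 'piano')]
```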
Step 102: generating a graph corresponding to the at least one piece of sub-audio data according to the attribute information of the at least one piece of sub-audio data.
In the embodiment of the present application, to make audio processing more convenient, the at least one piece of sub-audio data is displayed graphically after it is acquired. Graphical display means generating an image from the at least one piece of sub-audio data for display, so that other information corresponding to the sub-audio data can be shown intuitively, helping the user process the audio quickly.
Generally, each piece of sub-audio data has its own attribute information, which may include profile information, playing duration, and the like. The profile information distinguishes one piece of sub-audio data from another, and the playing duration describes how long a piece of sub-audio data plays. When the at least one piece of sub-audio data is displayed graphically, a graph corresponding to the sub-audio data can be generated from its attribute information. After the graphs are generated, they are displayed on the playing time axis corresponding to the first audio data, arranged according to where each piece falls within the playing time period of the first audio data, so that multiple pieces of sub-audio data are displayed graphically in a unified way.
First, a graph corresponding to the at least one piece of sub-audio data is generated according to its attribute information. That is, the graph corresponding to a piece of sub-audio data represents its attribute information; given the graph for a piece of sub-audio data, that piece's attribute information can be read from the graph.
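One plausible way to carry this attribute information in code is a small record type; the field names and normalizations below are assumptions for illustration, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class SubAudio:
    profile: str       # profile information shown on the graph
    start_s: float     # start time within the first audio data, in seconds
    duration_s: float  # playing duration, in seconds
    pitch: float       # normalized 0..1
    timbre_id: int     # index into a shading/hatching table
    volume: float      # normalized 0..1

dog_bark = SubAudio("dog bark", 0.0, 40.0, pitch=0.3, timbre_id=1, volume=0.8)
```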
Step 103: displaying the graph on the playing time axis corresponding to the first audio data, according to the arrangement of the at least one piece of sub-audio data within the playing time period of the first audio data.
In this embodiment of the application, the first audio data corresponds to a play time axis, the play time axis represents time information of the first audio data, and on the play time axis, a user can intuitively obtain the time information of the first audio data, including but not limited to a start time and an end time of the first audio data. And on a playing time axis corresponding to the first audio data, the generated graph is displayed according to the arrangement of at least one piece of sub audio data in the playing time period of the first audio data. At this time, the user may acquire time information of each sub audio data including, but not limited to, a start time and an end time of each sub audio data through the play time axis.
It should be noted that the playing time axis is determined according to the time information of the first audio data, and represents a period of time from the start time to the end time of the first audio data, and the maximum time displayed by the playing time axis is not less than the end time of the first audio data. Therefore, the time information of the first audio data can be completely displayed by the playing time axis.
Optionally, the playback time axes are of at least one type, and each playback time axis corresponds to at least one type of three-dimensional graphics. Therefore, a three-dimensional graph corresponding to the at least one sub audio data and the playing time axis can be generated according to the attribute information of the at least one sub audio data, and different dimensions of the three-dimensional graph correspond to different attribute information. The three-dimensional graphics can be superposed on the playing time axis for displaying according to the playing time period of the at least one piece of sub audio data in the first audio data.
Step 104: receiving a first input from a user, and editing the graph corresponding to the at least one piece of sub-audio data in response to the first input.
In this embodiment, the first input may be an input such as a click, long press, slide, voice command, or gesture performed by the user on the graph corresponding to the sub-audio data. Specifically, the first input acts on the graph corresponding to the at least one piece of sub-audio data, and editing that graph in response to the first input includes, but is not limited to: deleting the graph corresponding to at least one piece of sub-audio data, moving it, modifying it, or adding a new graph corresponding to a piece of sub-audio data. When the graph corresponding to at least one piece of sub-audio data is edited, the sub-audio data is edited correspondingly; that is, edits applied to the graph take effect on the sub-audio data.
It should be noted that the first input may be implemented by a cursor, by touch, or by a physical button, depending on the application scenario. For example, when the scenario supports a touch screen, the first input from the user can be implemented by touch; when the scenario supports an external device, the first input can be implemented with a cursor.
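A hedged sketch of how such first inputs might be dispatched to edits of the graph list (step 104); the operation names mirror the list above, and the dictionary structure is an assumption.

```python
def apply_first_input(graphs, op, **kw):
    """Apply one first input to the graph list; returns the edited list."""
    if op == "delete":
        return [g for g in graphs if g["label"] != kw["label"]]
    if op == "move":
        for g in graphs:
            if g["label"] == kw["label"]:
                g["start_s"] = kw["target_start_s"]
    elif op == "set_volume":
        for g in graphs:
            if g["label"] == kw["label"]:
                g["volume"] = kw["target_volume"]
    elif op == "add":
        graphs.append(kw["new_graph"])
    return graphs

graphs = [{"label": "unknown", "start_s": 0.0, "volume": 0.5}]
graphs = apply_first_input(graphs, "delete", label="unknown")  # a Fig. 13-style edit
```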
Step 105: outputting second audio data according to the edited graph.
In the embodiment of the present application, the user may perform the relevant operations multiple times; the number of received first inputs is not limited, and the second audio data is output once the user confirms that adjustment is complete. That is, the second audio data is determined by the edited graph. It should be noted that the second audio data differs from the first audio data: it is the audio file obtained after editing according to the first input; its information may or may not overlap with that of the first audio data, and it may contain either more or less information than the first audio data.
It should be noted that the output second audio data is stored in the storage space corresponding to a specified directory, which may be specified by the system or by the user, and the directory may be named according to the user's requirements.
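A minimal sketch of step 105, under the assumption that each graph remembers its source span and its (possibly edited) placement and volume; np.save stands in for a real audio encoder, and all paths and dictionary keys are illustrative.

```python
import os
import numpy as np

def output_second_audio(graphs, source, sr, out_dir="edited", name="second.npy"):
    """Rebuild the second audio data from the edited graphs (step 105)."""
    end_s = max(g["start_s"] + g["duration_s"] for g in graphs)
    out = np.zeros(int(end_s * sr) + 1, dtype=np.float32)
    for g in graphs:
        a = int(g["src_start_s"] * sr)                 # original location in the source
        b = a + int(g["duration_s"] * sr)
        seg = source[a:b] * g.get("volume", 1.0)       # volume edit applied here
        dst = int(g["start_s"] * sr)                   # possibly moved location
        out[dst:dst + len(seg)] += seg
    os.makedirs(out_dir, exist_ok=True)                # system- or user-specified directory
    np.save(os.path.join(out_dir, name), out)          # stand-in for an audio encoder
    return out

src = np.random.randn(16000).astype(np.float32)        # 1 s of fake audio at 16 kHz
g = [{"label": "piano", "src_start_s": 0.0, "start_s": 0.25,
      "duration_s": 0.5, "volume": 0.8}]
second = output_second_audio(g, src, sr=16000)
```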
To sum up, the audio processing method provided by the embodiments of the present application presents audio data graphically and clearly, edits the audio data by modifying its corresponding graph, and outputs the edited audio data, greatly lowering the threshold of audio processing; it adapts widely and is easy to popularize.
Optionally, when each playing time axis corresponds to at least one type of three-dimensional graph, a three-dimensional graph corresponding to the at least one piece of sub-audio data and to the playing time axis is generated according to the attribute information of the sub-audio data, with different dimensions of the graph corresponding to different attribute information. The three-dimensional graph is then superimposed on the playing time axis for display according to the playing time period of the at least one piece of sub-audio data within the first audio data.
It should be noted that in the embodiment of the present application there is at least one type of playing time axis, and each corresponds to at least one type of three-dimensional graph. The playing time axis may start from the start time of the first audio data, or advance from the zero point in either axis direction. It may be linear or circular. The attribute information includes the profile information and playing duration of the sub-audio data: the profile information describes a piece of sub-audio data so the user can distinguish it from others, and the playing duration describes the piece's occupancy within the playing time period of the first audio data.
Optionally, the playing time axis includes at least one of a linear time axis and a circumferential time axis. When the playing time axis is a linear time axis, the three-dimensional graph is a cubic graph; when it is a circumferential time axis, the three-dimensional graph is a sector graph.
When the playing time axis is a linear time axis and the three-dimensional graph is a cubic graph, the first surface of the cubic graph displays the profile information of the sub-audio data, and the length of the first edge of the first surface represents the playing duration of the at least one piece of sub-audio data. Among the cubic graphs displayed on a linear time axis, the extension direction of each cubic graph's first edge is consistent with that of the linear time axis, and the first surfaces of the cubic graphs all face the same way.
Illustratively, referring to Fig. 2, which shows the first schematic diagram of a cubic graph: starting from the zero point, the sub-audio data obtained by parsing the first audio data are dog-bark sub-audio data, birdsong sub-audio data, and piano sub-audio data. The three are displayed superimposed on the playing time axis; the playing duration of the dog-bark sub-audio data is (A-0), that of the birdsong sub-audio data is (B-A), and that of the piano sub-audio data is (C-B).
When the playing time axis is a circumferential time axis and the three-dimensional graph is a sector graph, the arc surface of the sector graph displays the profile information of the at least one piece of sub-audio data, and the angle subtended by the arc surface represents its playing duration. Among the sector graphs displayed on a circumferential time axis, the center of the circle corresponding to each sector's arc edge coincides with the center of the circumferential time axis. Illustratively, referring to Fig. 3, which shows the first schematic diagram of a sector graph: the circumferential time axis starts at the hollow circle and ends at the solid circle, the hollow circle representing the zero point. The sub-audio data obtained by parsing the first audio data are dog-bark, birdsong, and piano sub-audio data; the playing duration of the dog-bark sub-audio data is ∠EOF, that of the birdsong sub-audio data is ∠FOH, and that of the piano sub-audio data is ∠GOI.
It should be noted that, on the circumferential time axis, the playing duration may be represented by the angle of the arc surface or determined by the arc subtending that angle. Taking Fig. 3 as an example, the circumferential time axis extends from the hollow circle to the solid circle with different times marked along it, so playing duration ∠EOF may be expressed as (F-E), ∠FOH as (H-F), and ∠GOI as (I-G). When the duration is determined by the arc corresponding to the arc surface, the circumferential time axis is divided along the circumference corresponding to the radius of the largest sector.
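The angle arithmetic on the circumferential time axis reduces to proportions of the total playing time; the durations below are invented to match the three-piece example.

```python
total_s = 200.0  # first audio data lasting 3 min 20 s, as in the earlier example
for name, start, end in [("dog bark", 0.0, 40.0),
                         ("birdsong", 40.0, 90.0),
                         ("piano", 90.0, 200.0)]:
    angle = (end - start) / total_s * 360.0
    print(f"{name}: sector angle {angle:.0f} degrees")  # 72, 90, 198 degrees
```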
Optionally, referring to fig. 4, a flowchart illustrating steps of an audio presentation method provided in an embodiment of the present application is shown, where the method includes:
In the embodiment of the present application, the at least one piece of sub-audio data can be grouped by audio type, which makes graphical display more convenient. On the playing time axis corresponding to the first audio data, the graphs of sub-audio data of the same audio type are placed on the same layer for display, so that the sub-audio data are shown intuitively and clearly.
Further, the audio type corresponding to each piece of sub-audio data is preset. For example, if the sub-audio data are birdsong sub-audio data and dog-bark sub-audio data, and both correspond to the audio type "animals", then, taking Fig. 2 as an example, when the birdsong and dog-bark sub-audio data shown in Fig. 2 have the same audio type, Fig. 2 can also be displayed graphically in the manner of Fig. 5. As shown in Fig. 5, perpendicular to the playing time axis is an audio type axis that separates sub-audio data of different audio types: the dog-bark and birdsong sub-audio data are on one layer, and the piano sub-audio data on another. Superimposing the sub-audio data by audio type does not affect information such as each piece's playing duration.
Taking Fig. 3 as an example, when the birdsong and dog-bark sub-audio data shown in Fig. 3 have the same audio type, Fig. 3 can also be displayed graphically in the manner of Fig. 6. As shown in Fig. 6, an audio type axis runs perpendicular to the plane of the circumferential time axis and separates sub-audio data of different audio types: the dog-bark and birdsong sub-audio data are on one layer, and the piano sub-audio data on another. Superimposing the sub-audio data by audio type does not affect information such as each piece's playing duration.
It should be noted that the audio types of the birdsong and dog-bark sub-audio data here are only exemplary; in practical applications, audio types may be named according to actual requirements and identified by numbers, letters, or words, as long as different types are distinguished. Meanwhile, the layers may be displayed as shown in Figs. 5 and 6, displayed after the audio type axis is touched, or displayed in other ways, such as text information. The embodiments of the present application are not specifically limited here.
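The same-type layering can be sketched as a simple grouping step, as below; the type names are placeholders.

```python
from collections import defaultdict

def layer_by_type(graphs, type_of):
    """Group graphs so same-type sub-audio data share one display layer."""
    layers = defaultdict(list)
    for g in graphs:
        layers[type_of[g["label"]]].append(g)
    return dict(layers)

layers = layer_by_type(
    [{"label": "dog bark"}, {"label": "birdsong"}, {"label": "piano"}],
    type_of={"dog bark": "animals", "birdsong": "animals", "piano": "instruments"})
# -> {"animals": [dog bark, birdsong], "instruments": [piano]}
```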
Optionally, in this embodiment, the attribute information may further include at least one of the pitch, volume, and timbre of the sub-audio data. When the attribute information includes pitch, this reflects that the acquired sub-audio data was determined by pitch.
Referring to Fig. 7, when the three-dimensional graph is a cubic graph, the length of the second edge N of the first surface M represents the pitch of the at least one piece of sub-audio data, the shading (hatching) on the first surface M represents its timbre, and the length of the third edge V of the cubic graph, perpendicular to the first surface M, represents its volume.
Referring to Fig. 8, when the three-dimensional graph is a sector graph, the height N' of the arc surface M' represents the pitch of the at least one piece of sub-audio data, the shading (hatching) on the arc surface M' represents its timbre, and the distance V' that the sector extends toward the center of the circle represents its volume.
When there are multiple pieces of sub-audio data, their volumes differ, so the perpendicular distances from the center of the circle to the arc surfaces differ, letting the user clearly distinguish the volume of each piece. For example, Fig. 9 shows two pieces of sub-audio data: their arc-surface shadings differ, the distances from the center to their arc surfaces differ, and the heights of their arc surfaces differ; that is, the two pieces differ in timbre, volume, and pitch.
It should be noted that, when sub-audio data is displayed graphically with a circumferential time axis and sector graphs, the at least one piece of sub-audio data obtained from the first audio data forms a cylinder graph, which is in essence a stack of the sectors corresponding to the pieces of sub-audio data. In practical applications, timbre may also be identified by shading and pitch by color.
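Putting the Fig. 7 and Fig. 8 mappings together, a sector's dimensions might be derived from a piece's attributes as below; the scale factors are invented for illustration.

```python
def sector_dims(duration_s, total_s, pitch, volume, timbre_id,
                r_min=1.0, r_span=2.0, h_span=1.0):
    """Map one piece's attributes onto the sector dimensions of Fig. 8."""
    return {
        "angle_deg": duration_s / total_s * 360.0,  # playing duration -> angle
        "radius": r_min + r_span * volume,          # volume -> extent V' toward the rim
        "arc_height": h_span * pitch,               # pitch -> arc-surface height N'
        "shading": timbre_id,                       # timbre -> hatching pattern
    }

print(sector_dims(40.0, 200.0, pitch=0.3, volume=0.8, timbre_id=1))
```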
Illustratively, the first audio data is parsed to obtain erhu audio data, birdsong audio data, dog-bark audio data, lyrics audio data, guitar audio data, harp audio data, suona audio data, piano audio data, and unknown audio data; following the steps above yields Fig. 10, the first schematic diagram of the graphical presentation. It can be seen that the guitar, harp, and suona audio data are of the same audio type.
It should be noted that when presetting the audio type for sub-audio data, the playing duration of each piece may also be considered, so that the playing durations of same-type sub-audio data do not overlap. A playing duration is not merely a length of time: it carries the start and end times of a piece of sub-audio data, so the graph for a piece can be aligned accurately with the playing time axis. If the playing durations of two pieces of the same audio type do overlap, they cannot be placed on the same layer and are stacked instead. Optionally, when the pieces of sub-audio data are presented graphically on the playing time axis, a time-progress scale is displayed on the time axis.
Further, Fig. 11 is a top view of Fig. 10. As shown in Fig. 11, the positions indicated by the black arrows are the beginning and end of the circumferential time axis, with time increasing in the arrow direction. The circumferential time axis represents the playing time period of the first audio data; the start and end times of each piece of sub-audio data can be located on it, giving the duration of each piece within the first audio data. The dotted line in Fig. 11 traces one turn of the circumferential time axis. The time axis is finite: the maximum time value of the audio file is the tail end T2 of the axis, the minimum time value T1 is its head end, and the solid circle on the dotted line marks the tail end of the audio file on the time axis. Because the head end adjoins the tail end, the display occupies little space while showing rich content, improving the utilization of the interface.
Continuing with Fig. 11, V1 to V2 characterize the volume of each piece of sub-audio data: the hollow circle on the line segment is the position of minimum volume of the audio file, and the solid circle the position of maximum volume. That is, volume is represented by distance from the center of the circle; each piece's volume falls in the range V1 to V2, where V2 is determined by the maximum volume of a category in the audio file and V1 is set as the zero point according to the practical application. In actual use, the head end of the time axis is displayed first; the user can rotate the cylinder around its central axis to show the content corresponding to different positions of the circumferential time axis.
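The Fig. 11 layout amounts to a polar mapping of time to angle (between head end T1 and tail end T2) and volume to radius (between V1 and V2); all the numbers below are illustrative.

```python
import math

def polar_point(t, v, t1=0.0, t2=200.0, v1=0.5, v2=2.0):
    """Map a (time, normalized volume) pair to x, y on the circular layout."""
    theta = (t - t1) / (t2 - t1) * 2 * math.pi  # time -> angle along the circle
    r = v1 + (v2 - v1) * v                      # volume -> distance from the center
    return r * math.cos(theta), r * math.sin(theta)

print(polar_point(t=50.0, v=1.0))  # quarter turn at maximum volume, approx. (0, 2)
```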
It should be noted that the first audio data includes at least one piece of sub-audio data, each represented as a sector graph, so the first audio data may be displayed graphically as a cylinder graph. The sectors of the individual pieces differ, however, in volume, timbre, or pitch, so the surface of the resulting cylinder graph is uneven; its form may be as shown in Fig. 9, and details are not repeated in this embodiment of the present application.
Optionally, the content shown in Fig. 10 can also be presented with a linear time axis and cubic graphs, as in Fig. 12. The figure shows that each piece of sub-audio data has a different volume, giving different unevenness along the volume axis, while superimposing by audio type clearly shows the sub-audio data contained in the first audio data.
In the embodiment of the present application, the profile information not only distinguishes the pieces of sub-audio data but can also reveal a piece's attribute information: when the user touches a piece of sub-audio data, its volume, pitch, and playing duration are shown. The user can thus learn the information a piece of sub-audio data carries intuitively, enriching the user's knowledge of the music.
The user can perform a trigger operation on a graph, implemented by cursor or touch; the operation can perform functions including deletion, movement, volume adjustment, and the like. The trigger operation performed by the user is the first input, which acts directly on the corresponding graph to delete it, move it, adjust its volume, and so on. The first input takes one piece of sub-audio data as its object.
Taking the graph shown in Fig. 12 as an example: when the user selects "unknown" and performs a first input of deletion, the "unknown" cubic graph is deleted, as shown in Fig. 13. On the basis of Fig. 13, when the user selects "dog bark" and performs a first input of moving it to a target location, the "dog bark" cubic graph moves to the target location as the user wishes, as shown in Fig. 14. When the user selects "erhu" and performs a first input of adjusting its volume to a target volume, the "erhu" cubic graph is lengthened or shortened along the volume-axis arrow to the target volume, as shown in Fig. 15. Multiple first inputs are carried out sequentially in their order. The first inputs may likewise be performed on Fig. 10, which is not repeated here.
Optionally, before step 104, the method further includes: receiving a fifth input from the user on third audio data, and generating a graph corresponding to the third audio data in response to the fifth input, where the third audio data is used to be inserted into the first audio data or to replace part or all of the sub-audio data.
In an embodiment of the application, the first input can also introduce new sub-audio data beyond what the first audio data currently contains, so before such a first input is made, the incoming sub-audio data needs to be processed to meet the user's needs. The third audio data is data to be inserted into the first audio data or to replace part or all of the sub-audio data. Accordingly, a fifth input from the user on the third audio data is received, and in response a graph corresponding to the third audio data is generated; this graph should be of the same type as the graphs of the first audio data. Illustratively, on the basis of Fig. 15, new sub-audio data whose profile information is "my monologue" is added at the target position, with the result shown in Fig. 16.
Optionally, after the at least one piece of sub-audio data is displayed graphically, the method further includes: receiving a second input from the user on the graph of any piece of sub-audio data, and playing that sub-audio data in response to the second input.
In the embodiment of the present application, after the at least one piece of sub-audio data included in the first audio data is displayed graphically, a second input from the user on the graph of any piece may be received, and the piece is played in response. This helps the user learn the sub-audio data included in the first audio data and improves the user's familiarity with it.
It should be noted that when the sub-audio data is played in response to the second input, the graphical display of Fig. 16 may further include a time-progress scale, as shown in Fig. 17, where the scale indicates point K; that is, the sub-audio data is played from point K.
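A sketch of resolving the second input's playback range from the time-progress scale; point K is assumed to be an offset within the selected piece, and the dictionary keys are illustrative.

```python
def second_input_range(graph, k_seconds):
    """Resolve the playback range for a second input at scale point K."""
    start = graph["start_s"] + k_seconds           # K is an offset within the piece
    end = graph["start_s"] + graph["duration_s"]   # play through to the piece's end
    return start, end                              # hand this range to any player

print(second_input_range({"start_s": 40.0, "duration_s": 50.0}, k_seconds=10.0))
# -> (50.0, 90.0)
```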
Optionally, after graphically presenting the at least one sub-audio data, the method further comprises: and receiving a third input of the user, and responding to the third input, marking the sound position corresponding to the search keyword in the graph, wherein the third input comprises the search keyword.
When sub-audio data contains language information, such as lyrics, embodiments of the present application can also locate a target by language retrieval. The retrieved portion is presented so as to be distinguished from other content, for example shown shaded. Taking Fig. 18 as an example, the retrieval result can be shown as the shaded portion on "lyrics". At the same time, the time axis also displays this shading and points to it automatically, making it easy for the user to replace or delete that portion. Marking the sound positions corresponding to the search keyword in the graph gives a precise indication and helps the user edit more conveniently.
It should be noted that the third input is performed on the premise that the first audio data includes any language information. Meanwhile, the language retrieval mode can be realized by referring to the related technology in the field, and the embodiment of the application is not described herein again.
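Assuming the language information has already been transcribed with word timings (the patent defers the retrieval method to related art), marking the keyword's sound positions might look like this sketch; the transcript and keyword are invented.

```python
def mark_keyword(transcript, keyword):
    """transcript: list of (word, start_s, end_s); return spans to shade."""
    return [(s, e) for w, s, e in transcript if keyword.lower() in w.lower()]

spans = mark_keyword([("hello", 1.0, 1.4), ("world", 1.4, 1.9),
                      ("hello", 7.2, 7.6)], keyword="hello")
print(spans)  # [(1.0, 1.4), (7.2, 7.6)] -> shade these regions on the time axis
```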
In summary, the audio processing method provided by the embodiments of the present application presents audio data graphically and clearly and edits the audio data by modifying its corresponding graph, greatly lowering the threshold of audio processing; it adapts widely and is easy to popularize. Meanwhile, the graphical modes provided can be displayed in a personalized way according to requirements, enriching the displayed content.
It should be noted that the execution body of the audio processing method provided in the embodiments of the present application may be an audio processing apparatus, or a control module within the audio processing apparatus for executing the method. In the embodiments of the present application, the audio processing method is described taking an audio processing apparatus that executes the method as an example.
Referring to fig. 19, which shows a block diagram of an audio processing apparatus 300 according to an embodiment of the present application, the audio processing apparatus 300 includes:
the obtaining module 301 is configured to analyze the first audio data and obtain at least one piece of sub-audio data.
The generating module 302 is configured to generate a graph corresponding to at least one piece of sub-audio data according to the attribute information of the at least one piece of sub-audio data.
The presentation module 303 is configured to display the graphics according to the arrangement of at least one piece of sub-audio data in the playing time period of the first audio data on the playing time axis corresponding to the first audio data.
The editing module 304 is configured to receive a first input from a user, and edit a graph corresponding to the at least one sub audio data in response to the first input.
And an output module 305, configured to output the second audio data according to the edited graph.
In the audio processing apparatus provided by the embodiment of the present application, the obtaining module parses the first audio data to obtain at least one piece of sub-audio data; the generating module generates a graph corresponding to the at least one piece of sub-audio data according to its attribute information; the display module displays the graph on the playing time axis corresponding to the first audio data, arranged by the playing time period of the at least one piece of sub-audio data; the editing module receives a first input from the user and edits the graph corresponding to the at least one piece of sub-audio data in response; and the output module outputs second audio data according to the edited graph. This greatly lowers the threshold of audio processing; the apparatus adapts widely and is easy to popularize.
Optionally, the generating module 302 is further configured to:
generating at least one sub audio data and a three-dimensional graph corresponding to a playing time axis according to the attribute information of the at least one sub audio data, wherein different dimensions of the three-dimensional graph correspond to different attribute information;
the display module 303 is further configured to:
and overlapping the three-dimensional graph on a playing time axis for displaying according to the playing time period of the at least one piece of sub audio data in the first audio data.
Optionally, the playback time axis includes a linear time axis and a circular time axis, and each playback time axis corresponds to at least one three-dimensional graph; the attribute information includes: brief introduction information and playing duration of the sub audio data;
under the condition that the playing time axis is a linear time axis, the three-dimensional graph is a cubic graph, the first surface of the cubic graph displays the brief description information of at least one piece of sub audio data, and the length of the first edge of the first surface represents the playing time length of at least one piece of sub audio data;
in the cubic graphs displayed on the linear time axis, the extending direction of the first edge of each cubic graph is consistent with the extending direction of the linear time axis, and the orientation of the first surface of each cubic graph is the same;
under the condition that the playing time axis is a circumferential time axis, the three-dimensional graph is a sector graph, the arc-shaped surface of the sector graph displays the brief introduction information of at least one piece of sub audio data, and the angle relative to the arc-shaped surface represents the playing time length of at least one piece of sub audio data;
in the sector graphs displayed on the circumferential time axis, the circle center corresponding to the arc-shaped edge of each sector graph is consistent with the circle center of the circumferential time axis.
Optionally, the apparatus 300 further comprises:
and the obtaining module is used for obtaining the audio type corresponding to the at least one sub audio data.
The display module 303 is further configured to:
and on the playing time axis corresponding to the first audio data, the graphics of the sub audio data of the same audio type are put on the same layer for displaying.
Optionally, the apparatus 300 further comprises:
and the marking module is used for receiving a third input of the user and marking the sound position corresponding to the search keyword in the graph in response to the third input, wherein the third input comprises the search keyword.
The audio processing apparatus provided by the embodiment of the present application displays audio data graphically so that the user can understand the audio data intuitively and clearly, edit it by editing its corresponding graph, and output the edited audio data; this greatly lowers the threshold of audio processing, adapts widely, and is easy to popularize. Meanwhile, the graphical modes provided can be displayed in a personalized way according to requirements, enriching the displayed content.
The audio processing device in the embodiment of the present application may be a device, and may also be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The audio processing apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android (Android) operating system, an ios operating system, or other possible operating systems, and embodiments of the present application are not limited specifically.
The audio processing apparatus provided in the embodiment of the present application can implement each process implemented by the audio processing apparatus in the method embodiments of fig. 1 to fig. 18, and for avoiding repetition, details are not described here again.
Optionally, as shown in fig. 20, an electronic device 400 is further provided in this embodiment of the present application, and includes a processor 401, a memory 402, and a program or an instruction stored in the memory 402 and executable on the processor 401, where the program or the instruction is executed by the processor 401 to implement each process of the foregoing audio processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and the non-mobile electronic devices described above.
Fig. 21 is a schematic hardware configuration diagram of an electronic device 500 for implementing an embodiment of the present application.
The electronic device 500 includes, but is not limited to: a radio frequency unit 501, a network module 502, an audio output unit 503, an input unit 504, a sensor 505, a display unit 506, a user input unit 507, an interface unit 508, a memory 509, a processor 510, and the like.
Those skilled in the art will appreciate that the electronic device 500 may further include a power supply (e.g., a battery) for supplying power to the components; the power supply may be logically connected to the processor 510 via a power management system, which then manages charging, discharging, and power consumption. The structure shown in Fig. 21 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently; details are not repeated here.
The processor 510 is configured to parse the first audio data to obtain at least one sub-audio data.
The display unit 506 is configured to generate a graph corresponding to the at least one sub-audio data according to the attribute information of the at least one sub-audio data.
The display unit 506 is further configured to display the graphics according to the arrangement of at least one piece of sub-audio data in the playing time period of the first audio data on the playing time axis corresponding to the first audio data.
The user input unit 507 is configured to receive a first input from a user, and edit a graphic corresponding to at least one piece of sub audio data in response to the first input.
The audio output unit 503 is configured to output the second audio data according to the edited graph.
By displaying audio data graphically, the embodiment of the present application lets the user understand the audio data intuitively and clearly, edit the audio data by editing its corresponding graph, and then output the edited audio data; this greatly lowers the threshold of audio processing, and the approach adapts widely and is easy to popularize.
Optionally, the display unit 506 is further configured to generate at least one sub-audio data and a three-dimensional graph corresponding to the playing time axis, where different dimensions of the three-dimensional graph correspond to different attribute information.
Optionally, the display unit 506 is further configured to display the three-dimensional graph superimposed on the playing time axis according to the playing time period of the at least one piece of sub audio data in the first audio data.
Optionally, the processor 510 is further configured to obtain an audio type corresponding to at least one sub audio data.
Optionally, the display unit 506 is further configured to display the graphics of the sub audio data of the same audio type on the same layer on the playing time axis corresponding to the first audio data.
Optionally, the display unit 506 is further configured to receive a third input from the user and, in response to the third input, mark a sound position corresponding to a search keyword in the graph, where the third input includes the search keyword.
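A hedged sketch of such keyword marking follows. It assumes a transcript with per-word time positions is available; the application does not prescribe how sound positions are located, and the names below are illustrative.

```python
from typing import List, Tuple

def mark_keyword(transcript: List[Tuple[str, float]],
                 keyword: str) -> List[float]:
    """Return the sound positions (seconds on the playing time axis)
    whose transcribed word matches the search keyword."""
    return [t for word, t in transcript if keyword.lower() in word.lower()]

# Each entry is (word, time on the playing time axis); values are invented.
transcript = [("hello", 1.2), ("world", 1.9), ("hello", 14.6)]
print(mark_keyword(transcript, "hello"))  # [1.2, 14.6] -> marks in the graph
```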
The embodiment of the application also supports personalized display on demand through the provided graphic modes, which enriches the displayed content. Superimposing the plurality of pieces of sub-audio data by audio type clearly shows which sub-audio data the first audio data contains; meanwhile, brief-introduction information is shown on the first surface of the cubic graph or on the sector graph, so the user can intuitively learn the various information carried by the sub-audio data, which enriches the user's musical knowledge. Playing a piece of sub-audio data helps the user get to know the at least one piece of sub-audio data included in the first audio data, improving the user's familiarity with the first audio data. And marking the sound positions corresponding to the search keyword in the graph provides a precise cue that helps the user edit more conveniently.
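For concreteness, the duration-to-geometry mappings described for the two timeline styles (see claims 3 and 8 below) can be sketched as follows: on a linear time axis the length of the cubic graph's first edge grows with playing duration, while on a circumferential time axis the sector graph's central angle does. The proportionality constants here are illustrative assumptions only.

```python
def edge_length(duration_s: float, px_per_second: float = 4.0) -> float:
    # Linear time axis: the first-edge length of the cubic graphic is
    # proportional to the playing duration.
    return duration_s * px_per_second

def sector_angle(duration_s: float, total_s: float) -> float:
    # Circumferential time axis: the sector's central angle is the
    # sub-audio's share of the total playing duration, out of 360 degrees.
    return 360.0 * duration_s / total_s

print(edge_length(30.0))          # 120.0 px for a 30 s piece
print(sector_angle(30.0, 240.0))  # 45.0 degrees of the circle
```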
The embodiment of the present application further provides a readable storage medium. A program or an instruction is stored on the readable storage medium; when executed by a processor, the program or instruction implements each process of the audio processing method embodiment and achieves the same technical effect. To avoid repetition, the details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the present application further provides a chip. The chip includes a processor and a communication interface coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the audio processing method embodiment and achieve the same technical effect. To avoid repetition, the description is not repeated here.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-on-chip, a system chip, a chip system, or a system-on-a-chip.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising an … …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed; the functions may also be performed in a substantially simultaneous manner or in the reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly also by hardware, although in many cases the former is the better implementation. Based on such an understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, a magnetic disk, or an optical disk) and includes instructions for enabling a terminal (e.g., a mobile phone, a computer, a server, or a network device) to execute the methods of the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. A method of audio processing, the method comprising:
analyzing first audio data to obtain at least one piece of sub-audio data;
generating a graph corresponding to the at least one piece of sub-audio data according to attribute information of the at least one piece of sub-audio data;
displaying, on a playing time axis corresponding to the first audio data, the graph according to the arrangement of the playing time period of the at least one piece of sub-audio data in the first audio data;
receiving a first input of a user, and editing the graph corresponding to the at least one piece of sub-audio data in response to the first input;
and outputting second audio data according to the edited graph.
2. The method of claim 1, wherein the generating a graph corresponding to the at least one piece of sub-audio data according to the attribute information of the at least one piece of sub-audio data comprises:
generating a three-dimensional graph corresponding to the at least one piece of sub-audio data and the playing time axis according to the attribute information of the at least one piece of sub-audio data, wherein different dimensions of the three-dimensional graph correspond to different attribute information;
and the displaying, on the playing time axis corresponding to the first audio data, the graph according to the arrangement of the playing time period of the at least one piece of sub-audio data in the first audio data comprises:
displaying the three-dimensional graph superimposed on the playing time axis according to the playing time period of the at least one piece of sub-audio data in the first audio data.
3. The method of claim 2, wherein the playing time axis includes a linear time axis and a circumferential time axis, each playing time axis corresponding to at least one three-dimensional graph; and the attribute information includes: brief-introduction information and a playing duration of the sub-audio data;
in the case that the playing time axis is a linear time axis, the three-dimensional graph is a cubic graph, a first surface of the cubic graph displays the brief-introduction information of the at least one piece of sub-audio data, and a length of a first edge of the first surface represents the playing duration of the at least one piece of sub-audio data;
among the cubic graphs displayed on the linear time axis, the extending direction of the first edge of each cubic graph is consistent with the extending direction of the linear time axis, and the first surfaces of the cubic graphs face the same direction;
in the case that the playing time axis is a circumferential time axis, the three-dimensional graph is a sector graph, an arc-shaped surface of the sector graph displays the brief-introduction information of the at least one piece of sub-audio data, and a central angle corresponding to the arc-shaped surface represents the playing duration of the at least one piece of sub-audio data;
among the sector graphs displayed on the circumferential time axis, the circle center corresponding to the arc-shaped edge of each sector graph coincides with the circle center of the circumferential time axis.
4. The method according to claim 1, wherein after the generating a graph corresponding to the at least one piece of sub-audio data according to the attribute information of the at least one piece of sub-audio data, the method further comprises:
obtaining an audio type corresponding to the at least one piece of sub-audio data;
and the displaying, on the playing time axis corresponding to the first audio data, the graph according to the arrangement of the playing time period of the at least one piece of sub-audio data in the first audio data comprises:
displaying, on the playing time axis corresponding to the first audio data, the graphs of sub-audio data of the same audio type on a same layer.
5. The method according to any one of claims 1-4, wherein after the displaying of the graph corresponding to the at least one piece of sub-audio data, the method further comprises:
receiving a third input of the user, and in response to the third input, marking a sound position corresponding to a search keyword in the graph, wherein the third input includes the search keyword.
6. An audio processing apparatus, wherein the apparatus comprises:
an acquisition module, configured to analyze first audio data to obtain at least one piece of sub-audio data;
a generating module, configured to generate a graph corresponding to the at least one piece of sub-audio data according to attribute information of the at least one piece of sub-audio data;
a display module, configured to display, on a playing time axis corresponding to the first audio data, the graph according to the arrangement of the playing time period of the at least one piece of sub-audio data in the first audio data;
an editing module, configured to receive a first input of a user and edit the graph corresponding to the at least one piece of sub-audio data in response to the first input;
and an output module, configured to output second audio data according to the edited graph.
7. The apparatus of claim 6, wherein the generating module is further configured to:
generate a three-dimensional graph corresponding to the at least one piece of sub-audio data and the playing time axis according to the attribute information of the at least one piece of sub-audio data, wherein different dimensions of the three-dimensional graph correspond to different attribute information;
and the display module is further configured to:
display the three-dimensional graph superimposed on the playing time axis according to the playing time period of the at least one piece of sub-audio data in the first audio data.
8. The apparatus of claim 7, wherein the playing time axis includes a linear time axis and a circumferential time axis, each playing time axis corresponding to at least one three-dimensional graph; and the attribute information includes: brief-introduction information and a playing duration of the sub-audio data;
in the case that the playing time axis is a linear time axis, the three-dimensional graph is a cubic graph, a first surface of the cubic graph displays the brief-introduction information of the at least one piece of sub-audio data, and a length of a first edge of the first surface represents the playing duration of the at least one piece of sub-audio data;
among the cubic graphs displayed on the linear time axis, the extending direction of the first edge of each cubic graph is consistent with the extending direction of the linear time axis, and the first surfaces of the cubic graphs face the same direction;
in the case that the playing time axis is a circumferential time axis, the three-dimensional graph is a sector graph, an arc-shaped surface of the sector graph displays the brief-introduction information of the at least one piece of sub-audio data, and a central angle corresponding to the arc-shaped surface represents the playing duration of the at least one piece of sub-audio data;
among the sector graphs displayed on the circumferential time axis, the circle center corresponding to the arc-shaped edge of each sector graph coincides with the circle center of the circumferential time axis.
9. The apparatus of claim 6, further comprising:
an obtaining module, configured to obtain an audio type corresponding to the at least one piece of sub-audio data;
wherein the display module is further configured to:
display, on the playing time axis corresponding to the first audio data, the graphs of sub-audio data of the same audio type on a same layer.
10. The apparatus according to any one of claims 6-9, further comprising:
a marking module, configured to receive a third input of the user and, in response to the third input, mark a sound position corresponding to a search keyword in the graph, wherein the third input includes the search keyword.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110106764.9A CN112885369B (en) | 2021-01-26 | 2021-01-26 | Audio processing method and audio processing device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110106764.9A CN112885369B (en) | 2021-01-26 | 2021-01-26 | Audio processing method and audio processing device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112885369A (en) | 2021-06-01
CN112885369B (en) | 2024-05-24
Family
ID=76053413
Family Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---|
CN202110106764.9A Active CN112885369B (en) | 2021-01-26 | 2021-01-26 | Audio processing method and audio processing device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112885369B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002032097A (en) * | 2000-07-19 | 2002-01-31 | Nippon Columbia Co Ltd | Audio data editing device |
CN102253790A (en) * | 2010-05-21 | 2011-11-23 | 泰金宝电通股份有限公司 | Electronic device and data selection method thereof |
JP2013148864A (en) * | 2011-12-21 | 2013-08-01 | Yamaha Corp | Music data editing device |
CN104053064A (en) * | 2013-03-14 | 2014-09-17 | 霍尼韦尔国际公司 | System and method of audio information display on video playback timeline |
CN105868307A (en) * | 2016-03-26 | 2016-08-17 | 深圳市金立通信设备有限公司 | An audio frequency information display method and a terminal |
CN107659725A (en) * | 2017-09-26 | 2018-02-02 | 维沃移动通信有限公司 | A kind of audio-frequency processing method and mobile terminal |
CN110050255A (en) * | 2016-12-09 | 2019-07-23 | 索尼互动娱乐股份有限公司 | Image processing system and method |
CN111445929A (en) * | 2020-03-12 | 2020-07-24 | 维沃移动通信有限公司 | Voice information processing method and electronic equipment |
CN111526242A (en) * | 2020-04-30 | 2020-08-11 | 维沃移动通信有限公司 | Audio processing method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN112885369B (en) | 2024-05-24 |
Similar Documents
Publication | Title
---|---
US11456017B2 | Looping audio-visual file generation based on audio and video analysis
US20180240447A1 | Dynamic music authoring
US20070261537A1 | Creating and sharing variations of a music file
US11295069B2 | Speech to text enhanced media editing
CN113365134B | Audio sharing method, device, equipment and medium
EP2035964A1 | Graphical display
US20220262339A1 | Audio processing method, apparatus, and device, and storage medium
US11960536B2 | Methods and systems for organizing music tracks
CN112269898A | Background music obtaining method and device, electronic equipment and readable storage medium
US20220147558A1 | Methods and systems for automatically matching audio content with visual input
US20170212876A1 | Method and Apparatus for Editing Audio File
CN111128254B | Audio playing method, electronic equipment and storage medium
CN107564553B | Control method and system of audio play list and audio play system
CN112885369B | Audio processing method and audio processing device
WO2023005193A1 | Subtitle display method and device
CN114089807A | Waveform sequence editing apparatus, method, electronic device, and storage medium
JP7166373B2 | Method, system, and computer-readable recording medium for managing text transformation record and memo to voice file
CN114430499B | Video editing method, video editing apparatus, electronic device, and readable storage medium
CN112216275B | Voice information processing method and device and electronic equipment
US12013893B2 | Information processing apparatus and information processing method to search a music piece for reproduction
CN112309424B | Display method and electronic equipment
CN107591166B | Recording marking and playback method and device
AU2021250903A1 | Methods and systems for automatically matching audio content with visual input
JP2023042862A | Content creation support system, content creation support method, and program
CN115700870A | Audio data processing method and device
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant