CN112182285A - Data processing method and equipment - Google Patents

Data processing method and equipment

Info

Publication number
CN112182285A
Authority
CN
China
Prior art keywords
audio data
target audio
coefficient
difficulty coefficient
processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011053102.1A
Other languages
Chinese (zh)
Inventor
杨伟明
赵伟峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Music Entertainment Technology Shenzhen Co Ltd filed Critical Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority to CN202011053102.1A priority Critical patent/CN112182285A/en
Publication of CN112182285A publication Critical patent/CN112182285A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025Phonemes, fenemes or fenones being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The embodiment of the application discloses a data processing method and equipment, wherein the method comprises the following steps: acquiring target audio data from an audio set; extracting the characteristics of the target audio data to obtain the audio characteristics of the target audio data, and converting the audio characteristics into phonetic symbols according to the preset conversion relation between the audio characteristics and the phonetic symbols; generating a range difficulty coefficient of the target audio data according to the phonetic symbol; generating a high pitch difficulty coefficient and a low pitch difficulty coefficient of the target audio data according to the tone marks; and predicting the singing difficulty of the target audio data according to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient. By the method and the device, the efficiency of selecting the songs by the user can be improved.

Description

Data processing method and equipment
Technical Field
The present application relates to the field of internet technologies, and in particular, to a data processing method and device.
Background
With the rapid development of the internet, music software has become very diverse and can provide users with services such as listening to songs and singing. In the singing service, however, current music software provides the same song recommendation service for all users, and the classification of songs is rough. Different users have different preferences for different songs, and only individual songs in the same song category may meet a given user's requirements; for example, some people like to sing high notes, while others prefer slow songs. As a result, it is difficult for users to select songs suitable for their own singing, and song selection takes a long time, which reduces the efficiency of selecting songs.
Disclosure of Invention
The embodiment of the application provides a data processing method and equipment, which can improve the efficiency of selecting songs by a user.
An aspect of the present application provides a data processing method, which may include:
acquiring target audio data from an audio set;
extracting the characteristics of the target audio data to obtain the audio characteristics of the target audio data, and converting the audio characteristics into phonetic symbols according to the preset conversion relation between the audio characteristics and the phonetic symbols;
generating a range difficulty coefficient of the target audio data according to the phonetic symbol;
generating a high pitch difficulty coefficient and a low pitch difficulty coefficient of the target audio data according to the tone marks;
and predicting the singing difficulty of the target audio data according to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient.
Wherein the obtaining target audio data from the audio set includes:
acquiring audio data to be processed from an audio set;
detecting the state information of the audio data to be processed according to the data volume and the audio characteristics of the audio data to be processed;
and if the state information of the audio data to be processed is in a normal state, preprocessing the audio data to be processed to obtain target audio data corresponding to the audio data to be processed.
Wherein, the detecting the state information of the audio data to be processed according to the data volume and the audio features of the audio data to be processed includes:
acquiring the data volume and the channel number of the audio data to be processed;
if the data volume of the audio data to be processed is greater than or equal to a data threshold value and the number of the channels is the same as the inherent number of the channels of the audio data to be processed, determining the state information of the audio data to be processed as a normal state;
and if the data volume of the audio data to be processed is smaller than a data threshold value or the number of the channels is different from the inherent number of the channels of the audio data to be processed, determining the state information of the audio data to be processed as an abnormal state.
The preprocessing the audio data to be processed to obtain target audio data corresponding to the audio data to be processed includes:
sampling the audio data to be processed to obtain the volume value of each sampling point;
and deleting the sampling points with the volume values smaller than the volume threshold value from the audio data to be processed to obtain target audio data.
Wherein the generating of the range difficulty coefficient of the target audio data according to the phonetic symbol includes:
obtaining a difference value between a maximum phonetic symbol and a minimum phonetic symbol in the target audio data, and generating a range span coefficient of the target audio data according to the difference value;
acquiring first-order difference information of the phonetic symbol of the target audio data, and determining a range change coefficient of the target audio data according to the first-order difference information;
determining a sum of the range span coefficient and the range change coefficient as a range difficulty coefficient of the target audio data.
Wherein the generating of the high pitch difficulty coefficient and the low pitch difficulty coefficient of the target audio data according to the note number comprises:
distinguishing a high-tone symbol and a low-tone symbol in the target audio data according to the tone symbol;
determining the difficulty coefficient of a high-pitch symbol in the target audio data according to a preset corresponding relation between the high-pitch symbol and the difficulty coefficient, and determining the difficulty coefficient of a low-pitch symbol in the target audio data according to a preset corresponding relation between a low-pitch symbol and the difficulty coefficient;
and obtaining a high pitch difficulty coefficient of the target audio data based on the difficulty coefficient of the high pitch symbol in the target audio data, and obtaining a low pitch difficulty coefficient of the target audio data based on the difficulty coefficient of the low pitch symbol in the target audio data.
Wherein said distinguishing a high-pitch symbol and a low-pitch symbol of the target audio data according to the tone symbol comprises:
acquiring a standard sound range corresponding to the target audio data, determining a note symbol corresponding to the highest tone in the standard sound range as a highest note symbol, and determining a note symbol corresponding to the lowest tone in the standard sound range as a lowest note symbol;
and determining the phonetic symbols higher than the highest-pitch symbol in the target audio data as high-pitch phonetic symbols, and determining the phonetic symbols lower than the lowest-pitch symbol in the target audio data as low-pitch phonetic symbols.
Wherein the predicting the singing difficulty of the target audio data according to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient includes:
acquiring weight coefficients corresponding to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient respectively;
generating a singing difficulty coefficient of the target audio data according to the range difficulty coefficient and the corresponding weight coefficient, the treble difficulty coefficient and the corresponding weight coefficient, and the bass difficulty coefficient and the corresponding weight coefficient; the singing difficulty coefficient is used for representing the singing difficulty of the target audio data.
An aspect of an embodiment of the present application provides a data processing apparatus, which may include:
the data acquisition unit is used for acquiring target audio data from the audio set;
the characteristic extraction unit is used for carrying out characteristic extraction on the target audio data to obtain the audio characteristics of the target audio data, and converting the audio characteristics into voice symbols according to the preset conversion relation between the audio characteristics and the voice symbols;
a first coefficient obtaining unit, configured to generate a range difficulty coefficient of the target audio data according to the phonetic symbol;
the second coefficient acquisition unit is used for generating a high pitch difficulty coefficient and a low pitch difficulty coefficient of the target audio data according to the tone marks;
and the difficulty prediction unit is used for predicting the singing difficulty of the target audio data according to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient.
Wherein the data acquisition unit includes:
the data acquisition subunit is used for acquiring the audio data to be processed from the audio set;
the state detection subunit is used for detecting the state information of the audio data to be processed according to the data volume and the audio characteristics of the audio data to be processed;
and the data preprocessing subunit is used for preprocessing the audio data to be processed to obtain target audio data corresponding to the audio data to be processed if the state information of the audio data to be processed is in a normal state.
Wherein the state detection subunit is specifically configured to:
acquiring the data volume and the channel number of the audio data to be processed;
if the data volume of the audio data to be processed is greater than or equal to a data threshold value and the number of the channels is the same as the inherent number of the channels of the audio data to be processed, determining the state information of the audio data to be processed as a normal state;
and if the data volume of the audio data to be processed is smaller than a data threshold value or the number of the channels is different from the inherent number of the channels of the audio data to be processed, determining the state information of the audio data to be processed as an abnormal state.
Wherein the data preprocessing subunit is specifically configured to:
sampling the audio data to be processed to obtain the volume value of each sampling point;
and deleting the sampling points with the volume values smaller than the volume threshold value from the audio data to be processed to obtain target audio data.
The first coefficient obtaining unit is specifically configured to:
obtaining a difference value between a maximum phonetic symbol and a minimum phonetic symbol in the target audio data, and generating a range span coefficient of the target audio data according to the difference value;
acquiring first-order difference information of the phonetic symbol of the target audio data, and determining a range change coefficient of the target audio data according to the first-order difference information;
determining a sum of the range span coefficient and the range change coefficient as a range difficulty coefficient of the target audio data.
Wherein the second coefficient acquisition unit includes:
a note distinguishing subunit, configured to distinguish a high-pitch symbol and a low-pitch symbol in the target audio data according to the pitch symbol;
a second coefficient obtaining subunit, configured to determine a difficulty coefficient of a high-pitched tone symbol in the target audio data according to a preset corresponding relationship between the high-pitched tone symbol and the difficulty coefficient, and determine a difficulty coefficient of a low-pitched tone symbol in the target audio data according to a preset corresponding relationship between the low-pitched tone symbol and the difficulty coefficient;
and obtaining a high pitch difficulty coefficient of the target audio data based on the difficulty coefficient of the high pitch symbol in the target audio data, and obtaining a low pitch difficulty coefficient of the target audio data based on the difficulty coefficient of the low pitch symbol in the target audio data.
Wherein the note distinguishing subunit is specifically configured to:
acquiring a standard sound range corresponding to the target audio data, determining a note symbol corresponding to the highest tone in the standard sound range as a highest note symbol, and determining a note symbol corresponding to the lowest tone in the standard sound range as a lowest note symbol;
and determining the phonetic symbols higher than the highest-pitch symbol in the target audio data as high-pitch phonetic symbols, and determining the phonetic symbols lower than the lowest-pitch symbol in the target audio data as low-pitch phonetic symbols.
Wherein the difficulty prediction unit is specifically configured to:
acquiring weight coefficients corresponding to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient respectively;
generating a singing difficulty coefficient of the target audio data according to the range difficulty coefficient and the corresponding weight coefficient, the treble difficulty coefficient and the corresponding weight coefficient, and the bass difficulty coefficient and the corresponding weight coefficient; the singing difficulty coefficient is used for representing the singing difficulty of the target audio data.
An aspect of the embodiments of the present application provides a computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.
An aspect of an embodiment of the present application provides a computer device, including a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
An aspect of an embodiment of the present application provides a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of the computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the above-mentioned method steps.
In the embodiment of the application, target audio data are acquired from an audio set; extracting the characteristics of the target audio data to obtain the audio characteristics of the target audio data, and converting the audio characteristics into phonetic symbols according to the preset conversion relation between the audio characteristics and the phonetic symbols; generating a range difficulty coefficient of the target audio data according to the phonetic symbol; generating a high pitch difficulty coefficient and a low pitch difficulty coefficient of the target audio data according to the tone marks; and predicting the singing difficulty of the target audio data according to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient. By setting the difficulty coefficient for the songs, the problem that the user needs to select the songs for a long time is avoided, and the efficiency of selecting the songs by the user is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a block diagram of a system architecture for data processing according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a data processing method according to an embodiment of the present application;
fig. 4 is a schematic diagram illustrating an example of a data processing method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a block diagram of a data processing system according to an embodiment of the present application. The server 10f establishes a connection with a user terminal cluster through the switch 10e and the communication bus 10d; the user terminal cluster may include the user terminal 10a and the user terminal 10b. The audio set storage module 10g stores a plurality of audio data, including audio data in a normal state and audio data in an abnormal state. The server 10f acquires audio data to be processed from the audio set, preprocesses the audio data to be processed to generate corresponding target audio data, and performs feature extraction on the target audio data to obtain the audio features of the target audio data. The server 10f then converts the audio features into note symbols according to a preset conversion relationship between audio features and note symbols, generates a range difficulty coefficient, a treble difficulty coefficient and a bass difficulty coefficient of the target audio data according to the note symbols, and predicts the singing difficulty of the target audio data according to these three coefficients.
The user terminal related to the embodiment of the application comprises: terminal equipment such as tablet personal computers, smart phones, Personal Computers (PCs), notebook computers, palmtop computers and the like.
Referring to fig. 2, a flow chart of a data processing method according to an embodiment of the present application is schematically shown. As shown in fig. 2, the method of the embodiment of the present application may include the following steps S101 to S105.
S101, target audio data are obtained from the audio set.
Specifically, the data processing device acquires the target audio data from the audio set. It can be understood that the data processing device may be the server 10f in fig. 1. The audio set includes at least one piece of audio data, where the audio data is data in an audio format, for example, data in the MP3, MIDI or WMA format; the audio set may also include other data that does not belong to audio data, and the data processing device may filter out the audio files through the file format. The target audio data may be any audio data in the audio set, or may be obtained by preprocessing audio data to be processed in the audio set, where the audio data to be processed may be any audio data in the audio set. Preprocessing the audio data to be processed can improve its quality and thus the accuracy of determining the singing difficulty of the audio data. Since it is more difficult to determine a singing difficulty coefficient for audio data with lower volume values, the preprocessing usually deletes the sampling points with lower volume values from the audio data to be processed. The specific preprocessing is as follows: sample the audio data to be processed to obtain the volume value of each sampling point; delete from the audio data to be processed the sampling points whose volume values are smaller than a volume threshold; and determine the audio data to be processed after these sampling points are deleted as the target audio data.
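The preprocessing step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `preprocess_audio` and the default threshold value are assumptions, and "volume value" is taken here to mean the absolute amplitude of each sample.

```python
import numpy as np

def preprocess_audio(samples: np.ndarray, volume_threshold: float = 0.01) -> np.ndarray:
    """Delete sampling points whose volume value (here: absolute amplitude)
    is smaller than the volume threshold; the remaining samples form the
    target audio data."""
    volumes = np.abs(samples)
    return samples[volumes >= volume_threshold]
```

Samples near silence are dropped, so the downstream difficulty coefficients are computed only over audible material.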
S102, extracting the characteristics of the target audio data to obtain the audio characteristics of the target audio data, and converting the audio characteristics into phonetic symbols according to the preset conversion relation between the audio characteristics and the phonetic symbols.
Specifically, the data processing device performs feature extraction on the target audio data to obtain audio features of the target audio data, and converts the audio features into phonetic symbols according to a preset conversion relationship between the audio features and the phonetic symbols. The audio features may be fundamental frequency features of the audio files, the fundamental frequency features corresponding to pitches of the audio files, and the audio features are converted into tone symbols according to a preset conversion relationship between the audio features and the tone symbols.
The note symbols are MIDI (Musical Instrument Digital Interface) note numbers. Specifically, the preset conversion relationship between the audio features and the note symbols can be expressed by the formula d = 69 + 12·log2(f/440), where d is the note symbol and f is the audio feature (the fundamental frequency). It should be noted that after feature extraction is performed on the target audio data with a known frame length and step length, multiple audio features are obtained, each audio feature corresponds to a respective note symbol, and the target audio data therefore corresponds to multiple note symbols. If the target audio data is a two-channel audio, feature extraction may be performed on the audio data in each channel, and the note symbol corresponding to the audio feature in each channel is obtained according to the above conversion formula.
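The conversion formula d = 69 + 12·log2(f/440) is the standard mapping from frequency in Hz to a MIDI note number (A4 = 440 Hz maps to note 69). A direct transcription, with the function name being our own choice:

```python
import math

def frequency_to_midi_note(f: float) -> float:
    """Convert a fundamental-frequency feature f (in Hz) to a MIDI note
    number via d = 69 + 12 * log2(f / 440)."""
    return 69 + 12 * math.log2(f / 440.0)
```

For example, 440 Hz yields note 69 and 880 Hz (one octave higher) yields note 81.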
S103, generating a range difficulty coefficient of the target audio data according to the phonetic symbol;
Specifically, the data processing device generates the range difficulty coefficient of the target audio data according to the note symbols. It can be understood that the range difficulty coefficient is determined by a range span coefficient and a range change coefficient.
The specific steps for determining the pitch range span coefficient are as follows: the data processing apparatus acquires a difference value between a maximum note number and a minimum note number in the target audio data, and a span of note numbers in MIDI. The span of the MIDI pitch symbol is the difference between the maximum pitch symbol and the minimum pitch symbol in the MIDI, for example, if the maximum pitch symbol in the MIDI is 108 and the minimum pitch symbol in the MIDI is 21, the span of the MIDI pitch symbol is 87. And determining the ratio of the difference value of the maximum phonetic symbol and the minimum phonetic symbol in the target audio data to the span of the phonetic symbols in the MIDI as the pitch range span coefficient of the target audio data.
The specific steps for determining the range change coefficient are as follows: and acquiring first-order difference information of the phonetic symbol of the target audio data, and determining the range change coefficient of the target audio data according to the first-order difference information.
Determining a sum of the range span coefficient and the range change coefficient as a range difficulty coefficient of the target audio data.
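The range difficulty computation in the steps above can be sketched as follows. The span coefficient follows the text exactly: (max note − min note) divided by the MIDI span of 87 (108 − 21). The text does not specify how the first-order difference information is turned into a range change coefficient, so the mean absolute difference normalized by 12 semitones is an illustrative assumption.

```python
import numpy as np

MIDI_SPAN = 108 - 21  # span between the maximum (108) and minimum (21) MIDI note numbers

def range_difficulty(notes: np.ndarray) -> float:
    # Range span coefficient: ratio of the song's note span to the MIDI span.
    span_coef = (notes.max() - notes.min()) / MIDI_SPAN
    # Range change coefficient from first-order differences; the exact
    # mapping is unspecified in the text, so normalizing the mean absolute
    # difference by one octave (12 semitones) is an assumption.
    change_coef = float(np.mean(np.abs(np.diff(notes)))) / 12.0
    # Range difficulty coefficient: sum of the two coefficients, per the text.
    return float(span_coef + change_coef)
```

A melody that leaps frequently between distant notes thus scores higher than one that moves stepwise over the same span.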
And S104, generating a high pitch difficulty coefficient and a low pitch difficulty coefficient of the target audio data according to the tone marks.
Specifically, the data processing device generates a high pitch difficulty coefficient and a low pitch difficulty coefficient of the target audio data according to the tone marks. It can be understood that the data processing device obtains a standard range corresponding to the target audio data, and the standard range is a preset range and can be adjusted. And each tone in the sound range corresponds to one sound symbol, the sound symbol corresponding to the highest tone in the standard sound range is determined as the highest sound symbol, and the sound symbol corresponding to the lowest tone in the standard sound range is determined as the lowest sound symbol. For example, the standard register may be set to C3-G4, with the highest pitch of the standard register being G4, the note number corresponding to G4 being 67, and thus the highest note number being 67, and the lowest pitch of the standard register being C3, and the note number corresponding to C3 being 48, and thus the lowest note number being 48.
Determine the note symbols higher than the highest note symbol in the target audio data as high-pitch note symbols, and determine the note symbols lower than the lowest note symbol in the target audio data as low-pitch note symbols; then count the number of high-pitch note symbols and the number of low-pitch note symbols in the target audio data, generate the treble difficulty coefficient of the target audio data according to the number of high-pitch note symbols, and generate the bass difficulty coefficient of the target audio data according to the number of low-pitch note symbols.
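The classification against the standard range can be sketched as below, using the example bounds from the text: C3 (note 48) to G4 (note 67). The function name and the list-based interface are our own; the text only specifies the comparison rule (strictly above the highest note symbol, strictly below the lowest).

```python
def classify_notes(notes, lowest=48, highest=67):
    """Split note numbers against a standard vocal range (C3=48 to G4=67
    in the text's example); returns (high_pitch_notes, low_pitch_notes).
    Notes inside the standard range fall in neither list."""
    highs = [n for n in notes if n > highest]
    lows = [n for n in notes if n < lowest]
    return highs, lows
```

The counts of the two returned lists then feed the treble and bass difficulty coefficients; the text leaves the count-to-coefficient mapping unspecified.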
And S105, predicting the singing difficulty of the target audio data according to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient.
Specifically, the data processing device predicts the singing difficulty of the target audio data according to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient. It is understood that the singing difficulty coefficient is used for representing the singing difficulty of the target audio data, and the singing difficulty coefficient can visually indicate the singing difficulty of the target audio data.
Specifically, the data processing device may set a weighting coefficient for the range difficulty coefficient, the treble difficulty coefficient, and the bass difficulty coefficient, the weighting coefficient may be determined according to an influence of the above difficulty coefficient in the audio data, and the weighting calculation may be performed according to the weighting coefficient, the range difficulty coefficient, the treble difficulty coefficient, and the bass difficulty coefficient, so as to obtain the singing difficulty coefficient of the target audio data. In the music application program, the singing difficulty coefficient of each song can be displayed to the user, and the user can select the song according to the singing difficulty coefficient.
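The weighted combination described above can be sketched as a simple weighted sum. The weight values below are illustrative assumptions; the text says only that the weights are set according to the influence of each difficulty coefficient.

```python
def singing_difficulty(range_coef: float, treble_coef: float, bass_coef: float,
                       weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted combination of the range, treble and bass difficulty
    coefficients into a single singing difficulty coefficient. The
    default weights are placeholders, not values from the text."""
    w_range, w_treble, w_bass = weights
    return w_range * range_coef + w_treble * treble_coef + w_bass * bass_coef
```

With weights summing to 1, the singing difficulty coefficient stays on the same scale as the individual coefficients, which is convenient for display to the user.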
In the embodiment of the application, target audio data are acquired from an audio set; extracting the characteristics of the target audio data to obtain the audio characteristics of the target audio data, and converting the audio characteristics into phonetic symbols according to the preset conversion relation between the audio characteristics and the phonetic symbols; generating a range difficulty coefficient of the target audio data according to the phonetic symbol; generating a high pitch difficulty coefficient and a low pitch difficulty coefficient of the target audio data according to the tone marks; and predicting the singing difficulty of the target audio data according to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient. By setting the difficulty coefficient for the songs, the problem that the user needs to select the songs for a long time is avoided, and the efficiency of selecting the songs by the user is improved. Alternatively, the songs may be classified according to the singing difficulty coefficient to obtain a list of songs with different singing difficulty coefficients. Of course, the singing difficulty coefficient can also be applied to other song processing scenarios.
Referring to fig. 3, a flow chart of a data processing method according to an embodiment of the present application is schematically shown. As shown in fig. 3, the method of the embodiment of the present application may include the following steps S201 to S208.
S201, acquiring audio data to be processed from an audio set;
Specifically, the data processing device obtains the audio data to be processed from the audio set. It can be understood that the audio set includes at least one piece of audio data, where the audio data is data in an audio format, for example, MP3, MIDI, or WMA. The audio set may also include other data that does not belong to the audio data; the data processing device may filter out the audio files by file format. The audio data to be processed may be any one piece of audio data in the audio set.
S202, detecting the state information of the audio data to be processed according to the data volume and the audio characteristics of the audio data to be processed;
specifically, the data processing device detects state information of the audio data to be processed according to the data volume and the audio features of the audio data to be processed, and it can be understood that the state information is used for representing whether the audio data to be processed is abnormal, and the state information includes a normal state and an abnormal state.
The specific method for judging the state information of the audio data to be processed is as follows: when the data volume of the audio data to be processed is greater than or equal to a data volume threshold and the number of channels is the same as the inherent number of channels of the audio data to be processed, the state information of the audio data to be processed is determined as a normal state. The data volume threshold may be preset. The number of channels of the audio data to be processed may be obtained from a tag of the audio file, while the inherent number of channels is determined by an attribute of the audio file. For example, suppose the tag of the audio data to be processed indicates that the number of channels is 2. If the attribute of the audio data to be processed indicates a two-channel file, that is, the inherent number of channels is 2, then the number of channels is the same as the inherent number of channels; if the attribute indicates a single-channel file, that is, the inherent number of channels is 1, then the number of channels is different from the inherent number of channels of the audio data to be processed.
And when the data volume of the audio data to be processed is smaller than a data volume threshold value or the number of the channels is different from the inherent number of the channels of the audio data to be processed, determining the state information of the audio data to be processed as an abnormal state.
According to the method for judging the state information, the audio data to be processed which simultaneously meet the conditions of the data volume and the channel number is in a normal state, and otherwise, the audio data is in an abnormal state.
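The two-condition check above can be sketched as a small predicate; the function name, the threshold value, and the string labels below are illustrative assumptions, not part of the patent.

```python
def detect_state(data_size: int, channel_count: int,
                 inherent_channel_count: int,
                 size_threshold: int = 1024) -> str:
    """Return "normal" only when the data volume meets the threshold AND
    the channel count from the tag matches the inherent channel count."""
    if data_size >= size_threshold and channel_count == inherent_channel_count:
        return "normal"
    return "abnormal"
```

A file that fails either condition is marked abnormal, matching the rule stated above.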
S203, when the state information of the audio data to be processed is in a normal state, preprocessing the audio data to be processed to generate target audio data corresponding to the audio data to be processed.
Specifically, when the state information of the audio data to be processed is in a normal state, that is, when the data volume of the audio data to be processed is greater than or equal to the data volume threshold and the number of channels is the same as the inherent number of channels of the audio data to be processed, the data processing device preprocesses the audio data to be processed to generate the target audio data corresponding to the audio data to be processed.
The data processing device preprocesses the audio data to be processed as follows:
sampling the audio data to be processed to obtain the volume value of each sampling point, and deleting the sampling points with the volume values smaller than the volume threshold value from the audio data to be processed when the volume values of the sampling points are smaller than the volume threshold value; and determining the audio data to be processed after deleting the sampling points with the volume values smaller than the volume threshold value as target audio data.
Specifically, the data processing device samples the audio data to be processed at a preset frequency and acquires the volume value of each sampling point; when the volume value of a sampling point is smaller than the volume threshold, that sampling point is deleted from the audio data to be processed. It can be understood that the volume threshold is preset. For example, with the volume threshold set to -60 dB, the volume value of each sampling point is obtained, and any sampling point whose volume value is smaller than -60 dB is deleted from the audio data to be processed. It should be noted that the singing difficulty coefficient is harder to determine for audio data with low volume values, so deleting the low-volume sampling points improves the quality of the audio data to be processed and, in turn, the accuracy of determining its singing difficulty. Finally, the audio data to be processed after the sampling points with volume values smaller than the volume threshold have been deleted is determined as the target audio data.
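As a minimal sketch of this preprocessing step, the following drops every sample whose level falls below the -60 dB threshold; representing samples as floats in [-1, 1] and measuring level in dBFS are assumptions for illustration.

```python
import math

def strip_quiet_samples(samples, volume_threshold_db=-60.0):
    """Keep only sampling points at or above the volume threshold (dBFS,
    full scale = 1.0); quieter points are deleted, as described above."""
    kept = []
    for s in samples:
        # abs(s) == 0 would make log10 blow up; silence is always dropped.
        level_db = 20.0 * math.log10(abs(s)) if s != 0 else float("-inf")
        if level_db >= volume_threshold_db:
            kept.append(s)
    return kept
```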
S204, when the state information of the audio data to be processed is in an abnormal state, marking the audio data to be processed as abnormal data.
Specifically, when the state information of the audio data to be processed is in an abnormal state, that is, when the data volume of the audio data to be processed is smaller than the data volume threshold, or the number of channels is different from the inherent number of channels of the audio data to be processed, the audio data to be processed is marked as abnormal data, and its singing difficulty is not determined. For abnormal data, a user may modify the data and process it into a normal state so that the singing difficulty can be determined subsequently, or delete the abnormal data from the audio set.
S205, extracting the characteristics of the target audio data to generate the audio characteristics of the target audio data, and converting the audio characteristics into phonetic symbols according to the preset conversion relation between the audio characteristics and the phonetic symbols.
Specifically, the data processing device performs feature extraction on the target audio data to generate audio features of the target audio data, and converts the audio features into sound symbols according to preset conversion relations between the audio features and the sound symbols.
Specifically, the preset conversion relationship between the audio feature and the phonetic symbol can be expressed by the formula d = 69 + 12·log2(f/440), where d is the phonetic symbol (MIDI note number) and f is the audio feature (fundamental frequency in Hz). It should be noted that, after feature extraction is performed on the target audio data with a known frame length and step length, a plurality of audio features are obtained; each audio feature corresponds to a respective note, so the target audio data corresponds to a plurality of notes. If the target audio data is two-channel audio, feature extraction may be performed on the audio data in each channel, and the phonetic symbol corresponding to the audio feature in each channel may be obtained according to the above conversion formula.
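The conversion formula can be applied directly; rounding to the nearest integer note number is an assumption, since the patent does not say how fractional values are handled.

```python
import math

def freq_to_note_number(f_hz: float) -> int:
    """d = 69 + 12*log2(f/440): map a fundamental frequency in Hz to a
    MIDI note number (A4 = 440 Hz -> 69)."""
    return round(69 + 12 * math.log2(f_hz / 440.0))
```

For example, 440 Hz maps to note number 69, and 880 Hz (one octave up) maps to 81.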
S206, generating a range difficulty coefficient of the target audio data according to the phonetic symbol;
specifically, the range difficulty coefficient is determined by a range span coefficient and a range change coefficient, and the specific steps of determining the range span coefficient are as follows:
The data processing device obtains the difference between the maximum phonetic symbol and the minimum phonetic symbol in the target audio data, as well as the span of the MIDI pitch symbols, where the span of the MIDI pitch symbols is the difference between the maximum note number and the minimum note number in MIDI. For example, if the maximum note number in MIDI is 108 and the minimum note number is 21, the span of the MIDI pitch symbols is 87. The ratio of the difference between the maximum phonetic symbol and the minimum phonetic symbol in the target audio data to the span of the MIDI pitch symbols is determined as the range span coefficient of the target audio data;
the specific steps for determining the range difficulty coefficient are as follows:
acquiring first-order difference information of the phonetic symbol of the target audio data, and determining a range change coefficient of the target audio data according to the first-order difference information; determining a sum of the range span coefficient and the range change coefficient as a range difficulty coefficient of the target audio data.
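A sketch of the two-part computation above, assuming the range change coefficient is taken as the mean absolute first-order difference of the note sequence (one plausible reading; the patent does not fix the exact mapping from difference information to coefficient):

```python
def range_difficulty(notes, midi_span=87):
    """Range span coefficient: (max - min) / MIDI span (108 - 21 = 87).
    Range change coefficient (assumed): mean absolute first-order
    difference. The result is their sum; needs at least two notes."""
    span_coeff = (max(notes) - min(notes)) / midi_span
    diffs = [abs(b - a) for a, b in zip(notes, notes[1:])]
    change_coeff = sum(diffs) / len(diffs)
    return span_coeff + change_coeff
```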
S207, generating a high pitch difficulty coefficient and a low pitch difficulty coefficient of the target audio data according to the tone marks;
Specifically, the data processing device obtains a standard pitch range corresponding to the target audio data, where the standard pitch range is a preset pitch range that can be adjusted, and each tone in the pitch range corresponds to one note. The maximum value in the standard pitch range is determined as the highest note, and the minimum value is determined as the lowest note. For example, the standard pitch range may be set to C3-G4: the highest note of the standard pitch range is G4 and the lowest note is C3, where the note number corresponding to C3 is 48 and the note number corresponding to G4 is 67. Notes higher than the highest note in the target audio data are determined as high notes, and notes lower than the lowest note are determined as low notes;
A treble difficulty coefficient comparison table of high-pitch symbols and difficulty coefficients is generated according to the highest note, and a bass difficulty coefficient comparison table of low-pitch symbols and difficulty coefficients is generated according to the lowest note. The following description uses the standard range C3-G4. Referring to table 1, which shows the correspondence between high pitches and difficulty coefficients: with the selected highest note G4 as the reference, the difficulty coefficient of G4 is set to 1, and the difficulty coefficient increases by 1 for each semitone raised; for example, A4 is two semitones higher than G4, so the difficulty coefficient of A4 is 3. Referring to table 2, which shows the correspondence between low pitches and difficulty coefficients: with the selected lowest note C3 as the reference, the difficulty coefficient of C3 is set to 1, and the difficulty coefficient increases by 1 for each semitone lowered.
TABLE 1

MIDI phonetic symbol    Difficulty coefficient
108 (C8)                42
...                     ...
69 (A4)                 3
68 (G4#)                2
67 (G4)                 1
TABLE 2

MIDI phonetic symbol    Difficulty coefficient
48 (C3)                 1
...                     ...
23 (B0)                 26
22 (A0#)                27
21 (A0)                 28
The treble difficulty coefficient of the high-pitch symbols in the target audio data is obtained according to the treble difficulty coefficient comparison table. Specifically, the number of occurrences of each high-pitch symbol in the target audio data is counted, and according to the numbers of high-pitch symbols and their difficulty coefficients, the formula: treble difficulty coefficient = (G4 difficulty coefficient × number of G4) + (G4# difficulty coefficient × number of G4#) + (A4 difficulty coefficient × number of A4) + … + (C8 difficulty coefficient × number of C8) is used to obtain the treble difficulty coefficient of the target audio data. A tone corresponds to a note number, so the difficulty coefficient of a note number can be regarded as the difficulty coefficient of the tone.
The bass difficulty coefficient of the low-pitch symbols in the target audio data is obtained according to the bass difficulty coefficient comparison table. Specifically, the number of occurrences of each low-pitch symbol in the target audio data is counted, and according to the numbers of low-pitch symbols and their difficulty coefficients, the formula: bass difficulty coefficient = (A0 difficulty coefficient × number of A0) + (A0# difficulty coefficient × number of A0#) + (B0 difficulty coefficient × number of B0) + … + (C3 difficulty coefficient × number of C3) is used to obtain the bass difficulty coefficient of the target audio data. A tone corresponds to a note number, so the difficulty coefficient of a note number can be regarded as the difficulty coefficient of the tone.
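Because the comparison tables increase by 1 per semitone from the reference notes, the two lookups collapse to arithmetic. Treating the reference notes themselves (G4 = 67, C3 = 48) as difficulty 1, as in tables 1 and 2, is carried over here; whether the reference notes are counted at all is an assumption taken from the formulas above.

```python
def treble_difficulty(notes, highest_note=67):
    """Sum of difficulty x count for notes at or above the reference
    highest note (G4 = 67 -> 1, +1 per semitone above), per table 1."""
    return sum((n - highest_note) + 1 for n in notes if n >= highest_note)

def bass_difficulty(notes, lowest_note=48):
    """Mirror image for the bass side (C3 = 48 -> 1, +1 per semitone
    below), per table 2."""
    return sum((lowest_note - n) + 1 for n in notes if n <= lowest_note)
```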
And S208, predicting the singing difficulty of the target audio data according to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient.
Specifically, the data processing device obtains the weighting coefficients corresponding to the range difficulty coefficient, the treble difficulty coefficient, and the bass difficulty coefficient respectively; the weighting coefficients may be preset and are determined according to the influence of each difficulty coefficient in the audio data. The singing difficulty coefficient of the target audio data is generated according to the weighting coefficients, the range difficulty coefficient, the treble difficulty coefficient, and the bass difficulty coefficient; specifically, the formula: singing difficulty coefficient = weight 1 × treble difficulty coefficient + weight 2 × bass difficulty coefficient + weight 3 × range difficulty coefficient may be used. The singing difficulty coefficient is used to represent the singing difficulty of the target audio data and can intuitively indicate how hard a song is to sing; in a music application program, the singing difficulty coefficient of each song can be displayed to a user, and the user can select a song according to the singing difficulty coefficient.
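The final step is a plain weighted sum; the weight values below are illustrative placeholders, since the patent leaves them to be chosen according to the influence of each coefficient.

```python
def singing_difficulty(range_coeff, treble_coeff, bass_coeff,
                       w_treble=0.4, w_bass=0.3, w_range=0.3):
    """singing difficulty = w1*treble + w2*bass + w3*range, with
    placeholder weights (not values given in the patent)."""
    return (w_treble * treble_coeff + w_bass * bass_coeff
            + w_range * range_coeff)
```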
A specific implementation scenario provided in the embodiments of the present application is described below with reference to fig. 4. In music software, a server can predict the singing difficulty of the audio data (songs) in a song library; an audio set in the server is used for storing the audio data, and the audio set includes at least one piece of audio data. The server acquires the audio data to be processed from the audio set and detects the state information of the audio data to be processed according to its data volume and audio features. The state information includes a normal state and an abnormal state. When the state information of the audio data to be processed is in an abnormal state, the audio data to be processed is marked as abnormal data, and the singing difficulty of the abnormal data is not predicted; when the state information is in a normal state, the audio data to be processed is preprocessed to generate the target audio data corresponding to the audio data to be processed.
Feature extraction is performed on the target audio data to obtain the audio features of the target audio data, and the audio features are converted into phonetic symbols according to the preset conversion relationship between audio features and phonetic symbols. The range difficulty coefficient, treble difficulty coefficient, and bass difficulty coefficient of the target audio data are generated according to the phonetic symbols. The weighting coefficients corresponding to the range difficulty coefficient, the treble difficulty coefficient, and the bass difficulty coefficient are determined according to the influence of each difficulty coefficient in the audio data, and a weighted calculation is performed with the difficulty coefficients and their weighting coefficients to generate the singing difficulty coefficient of the target audio data. In one specific scenario, the user may select an appropriate song based on the singing difficulty coefficient.
In the embodiment of the application, the audio data to be processed is obtained from the audio set; preprocessing the audio data to be processed to generate target audio data corresponding to the audio data to be processed; performing feature extraction on the target audio data to generate audio features of the target audio data, and converting the audio features into phonetic symbols according to a preset conversion relation between the audio features and the phonetic symbols; generating a range difficulty coefficient of the target audio data according to the phonetic symbol; generating a high pitch difficulty coefficient and a low pitch difficulty coefficient of the target audio data according to the tone marks; and predicting the singing difficulty of the target audio data according to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient. By setting the difficulty coefficient for the songs, the problem that the user needs to select the songs for a long time is avoided, and the efficiency of selecting the songs by the user is improved. Alternatively, the songs may be classified according to the singing difficulty coefficient to obtain a list of songs with different singing difficulty coefficients. Of course, the singing difficulty coefficient can also be applied to other song processing scenarios.
Referring to fig. 5, a schematic structural diagram of a data processing apparatus is provided in an embodiment of the present application. The data processing device may be a computer program (comprising program code) running on a computer device, e.g. an application software; the device can be used for executing the corresponding steps in the method provided by the embodiment of the application. As shown in fig. 5, the data processing apparatus 1 according to the embodiment of the present application may include: the system comprises a data acquisition unit 11, a feature extraction unit 12, a first coefficient acquisition unit 13, a second coefficient acquisition unit 14 and a difficulty prediction unit 15.
A data acquisition unit 11 configured to acquire target audio data from an audio set;
the feature extraction unit 12 is configured to perform feature extraction on the target audio data to obtain an audio feature of the target audio data, and convert the audio feature into a phonetic symbol according to a preset conversion relationship between the audio feature and the phonetic symbol;
a first coefficient obtaining unit 13, configured to generate a range difficulty coefficient of the target audio data according to the phonetic symbol;
a second coefficient obtaining unit 14, configured to generate a high pitch difficulty coefficient and a low pitch difficulty coefficient of the target audio data according to the note number;
a difficulty predicting unit 15, configured to predict a singing difficulty of the target audio data according to the range difficulty coefficient, the treble difficulty coefficient, and the bass difficulty coefficient.
Referring to fig. 5, the data obtaining unit 11 according to the embodiment of the present application may include: a data acquisition subunit 111, a state detection subunit 112, and a data preprocessing subunit 113;
a data obtaining subunit 111, configured to obtain audio data to be processed from the audio set;
a state detection subunit 112, configured to detect state information of the audio data to be processed according to the data volume and the audio features of the audio data to be processed;
and the data preprocessing subunit 113 is configured to, if the state information of the audio data to be processed is a normal state, perform preprocessing on the audio data to be processed to obtain target audio data corresponding to the audio data to be processed.
The data obtaining subunit 111 is specifically configured to:
acquiring the data volume and the channel number of the audio data to be processed;
if the data volume of the audio data to be processed is greater than or equal to a data threshold value and the number of the channels is the same as the inherent number of the channels of the audio data to be processed, determining the state information of the audio data to be processed as a normal state;
and if the data volume of the audio data to be processed is smaller than a data threshold value or the number of the channels is different from the inherent number of the channels of the audio data to be processed, determining the state information of the audio data to be processed as an abnormal state.
The data preprocessing subunit 113 is specifically configured to:
sampling the audio data to be processed to obtain the volume value of each sampling point;
and deleting the sampling points with the volume values smaller than the volume threshold value from the audio data to be processed to obtain target audio data.
The first coefficient obtaining unit 13 is specifically configured to:
obtaining a difference value between a maximum phonetic symbol and a minimum phonetic symbol in the target audio data, and generating a range span coefficient of the target audio data according to the difference value;
acquiring first-order difference information of the phonetic symbol of the target audio data, and determining a range change coefficient of the target audio data according to the first-order difference information;
determining a sum of the range span coefficient and the range change coefficient as a range difficulty coefficient of the target audio data.
Referring to fig. 5, the second coefficient obtaining unit 14 according to the embodiment of the present disclosure may include: a note distinguishing subunit 141, a second coefficient acquisition subunit 142;
a note region sub-unit 141 for distinguishing a high-pitch symbol and a low-pitch symbol in the target audio data according to the note symbols;
a second coefficient obtaining subunit 142, configured to determine a difficulty coefficient of a high-pitched tone symbol in the target audio data according to a preset corresponding relationship between the high-pitched tone symbol and the difficulty coefficient, and determine a difficulty coefficient of a low-pitched tone symbol in the target audio data according to a preset corresponding relationship between the low-pitched tone symbol and the difficulty coefficient;
and obtaining a high pitch difficulty coefficient of the target audio data based on the difficulty coefficient of the high pitch symbol in the target audio data, and obtaining a low pitch difficulty coefficient of the target audio data based on the difficulty coefficient of the low pitch symbol in the target audio data.
The note distinguishing subunit 141 is specifically configured to:
acquiring a standard sound range corresponding to the target audio data, determining a note symbol corresponding to the highest tone in the standard sound range as a highest note symbol, and determining a note symbol corresponding to the lowest tone in the standard sound range as a lowest note symbol;
and determining the phonetic symbols higher than the highest-pitch symbol in the target audio data as high-pitch symbols, and determining the phonetic symbols lower than the lowest-pitch symbol in the target audio data as low-pitch symbols.
The difficulty prediction unit 15 is specifically configured to:
acquiring weight coefficients corresponding to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient respectively;
generating a singing difficulty coefficient of the target audio data according to the range difficulty coefficient and the corresponding weight coefficient, the treble difficulty coefficient and the corresponding weight coefficient, and the bass difficulty coefficient and the corresponding weight coefficient; the singing difficulty coefficient is used for representing the singing difficulty of the target audio data.
In the embodiment of the application, the audio data to be processed is obtained from the audio set; preprocessing the audio data to be processed to generate target audio data corresponding to the audio data to be processed; performing feature extraction on the target audio data to generate audio features of the target audio data, and converting the audio features into phonetic symbols according to a preset conversion relation between the audio features and the phonetic symbols; generating a range difficulty coefficient of the target audio data according to the phonetic symbol; generating a high pitch difficulty coefficient and a low pitch difficulty coefficient of the target audio data according to the tone marks; and predicting the singing difficulty of the target audio data according to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient. By setting the difficulty coefficient for the songs, the problem that the user needs to select the songs for a long time is avoided, and the efficiency of selecting the songs by the user is improved. Alternatively, the songs may be classified according to the singing difficulty coefficient to obtain a list of songs with different singing difficulty coefficients. Of course, the singing difficulty coefficient can also be applied to other song processing scenarios.
Referring to fig. 6, a schematic structural diagram of a computer device is provided in an embodiment of the present application. As shown in fig. 6, the computer apparatus 1000 may include: at least one processor 1001, such as a CPU, at least one network interface 1004, a user interface 1003, memory 1005, at least one communication bus 1002. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display), and the optional user interface 1003 may also include a standard wired interface or a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The Memory 1005 may be a Random Access Memory (RAM) or a non-volatile Memory (NVM), such as at least one disk Memory. The memory 1005 may optionally be at least one memory device located remotely from the processor 1001. As shown in fig. 6, the memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a data processing application program.
In the computer apparatus 1000 shown in fig. 6, a network interface 1004 may provide a network communication function, and a user interface 1003 is mainly used as an interface for providing input for a user; the processor 1001 may be configured to call a data processing application stored in the memory 1005, so as to implement the description of the data processing method in the embodiment corresponding to any one of fig. 2 to fig. 4, which is not described herein again.
It should be understood that the computer device 1000 described in this embodiment of the present application may perform the description of the data processing method in the embodiment corresponding to any one of fig. 2 to fig. 4, and may also perform the description of the data processing device in the embodiment corresponding to fig. 5, which is not described herein again. In addition, the beneficial effects of the same method are not described in detail.
Further, it should be noted that an embodiment of the present application also provides a computer-readable storage medium storing the computer program executed by the aforementioned data processing apparatus. The computer program includes program instructions, and when a processor executes the program instructions, the data processing method described in any one of the embodiments corresponding to fig. 2 to fig. 4 can be performed, so details are not repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiments of the computer-readable storage medium referred to in the present application, reference is made to the description of the method embodiments of the present application. As an example, the program instructions may be deployed to be executed on one computing device, or on multiple computing devices at one site, or distributed across multiple sites interconnected by a communication network, which may comprise a blockchain system.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, an NVM or a RAM.
The above disclosure is only for the purpose of illustrating the preferred embodiments of the present application and is not to be construed as limiting the scope of the present application, so that the present application is not limited thereto, and all equivalent variations and modifications can be made to the present application.

Claims (10)

1. A data processing method, comprising:
acquiring target audio data from an audio set;
extracting the characteristics of the target audio data to obtain the audio characteristics of the target audio data, and converting the audio characteristics into phonetic symbols according to the preset conversion relation between the audio characteristics and the phonetic symbols;
generating a range difficulty coefficient of the target audio data according to the phonetic symbol;
generating a high pitch difficulty coefficient and a low pitch difficulty coefficient of the target audio data according to the phonetic symbols;
and predicting the singing difficulty of the target audio data according to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient.
2. The method of claim 1, wherein the obtaining target audio data from the audio set comprises:
acquiring audio data to be processed from an audio set;
detecting the state information of the audio data to be processed according to the data volume and the audio characteristics of the audio data to be processed;
and if the state information of the audio data to be processed is in a normal state, preprocessing the audio data to be processed to obtain target audio data corresponding to the audio data to be processed.
3. The method of claim 2, wherein the detecting the state information of the audio data to be processed according to the data volume and the audio features of the audio data to be processed comprises:
acquiring the data volume and the channel number of the audio data to be processed;
if the data volume of the audio data to be processed is greater than or equal to a data threshold value and the number of the channels is the same as the inherent number of the channels of the audio data to be processed, determining the state information of the audio data to be processed as a normal state;
and if the data volume of the audio data to be processed is smaller than a data threshold value or the number of the channels is different from the inherent number of the channels of the audio data to be processed, determining the state information of the audio data to be processed as an abnormal state.
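The claim-3 validity check can be sketched as follows; the threshold value and parameter names are illustrative assumptions, not taken from the patent:

```python
def detect_state(data_bytes, num_channels, inherent_channels,
                 data_threshold=1024):
    """Return 'normal' when the data volume meets the threshold AND the
    channel count matches the audio's inherent channel count (claim 3);
    otherwise 'abnormal'."""
    if data_bytes >= data_threshold and num_channels == inherent_channels:
        return "normal"
    return "abnormal"
```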
4. The method according to claim 2, wherein the preprocessing the audio data to be processed to obtain target audio data corresponding to the audio data to be processed comprises:
sampling the audio data to be processed to obtain the volume value of each sampling point;
and deleting the sampling points with the volume values smaller than the volume threshold value from the audio data to be processed to obtain target audio data.
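The claim-4 preprocessing amounts to dropping low-volume sampling points; a minimal sketch, assuming normalized sample values and an illustrative volume threshold:

```python
def strip_quiet_samples(samples, volume_threshold=0.05):
    """Delete sampling points whose volume (absolute amplitude) is below
    the threshold, keeping the remainder as target audio data (claim 4)."""
    return [s for s in samples if abs(s) >= volume_threshold]
```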
5. The method of claim 1, wherein generating the range difficulty coefficient of the target audio data according to the note symbols comprises:
obtaining a difference between the highest note symbol and the lowest note symbol in the target audio data, and generating a range span coefficient of the target audio data according to the difference;
obtaining first-order difference information of the note symbols of the target audio data, and determining a range change coefficient of the target audio data according to the first-order difference information;
and determining the sum of the range span coefficient and the range change coefficient as the range difficulty coefficient of the target audio data.
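The claim-5 computation can be sketched directly: a span coefficient from the highest-minus-lowest note, a change coefficient from the first-order differences, and their sum. Using the mean absolute difference as the change coefficient is an assumption; the patent does not fix the exact normalisation:

```python
def range_difficulty(notes):
    """Range difficulty coefficient per claim 5: span coefficient plus a
    change coefficient derived from first-order differences."""
    span_coef = max(notes) - min(notes)                      # range span
    diffs = [abs(b - a) for a, b in zip(notes, notes[1:])]   # 1st-order diff
    change_coef = sum(diffs) / len(diffs) if diffs else 0.0  # mean movement
    return span_coef + change_coef
```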
6. The method of claim 1, wherein generating the treble difficulty coefficient and the bass difficulty coefficient of the target audio data according to the note symbols comprises:
distinguishing treble note symbols and bass note symbols in the target audio data according to the note symbols;
determining the difficulty coefficient of each treble note symbol in the target audio data according to a preset correspondence between treble note symbols and difficulty coefficients, and determining the difficulty coefficient of each bass note symbol in the target audio data according to a preset correspondence between bass note symbols and difficulty coefficients;
and obtaining the treble difficulty coefficient of the target audio data based on the difficulty coefficients of the treble note symbols, and obtaining the bass difficulty coefficient of the target audio data based on the difficulty coefficients of the bass note symbols.
7. The method of claim 6, wherein the distinguishing treble note symbols and bass note symbols in the target audio data according to the note symbols comprises:
obtaining a standard vocal range corresponding to the target audio data, determining the note symbol corresponding to the highest pitch in the standard vocal range as the highest note symbol, and determining the note symbol corresponding to the lowest pitch in the standard vocal range as the lowest note symbol;
and determining note symbols in the target audio data higher than the highest note symbol as treble note symbols, and determining note symbols in the target audio data lower than the lowest note symbol as bass note symbols.
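Claims 6 and 7 together can be sketched as one routine: split notes against the standard vocal range, then look up per-note difficulties in preset tables. The lookup tables, the per-note default of 1.0, and the simple-sum aggregation are illustrative assumptions:

```python
def pitch_difficulty(notes, standard_lo, standard_hi,
                     treble_table=None, bass_table=None):
    """Return (treble_coef, bass_coef): notes above standard_hi are treble
    notes, notes below standard_lo are bass notes (claim 7); each gets a
    difficulty from its preset table and the results are summed (claim 6)."""
    treble_table = treble_table or {}
    bass_table = bass_table or {}
    trebles = [n for n in notes if n > standard_hi]
    basses = [n for n in notes if n < standard_lo]
    # Notes absent from a table default to a difficulty of 1.0 each.
    treble_coef = sum(treble_table.get(n, 1.0) for n in trebles)
    bass_coef = sum(bass_table.get(n, 1.0) for n in basses)
    return treble_coef, bass_coef
```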
8. The method of claim 1, wherein predicting the singing difficulty of the target audio data according to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient comprises:
obtaining weight coefficients corresponding respectively to the range difficulty coefficient, the treble difficulty coefficient and the bass difficulty coefficient;
and generating a singing difficulty coefficient of the target audio data according to the range difficulty coefficient and its weight coefficient, the treble difficulty coefficient and its weight coefficient, and the bass difficulty coefficient and its weight coefficient; the singing difficulty coefficient represents the singing difficulty of the target audio data.
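The claim-8 combination is a weighted sum of the three coefficients; the weight values below are illustrative assumptions, as the patent leaves them to the implementer:

```python
def singing_difficulty(range_coef, treble_coef, bass_coef,
                       weights=(0.5, 0.3, 0.2)):
    """Singing difficulty coefficient per claim 8: weighted sum of the
    range, treble, and bass difficulty coefficients."""
    w_range, w_treble, w_bass = weights
    return w_range * range_coef + w_treble * treble_coef + w_bass * bass_coef
```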
9. A computer-readable storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the method according to any of claims 1-8.
10. A computer device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-8.
CN202011053102.1A 2020-09-29 2020-09-29 Data processing method and equipment Pending CN112182285A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011053102.1A CN112182285A (en) 2020-09-29 2020-09-29 Data processing method and equipment


Publications (1)

Publication Number Publication Date
CN112182285A true CN112182285A (en) 2021-01-05

Family

ID=73946006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011053102.1A Pending CN112182285A (en) 2020-09-29 2020-09-29 Data processing method and equipment

Country Status (1)

Country Link
CN (1) CN112182285A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105814A (en) * 2019-12-27 2020-05-05 福建星网视易信息系统有限公司 Method for determining song difficulty coefficient and computer readable storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN GENFANG: "Intelligent Musicology and Chinese Music Digital Media Theory" (智能音乐学与中国音乐数字媒体论), 30 April 2018, Culture and Art Publishing House, pages: 210 - 211 *

Similar Documents

Publication Publication Date Title
CN102549653B (en) Speech translation system, first terminal device, speech recognition server device, translation server device, and speech synthesis server device
CN110265064B (en) Audio frequency crackle detection method, device and storage medium
JP4650662B2 (en) Signal processing apparatus, signal processing method, program, and recording medium
CN106997769B (en) Trill recognition method and device
JP2004110422A (en) Music classifying device, music classifying method, and program
US8865993B2 (en) Musical composition processing system for processing musical composition for energy level and related methods
JP2004530153A (en) Method and apparatus for characterizing a signal and method and apparatus for generating an index signal
JP2007041593A (en) Method and apparatus for extracting voiced/unvoiced classification information using harmonic component of voice signal
CN115866487B (en) Sound power amplification method and system based on balanced amplification
CN111192601A (en) Music labeling method and device, electronic equipment and medium
CN110688518A (en) Rhythm point determining method, device, equipment and storage medium
WO2022089097A1 (en) Audio processing method and apparatus, electronic device, and computer-readable storage medium
US8868419B2 (en) Generalizing text content summary from speech content
WO2006115323A1 (en) Method for generating audio data and user terminal and record medium using the same
CN112967738A (en) Human voice detection method and device, electronic equipment and computer readable storage medium
CN110111811A (en) Audio signal detection method, device and storage medium
CN112037764A (en) Music structure determination method, device, equipment and medium
CN114023301A (en) Audio editing method, electronic device and storage medium
CN109271501B (en) Audio database management method and system
CN113271386A (en) Howling detection method and device, storage medium and electronic equipment
CN108847251A (en) A kind of voice De-weight method, device, server and storage medium
CN110955789A (en) Multimedia data processing method and equipment
CN112182285A (en) Data processing method and equipment
CN111583963A (en) Method, device and equipment for detecting repeated audio and storage medium
CN107025902B (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination