CN111950266A - Data processing method and device and data processing device - Google Patents

Data processing method and device and data processing device

Info

Publication number
CN111950266A
Authority
CN
China
Prior art keywords
processing
attribute
word
word segmentation
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910365352.XA
Other languages
Chinese (zh)
Inventor
黎明超
韩秦
李茜
李瑞星
郑亚鑫
葛晓娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN201910365352.XA priority Critical patent/CN111950266A/en
Publication of CN111950266A publication Critical patent/CN111950266A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G10L 19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction using spectral analysis, e.g. transform vocoders or subband vocoders

Abstract

An embodiment of the invention provides a data processing method and device, and a device for data processing. The method specifically includes: determining, for a text corresponding to first audio data, the segmented words of the text and attributes of the segmented words, the attributes including: a time attribute, a position attribute, and a language attribute; processing, according to the attributes of a segmented word, a first audio unit corresponding to the segmented word in the first audio data to obtain a second audio unit corresponding to the segmented word, the processing of the first audio unit including: at least one of repetition processing, stretching processing, frequency processing, and channel processing; and obtaining second audio data according to the position attributes of the segmented words and the second audio units corresponding to the segmented words. Embodiments of the invention can change the expression form of the segmented words in the text, thereby enhancing the entertainment effect of the audio data.

Description

Data processing method and device and data processing device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method and apparatus, and an apparatus for data processing.
Background
With the development of computer technology, terminals such as mobile phones and tablet computers are widely used, and the functions they provide are increasingly rich; common functions include multimedia functions, instant messaging functions, and the like.
Taking the multimedia function as an example, a terminal may support not only audio playback but also audio recording; for example, the terminal may record a song sung by the user so that it can be played back later.
Currently, however, audio recorded by a terminal is usually presented in a stale and monotonous form and thus cannot meet users' specific entertainment needs.
Disclosure of Invention
An embodiment of the invention provides a data processing method and device, and a device for data processing, which can change the expression form of segmented words in a text so as to enhance the entertainment effect of audio data.
In order to solve the above problem, an embodiment of the present invention discloses a data processing method, including:
determining, for a text corresponding to first audio data, the segmented words of the text and attributes of the segmented words; the attributes include: a time attribute, a position attribute, and a language attribute;
processing, according to the attributes of a segmented word, a first audio unit corresponding to the segmented word in the first audio data to obtain a second audio unit corresponding to the segmented word; the processing of the first audio unit includes: at least one of repetition processing, stretching processing, frequency processing, and channel processing; and
obtaining second audio data according to the position attributes of the segmented words and the second audio units corresponding to the segmented words.
On the other hand, the embodiment of the invention discloses a data processing device, which comprises:
a word-and-attribute determining module, configured to determine, for a text corresponding to first audio data, the segmented words of the text and attributes of the segmented words; the attributes include: a time attribute, a position attribute, and a language attribute;
an audio unit processing module, configured to process, according to the attributes of a segmented word, a first audio unit corresponding to the segmented word in the first audio data to obtain a second audio unit corresponding to the segmented word; the processing of the first audio unit includes: at least one of repetition processing, stretching processing, frequency processing, and channel processing; and
a second audio data determining module, configured to obtain second audio data according to the position attributes of the segmented words and the second audio units corresponding to the segmented words.
In yet another aspect, an embodiment of the present invention discloses a device for data processing, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
determining, for a text corresponding to first audio data, the segmented words of the text and attributes of the segmented words; the attributes include: a time attribute, a position attribute, and a language attribute;
processing, according to the attributes of a segmented word, a first audio unit corresponding to the segmented word in the first audio data to obtain a second audio unit corresponding to the segmented word; the processing of the first audio unit includes: at least one of repetition processing, stretching processing, frequency processing, and channel processing; and
obtaining second audio data according to the position attributes of the segmented words and the second audio units corresponding to the segmented words.
In yet another aspect, an embodiment of the invention discloses a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause a device to perform a data processing method as described in one or more of the foregoing embodiments.
The embodiment of the invention has the following advantages:
An embodiment of the invention performs at least one of the following kinds of processing on the first audio unit corresponding to a segmented word in the text corresponding to the first audio data: repetition processing, stretching processing, frequency processing, and channel processing; this changes the expression form of the segmented word in the text and thereby enhances the entertainment effect of the audio data.
Furthermore, in an embodiment of the present invention, the attributes of the segmented word are taken into account when processing the first audio unit. The attributes may include: a time attribute, a position attribute, and a language attribute, which can improve the degree to which the processing of the first audio unit matches linguistic rules and can thus enhance the processing effect of the first audio unit.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a schematic representation of an application environment for a data processing method of an embodiment of the present invention;
FIG. 2 is a flow chart of steps of a first embodiment of a data processing method of the present invention;
FIG. 3 is a flowchart illustrating steps of a second embodiment of a data processing method according to the present invention;
FIG. 4 is a block diagram of an embodiment of a data processing apparatus of the present invention;
FIG. 5 is a block diagram of an apparatus 800 for data processing of the present invention; and
fig. 6 is a schematic diagram of a server in some embodiments of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To address the technical problem in the related art that audio recorded by a terminal is usually presented in a stale and monotonous style, an embodiment of the present invention provides a data processing scheme, which specifically includes: determining, for a text corresponding to first audio data, the segmented words of the text and attributes of the segmented words, the attributes including a time attribute, a position attribute, and a language attribute; processing, according to the attributes of a segmented word, a first audio unit corresponding to the segmented word in the first audio data to obtain a second audio unit corresponding to the segmented word, the processing including at least one of repetition processing, stretching processing, frequency processing, and channel processing; and obtaining second audio data according to the position attributes of the segmented words and the second audio units corresponding to the segmented words.
In an embodiment of the present invention, audio may refer to sound waves whose frequency lies within a preset range audible to the human ear; the preset frequency range may be 20 Hz to 20 kHz.
The audio data may refer to sound wave data. The audio data may include: analog audio data, or digital audio data. Wherein digital audio data may be used to represent audio data in binary form. Conversion may be performed between digital audio data and analog audio data, for example, an audio file as analog audio data may be converted into digital audio data; alternatively, analog audio data converted from digital audio data may be output to an audio output device, and the audio output device may play the analog audio data.
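The analog/digital distinction above can be illustrated with a minimal sketch that is not from the patent: samples are quantized to 16-bit signed integers (the common PCM representation) and converted back for playback. The sample rate and bit depth are illustrative assumptions.

```python
import numpy as np

SAMPLE_RATE = 16000  # samples per second (assumed, not specified in the patent)

def to_digital(analog_samples):
    """Quantize float samples in [-1, 1] to 16-bit signed integers (PCM)."""
    clipped = np.clip(analog_samples, -1.0, 1.0)
    return (clipped * 32767).astype(np.int16)

def to_analog(digital_samples):
    """Convert 16-bit integers back to float samples for playback."""
    return digital_samples.astype(np.float32) / 32767.0

t = np.linspace(0, 1, SAMPLE_RATE, endpoint=False)
wave = 0.5 * np.sin(2 * np.pi * 440 * t)  # a 440 Hz tone, 1 second
digital = to_digital(wave)
restored = to_analog(digital)
```

The round trip loses only quantization error, which is why digital and analog audio data can be freely converted as the paragraph above describes.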
To enhance the entertainment effect of the audio data, an embodiment of the present invention processes the first audio data to obtain second audio data. The first audio data is processed at the granularity of individual audio units: each first audio unit in the first audio data, corresponding to a segmented word of the text, is processed to obtain a processed second audio unit, and the second audio data is then obtained from the second audio units.
The processing of the first audio unit may specifically include at least one of repetition processing, stretching processing, frequency processing, and channel processing.
Repetition processing may repeat the first audio unit corresponding to a segmented word at least once, so as to change the expression form of the segmented word and thereby enhance the entertainment effect of the audio data. For example, if the text A corresponding to the first audio data is "Are you ok", repetition processing can be performed on the first audio unit corresponding to any segmented word in text A to provide a novel expression form.
Stretching processing may stretch the duration of the first audio unit corresponding to a segmented word, that is, increase the duration of the first audio unit. Because increasing the duration changes the expression form of the segmented word, the entertainment effect of the audio data can be enhanced.
Frequency processing may adjust the frequency of the first audio unit corresponding to a segmented word. Frequency is the number of sound-wave cycles passing a given point per second, measured in hertz, and it determines pitch, that is, how high or low a sound is. Frequency processing can therefore adjust the pitch of the first audio unit, changing the expression form of the segmented word and enhancing the entertainment effect of the audio data.
Channel processing may set the channel parameters of the first audio unit. A channel is a propagation path of sound; channels reflect how sound changes as it propagates in space and therefore affect the playback effect, so channel processing can achieve a stereo change effect. An embodiment of the present invention uses at least two channels, and distributing sound across different channels can achieve a stereo change effect. For example, the channels may include a left channel and a right channel, which can play the same or different sounds to produce a left-to-right or right-to-left stereo change effect. In addition to the left and right channels, the channels may include a center channel, rear channels, and the like.
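As an illustration only (the patent does not give an implementation), the four kinds of processing can be sketched on a mono audio unit with NumPy. The resampling-based stretch and frequency shift are deliberately naive stand-ins for real time-scale-modification and pitch-shift algorithms, and the sample rate is assumed:

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sample rate

def repeat_unit(unit, times=1):
    """Repetition processing: play the unit (1 + times) times in a row."""
    return np.tile(unit, times + 1)

def stretch_unit(unit, factor=2.0):
    """Stretching processing: lengthen duration by naive resampling.

    Real implementations would use time-scale modification (e.g. a phase
    vocoder) so the pitch stays unchanged; this sketch shifts pitch too.
    """
    idx = np.linspace(0, len(unit) - 1, int(len(unit) * factor))
    return np.interp(idx, np.arange(len(unit)), unit)

def shift_frequency(unit, factor=1.5):
    """Frequency processing: naive pitch shift by resampling (shortens duration)."""
    idx = np.linspace(0, len(unit) - 1, int(len(unit) / factor))
    return np.interp(idx, np.arange(len(unit)), unit)

def set_channels(unit, left_gain=1.0, right_gain=0.0):
    """Channel processing: place the mono unit in a stereo field via gains."""
    return np.stack([unit * left_gain, unit * right_gain], axis=1)

unit = np.sin(2 * np.pi * 440 * np.arange(SAMPLE_RATE) / SAMPLE_RATE)
repeated = repeat_unit(unit, times=2)
stretched = stretch_unit(unit, factor=2.0)
stereo = set_channels(unit, left_gain=0.2, right_gain=1.0)
```

Each function maps one named processing kind to a concrete array operation; any combination of them yields a second audio unit from a first audio unit.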
To sum up, an embodiment of the present invention performs at least one of the following kinds of processing on the first audio unit corresponding to a segmented word: repetition processing, stretching processing, frequency processing, and channel processing; this can change the expression form of the segmented word and enhance the entertainment effect of the audio data.
Furthermore, in an embodiment of the present invention, the attributes of the segmented word are taken into account when processing the first audio unit. The attributes may include: a time attribute, a position attribute, and a language attribute, which can improve the degree to which the processing of the first audio unit matches linguistic rules and can thus enhance the processing effect of the first audio unit.
Applicable scenarios of the embodiment of the present invention may include multimedia scenarios and the like. The first audio data may correspond to multimedia content, and the multimedia content may include audio, video, and the like. The embodiment of the present invention processes the first audio data to obtain the second audio data, which can change the expression form of the segmented words corresponding to the first audio data and further enhance the entertainment effect of the audio data.
The data processing method provided by the embodiment of the present invention can be applied to the application environment shown in fig. 1, as shown in fig. 1, the client 100 and the server 200 are located in a wired or wireless network, and the client 100 and the server 200 perform data interaction through the wired or wireless network.
Optionally, the client 100 may run on a device, for example, the client 100 may be an APP (Application program) running on the device, such as a multimedia processing APP, an instant messaging APP, an input method APP, or an APP carried by an operating system, and the embodiment of the present invention does not limit a specific APP corresponding to the client. Alternatively, the client 100 described above may provide a question-and-answer function that can quickly provide answers in response to an operation by the user.
Optionally, the device may include a screen, the screen may be used to display content, and the content may include: UI (User Interface), etc. The above devices may specifically include but are not limited to: smart phones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, car-mounted computers, desktop computers, set-top boxes, smart televisions, wearable devices, smart speakers, and the like. It is to be understood that the embodiments of the present invention are not limited to the specific devices.
In an embodiment of the present invention, a client may receive first audio data uploaded or recorded by a user, and provide second audio data corresponding to the first audio data to the user, where the first audio data may be processed by the client or a server.
Method embodiment one
Referring to fig. 2, a flowchart illustrating steps of a first embodiment of a data processing method according to the present invention is shown, which may specifically include the following steps:
step 201, determining, for a text corresponding to first audio data, the segmented words of the text and attributes of the segmented words; the attributes may specifically include: a time attribute, a position attribute, and a language attribute;
step 202, processing, according to the attributes of a segmented word, a first audio unit corresponding to the segmented word in the first audio data to obtain a second audio unit corresponding to the segmented word; the processing of the first audio unit may specifically include: at least one of repetition processing, stretching processing, frequency processing, and channel processing;
step 203, obtaining second audio data according to the position attributes of the segmented words and the second audio units corresponding to the segmented words.
At least one step of the method embodiment shown in fig. 2 may be executed by a client or a server, and it is to be understood that the embodiment of the present invention does not limit the specific execution subject of the method embodiment shown in fig. 2.
In step 201, the first audio data may be audio data uploaded by a user, or the first audio data may be audio data recorded by the user, or the first audio data may be audio data stored in a multimedia library. It should be noted that the corresponding expression form of the first audio data may include: the speaking form and/or the singing form, for example, the user may speak a piece of text, or the user may sing a piece of text, and the embodiment of the present invention does not impose a limitation on the specific first audio data.
The embodiment of the invention may use speech recognition technology to determine the text corresponding to the first audio data. Denote the first audio data as S; a series of processing steps on S yields a corresponding speech feature sequence O = {O_1, O_2, …, O_i, …, O_T}, where O_i is the i-th speech feature (i is a natural number) and T is the total number of speech features. The sentence corresponding to the first audio data S can be regarded as a word string composed of several words, denoted W = {w_1, w_2, …, w_n}, where n may be a natural number. Speech recognition is the process of finding the most probable word string W' given the known speech feature sequence O; W' represents the speech recognition result corresponding to the first audio data, i.e., the text corresponding to the first audio data.
Specifically, speech recognition is a model matching process. A speech model is first established according to human speech characteristics, and the input first audio data is analyzed to extract the required features and build the templates needed for recognition. Recognizing the user's first audio data is then a process of comparing the features of the first audio data with the templates and finding the best-matching template, thereby obtaining the speech recognition result. The specific speech recognition algorithm may be a training and recognition algorithm based on statistical hidden Markov models, or another algorithm such as neural-network-based training and recognition or recognition based on dynamic time warping matching.
In practical applications, a word segmentation method may be used to determine the segmented words of the text. Word segmentation splits a sentence into individual words; it is the process of recombining a continuous sentence into a word sequence according to a certain standard. Word segmentation methods may specifically include string-matching-based methods, understanding-based methods, statistics-based methods, and the like; it is to be understood that the embodiment of the present invention does not limit the specific word segmentation method.
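As a hedged illustration of the string-matching family of methods just mentioned, here is a toy forward-maximum-matching segmenter. The vocabulary and inputs are invented for the example; the patent does not prescribe any particular algorithm:

```python
def fmm_segment(sentence, vocab, max_len=4):
    """Forward maximum matching: greedily take the longest dictionary word
    starting at the current position, falling back to a single character."""
    words, i = [], 0
    while i < len(sentence):
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if length == 1 or candidate in vocab:
                words.append(candidate)
                i += length
                break
    return words

# Toy vocabulary; a real segmenter would use a large dictionary or a
# statistical/neural model as the paragraph above notes.
vocab = {"are", "you", "ok"}
result = fmm_segment("areyouok", vocab)  # -> ['are', 'you', 'ok']
```

Unknown characters fall through to single-character segments, which is the usual fallback in dictionary-based segmentation.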
The attributes of the participles may include: a time attribute, a location attribute, and a language attribute.
The time attribute may characterize the time taken to voice the segmented word, and may be represented by the duration of the first audio unit corresponding to the segmented word. It is understood that this duration may be the duration of the first audio unit corresponding to the whole segmented word, or the duration of the portion corresponding to part of its syllables. For example, for "ok", a partial syllable may be the syllable corresponding to "o" or the syllable corresponding to "k".
The position attribute may refer to the position of the segmented word in the sentence, such as the beginning, the middle, or the end of the sentence.
The language attribute may reflect linguistic rules and may include part of speech and/or sentence component. Parts of speech may include verbs, nouns, modal particles, and the like; sentence components may include subjects, predicates, objects, and the like. Of course, part of speech and sentence component are only optional examples of the language attribute; those skilled in the art may adopt other language attributes according to actual application requirements. For example, the language attribute may also include an entity attribute, where an entity refers to things that exist objectively and are distinguishable from each other; an entity may be a specific person or thing, or an abstract concept or relationship.
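The three attribute types described above might be modeled as a simple record. The field names and the start/middle/end encoding below are illustrative assumptions, not data structures from the patent:

```python
from dataclasses import dataclass

@dataclass
class WordAttributes:
    start: float     # time attribute: onset (seconds) within the first audio data
    duration: float  # time attribute: how long the word is voiced
    position: str    # position attribute: "start", "middle", or "end"
    pos_tag: str     # language attribute: part of speech, e.g. "verb"
    role: str        # language attribute: sentence component, e.g. "subject"

def position_of(index, total):
    """Derive the position attribute from the word's index in the sentence."""
    if index == 0:
        return "start"
    if index == total - 1:
        return "end"
    return "middle"
```

A segmenter plus a part-of-speech tagger would fill these fields; the record then drives the per-unit processing described in step 202.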
In step 202, the first audio data may refer to audio data corresponding to a complete text; the first audio unit may be a portion of the first audio data, i.e., a portion of the audio data corresponding to the word in the text in the first audio data.
According to the embodiment of the invention, the first audio unit corresponding to the word in the first audio data can be processed according to the attribute of the word. In practical application, the processing can be performed on all the participles or on the first audio units corresponding to part of the participles. The first audio unit corresponding to a word may not be processed, or the first audio unit corresponding to a word may correspond to one or more kinds of processing.
In an optional embodiment of the present invention, the processing of the first audio unit corresponding to the segmented word in step 202 may specifically include: determining target segmented words according to the attributes of the segmented words, where the attributes of the target segmented words meet a preset condition; and processing the first audio units corresponding to the target segmented words in the first audio data.
That is, an embodiment of the present invention can determine the target segmented words whose attributes meet the preset condition and process the first audio units corresponding to those target segmented words.
Optionally, different kinds of processing may correspond to the same or different preset conditions. Taking repetition processing or stretching processing as an example, the corresponding preset condition may include at least one of the following attributes: a sentence-start position, a sentence-end position, a modal-particle part of speech, a subject sentence component, and the like.
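Target-word selection against a preset condition could be sketched as a simple filter. The condition below merely mirrors the example attributes listed above and is not prescribed by the patent:

```python
def select_targets(words, condition):
    """Keep only segmented words whose attributes meet the preset condition."""
    return [w for w in words if condition(w)]

# One possible preset condition for repetition/stretching, per the text:
# sentence-start or sentence-end position, or a modal-particle part of speech.
def repeat_or_stretch_condition(word):
    return word["position"] in ("start", "end") or word["pos_tag"] == "modal"

words = [
    {"text": "Are", "position": "start", "pos_tag": "verb"},
    {"text": "you", "position": "middle", "pos_tag": "pronoun"},
    {"text": "ok", "position": "end", "pos_tag": "adjective"},
]
targets = select_targets(words, repeat_or_stretch_condition)
```

With this condition, the sentence-start word "Are" and sentence-end word "ok" are selected while "you" is left unprocessed, matching the idea that only some first audio units need processing.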
Repetition processing may repeat the first audio unit corresponding to a segmented word at least once; the specific number of repetitions may be determined by those skilled in the art according to actual application requirements.
Stretching processing may stretch the duration of the first audio unit corresponding to a segmented word; the stretched duration may likewise be determined according to actual application requirements. Optionally, stretching may update the duration to a preset duration value, or to a preset multiple of the original duration, such as 1.5 times, 2 times, or 3 times.
Optionally, the duration of the first audio unit corresponding to the whole segmented word may be stretched, for example, the duration of the first audio unit corresponding to the whole word "we" or the whole word "ok".
Optionally, the duration of the portion of the first audio unit corresponding to part of the segmented word's syllables may be stretched. For example, for "ok", a partial syllable may be the syllable corresponding to "o" or the syllable corresponding to "k", and the duration of the syllable corresponding to "o" may be stretched. Taking the word "we" as an example, the duration of the syllable corresponding to "me" may be stretched.
Optionally, the preset duration value may be derived from note values. Note types may include whole notes, half notes, quarter notes, eighth notes, sixteenth notes, thirty-second notes, and the like. The note value, also called the note duration, expresses the relative duration between notes: one whole note lasts as long as two half notes, four quarter notes, eight eighth notes, sixteen sixteenth notes, or thirty-two thirty-second notes. The preset duration value may be a multiple of the shortest note in the second audio data.
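The note-value relationships above can be captured in a small table. The `whole_note_seconds` constant is a hypothetical tempo-derived value, not something the patent specifies:

```python
# Relative note values: a whole note equals 2 half notes, 4 quarter notes,
# 8 eighth notes, 16 sixteenth notes, or 32 thirty-second notes.
NOTE_VALUES = {
    "whole": 1.0,
    "half": 1.0 / 2,
    "quarter": 1.0 / 4,
    "eighth": 1.0 / 8,
    "sixteenth": 1.0 / 16,
    "thirty_second": 1.0 / 32,
}

def preset_duration(note, whole_note_seconds=2.0):
    """Preset duration value derived from a note value.

    whole_note_seconds is an assumed constant; in practice it would come
    from the tempo of the background music in the second audio data.
    """
    return NOTE_VALUES[note] * whole_note_seconds
```

A stretched audio unit could then be snapped to, say, `preset_duration("quarter")`, keeping the stretched word aligned with the music's grid.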
Optionally, frequency processing may be applied to all segmented words, that is, frequency processing may be performed on the first audio units corresponding to all segmented words. Frequency processing may raise or lower the frequency.
Optionally, channel processing may be applied to all segmented words, that is, channel processing may be performed on the first audio units corresponding to all segmented words. Channel processing may set the channel parameters of the first audio unit; the channel parameters may include a left channel parameter, a right channel parameter, a center channel parameter, a rear channel parameter, and the like.
Optionally, adjacent segmented words may correspond to different channel parameters. Different channel parameters produce spatial changes in the sound and can thus achieve a stereo change effect. Suppose the text includes, in order, segmented words 1, 2, …, i, …, n (where i and n are natural numbers); then segmented word i and segmented word (i+1) may correspond to different channel parameters.
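Alternating channel parameters across adjacent segmented words, as just described, could be sketched as follows; the left/right labels are illustrative placeholders for real channel parameters:

```python
def assign_channel_parameters(num_words):
    """Alternate left/right so adjacent segmented words differ in channel,
    producing the left-to-right stereo change effect described above."""
    return ["left" if i % 2 == 0 else "right" for i in range(num_words)]

params = assign_channel_parameters(4)
```

For four words this yields left, right, left, right, so no two neighbors share a channel parameter.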
Optionally, for the same segmented word, the corresponding second audio unit and the background music may correspond to different channel parameters. Background music (BGM), also called accompaniment or incidental music, may include instrumental performance that accompanies and sets off the singing. Background music can provide tempo and key, and its effects may include at least one of the following: helping the singer keep the rhythm; arousing the interest of the audience and improving participation; and, to a certain extent, setting the atmosphere of the music.
For the same segmented word, assigning different channel parameters to the corresponding second audio unit and the background music can produce spatial changes in the sound and thus achieve a stereo change effect.
Of course, assigning different channel parameters to adjacent segmented words is only an optional embodiment; adjacent segmented words may in fact correspond to the same channel parameters. Similarly, assigning different channel parameters to the second audio unit and the background music of the same segmented word is only an optional embodiment; they may correspond to the same channel parameters.
In an application example of the present invention, the text A corresponding to the first audio data includes: "Are you ok". The following processing may be performed on the first audio units corresponding to the words in the text A to change the expression form of those words in the text:
for example, the first audio unit corresponding to the word "Are" may be repeated twice, so that the corresponding second audio unit corresponds to "Are Are Are"; for another example, stretching processing may be performed on the first audio unit corresponding to the word "you", for example by updating its duration to 2 times the original value; for another example, the frequency of the first audio unit corresponding to any word may be adjusted up or down; alternatively, the channel parameters of the first audio unit corresponding to any word may be set.
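The repetition and stretching operations on "Are you ok" can be sketched as follows. This is a naive illustration only: the sample values are made up, and repeating individual samples changes pitch in real audio, where a pitch-preserving time-stretch (e.g. a phase vocoder) would be used.

```python
def repeat_unit(samples, times):
    # Repetition processing: play the whole audio unit `times` times in a row.
    return samples * times

def stretch_unit(samples, factor):
    # Stretching processing: naive time-stretch that repeats each sample;
    # a factor of 2 doubles the duration.
    return [s for s in samples for _ in range(factor)]

are = [0.1, 0.2, 0.3]                 # stand-in samples for the word "Are"
you = [0.4, 0.5]                      # stand-in samples for the word "you"

are_repeated = repeat_unit(are, 3)    # second audio unit for "Are Are Are"
you_stretched = stretch_unit(you, 2)  # duration updated to 2x the original
```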
It should be noted that, when multiple kinds of processing are performed on the first audio unit corresponding to a word, the order among these kinds of processing is not limited in the embodiment of the present invention. For example, the processing may be applied in the order: repetition processing, stretching processing, frequency processing, channel processing, and so on.
Step 202 is to process the first audio unit corresponding to the word segmentation in the first audio data, and may specifically include: and processing a first audio unit corresponding to the word in the first audio data according to the attribute of the word and the background music corresponding to the word so as to enable an obtained second audio unit to be matched with the background music.
Alternatively, the background music may correspond to a note file such as MIDI (Musical Instrument Digital Interface), the note file may record corresponding numbers for the notes of the background music, and the note file may be a binary file.
According to the embodiment of the invention, the first audio unit can be processed according to the attribute of the word segmentation and the background music corresponding to the word segmentation, so that the obtained second audio unit can be matched with the background music, and the matching degree between the second audio unit and the background music can be further increased.
Alternatively, the second audio unit may correspond to a first frequency sequence and the background music may correspond to a second frequency sequence, the first frequency sequence matching the second frequency sequence.
A note typically corresponds to a particular frequency. For example, the frequencies corresponding to the notes in key C are as follows: do corresponds to 261.5 Hz, re corresponds to 293.5 Hz, mi corresponds to 329.5 Hz, fa corresponds to 349 Hz, so corresponds to 392 Hz, la corresponds to 440 Hz, xi corresponds to 494 Hz, and so on. Therefore, the embodiment of the invention can determine the second frequency sequence according to the mapping relationship between notes and frequencies and the note file corresponding to the background music. In the second frequency sequence, the frequencies corresponding to the notes are arranged in time order from first to last. For example, the second frequency sequence may sequentially include: f1, f2, f3, f4, and the like.
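Deriving the second frequency sequence from a note sequence can be sketched with the mapping listed above. The function name and the example note sequence are assumptions; the note file (e.g. MIDI) would supply the actual notes.

```python
# Note-to-frequency mapping for key C, using the values listed above (Hz).
NOTE_FREQ = {
    "do": 261.5, "re": 293.5, "mi": 329.5, "fa": 349.0,
    "so": 392.0, "la": 440.0, "xi": 494.0,
}

def frequency_sequence(notes):
    # Build the second frequency sequence from the background music's
    # notes, arranged in time order from first to last.
    return [NOTE_FREQ[note] for note in notes]

second_sequence = frequency_sequence(["do", "mi", "so", "do"])
```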
Optionally, the matching of the first frequency sequence and the second frequency sequence may include: the first frequency sequence matches the total time value of the second frequency sequence. Optionally, the matching of the first frequency sequence and the second frequency sequence may include: the first frequency sequence matches the note type of the second frequency sequence. Optionally, the matching of the first frequency sequence and the second frequency sequence may include: the first frequency sequence matches the frequency of the second frequency sequence at the same location, and so on.
For example, frequency sequence A sequentially includes: f1, f2, f3 and f4; frequency sequence B sequentially includes: f1, f2, f3 and f4; and frequency sequence C sequentially includes: f1, f2, f3 and f5. Then frequency sequence A matches frequency sequence B, and frequency sequence A does not match frequency sequence C.
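The position-wise matching criterion in this example can be expressed directly; this sketch checks only same length and same frequency at each position (the other criteria the text lists, such as total time value and note type, are not modeled here).

```python
def sequences_match(seq_a, seq_b):
    # Position-wise matching criterion: same length and the same
    # frequency at every position.
    return (len(seq_a) == len(seq_b)
            and all(a == b for a, b in zip(seq_a, seq_b)))

seq_A = ["f1", "f2", "f3", "f4"]
seq_B = ["f1", "f2", "f3", "f4"]
seq_C = ["f1", "f2", "f3", "f5"]
```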
In the embodiment of the invention, the background music corresponding to different segmented words can be mutually independent. Multiple types of background music may be pre-stored in a background music library, and the information of the background music may include: duration information, note type, frequency sequence, and the like. Optionally, the background music corresponding to a segmented word may be obtained from the background music library according to the attribute of the word. The attribute of the word may be matched with the information of the background music; for example, the time attribute of the word may be matched with the duration information of the background music, and so on.
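Looking up background music by the word's time attribute can be sketched as follows. The library contents, clip names, and the matching tolerance are all illustrative assumptions, not values from the embodiment.

```python
# Hypothetical background music library; each entry carries duration
# information that a word's time attribute is matched against.
LIBRARY = [
    {"name": "clip_a", "duration": 0.5},
    {"name": "clip_b", "duration": 1.0},
    {"name": "clip_c", "duration": 2.0},
]

def pick_background_music(word_duration, library, tolerance=0.25):
    # Return the first clip whose duration matches the word's time
    # attribute within `tolerance` seconds, or None if nothing matches.
    for clip in library:
        if abs(clip["duration"] - word_duration) <= tolerance:
            return clip
    return None
```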
In the embodiment of the invention, the background music corresponding to different participles can be correlated with each other. For example, the background music corresponding to a plurality of adjacent participles is correlated, for example, a plurality of participles in a sentence respectively correspond to a background music piece in a background music; alternatively, a plurality of sentences in a piece of text respectively correspond to pieces of background music in a piece of background music, and so on. It can be understood that, those skilled in the art can determine the background music corresponding to the word segmentation in the text according to the actual application requirements, and the embodiment of the present invention does not limit the specific background music.
In step 203, second audio units corresponding to the multiple word segments may be fused according to the position attributes of the word segments, so as to obtain second audio data.
The second audio unit may characterize the processing results of the first audio unit. It can be understood that if the first audio unit is not processed, step 203 may fuse the second audio units corresponding to the multiple participles and the first audio units corresponding to the participles according to the position attribute of the participles to obtain second audio data.
In an optional embodiment of the present invention, the step 203 obtaining second audio data according to the position attribute of the word segmentation and the second audio unit corresponding to the word segmentation specifically may include: mixing a second audio unit corresponding to the word segmentation and background music corresponding to the word segmentation to obtain mixed audio data; and fusing the mixed audio data corresponding to the multiple word segments according to the position attributes of the word segments to obtain second audio data. The embodiment of the invention can mix the second audio unit and the corresponding background music, and then fuse the mixed audio data corresponding to a plurality of participles according to the position attribute of the participles.
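The mix-then-fuse order described above can be sketched with plain lists standing in for audio samples. This is a hypothetical illustration: real mixing would operate on decoded PCM with gain control, and the sample values are made up.

```python
def mix(unit, bgm):
    # Mix a second audio unit with its background music sample by sample,
    # padding the shorter signal with silence.
    n = max(len(unit), len(bgm))
    unit = unit + [0] * (n - len(unit))
    bgm = bgm + [0] * (n - len(bgm))
    return [u + b for u, b in zip(unit, bgm)]

def fuse(mixed_units, positions):
    # Fuse the mixed audio data by concatenating the units in the order
    # given by their position attributes.
    out = []
    for _, data in sorted(zip(positions, mixed_units)):
        out += data
    return out

mixed = [mix([3], [1]), mix([1, 2], [1, 1])]  # per-word mixed audio data
second_audio = fuse(mixed, positions=[2, 1])  # position 1 comes first
```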
It is to be understood that, in other embodiments of the present invention, the second audio units corresponding to the multiple segmented words may first be fused according to the position attributes of the words to obtain fused audio data, and then the fused audio data may be mixed with the background music to obtain the second audio data.
In practical applications, the second audio data may be output. For example, the server may output the second audio data to the client, or the client may output the second audio data to the user, and so on.
To sum up, the data processing method according to the embodiment of the present invention performs at least one of repetition processing, stretching processing, frequency processing, and channel processing on a first audio unit corresponding to a word in the text corresponding to first audio data; the expression form of the word in the text can thereby be changed, enhancing the entertainment effect of the audio data.
Furthermore, in the embodiment of the present invention, the attribute of the word is considered during the processing of the first audio unit; the attribute may include: a time attribute, a position attribute, and a language attribute. This can improve the degree to which the processing of the first audio unit matches linguistic rules, and can further enhance the processing effect of the first audio unit.
Method embodiment two
Referring to fig. 3, a flowchart illustrating steps of a second embodiment of the data processing method of the present invention is shown, which may specifically include the following steps:
step 301, determining a word segmentation corresponding to a text and an attribute of the word segmentation for the text corresponding to first audio data; the attributes may specifically include: a time attribute, a location attribute, and a language attribute;
step 302, according to the attribute of the word segmentation, performing repeated processing and stretching processing on a first audio unit corresponding to the word segmentation in the first audio data to obtain a second audio unit A corresponding to the word segmentation;
step 303, according to the time attribute of the word segmentation and the information of the background music, performing frequency processing and sound channel processing on the second audio unit a corresponding to the word segmentation to obtain a second audio unit B corresponding to the word segmentation;
step 304, mixing the second audio unit B corresponding to the word segmentation and the background music corresponding to the word segmentation to obtain mixed audio data;
and 305, fusing the mixed audio data corresponding to the multiple participles according to the position attributes of the participles to obtain second audio data.
In step 302, the repetition processing may repeat the first audio unit a preset number of times, for example, repeating "Are" in "Are you ok" 2 times, and so on. The stretching processing may increase the duration of the first audio unit; for example, if the original duration of "you" in "Are you ok" is t, the target duration after stretching may be 2t, and so on. It is to be understood that the execution order of the repetition processing and the stretching processing is not limited in the embodiments of the present invention; they may be executed in either order.
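For naive list-based versions of these two operations, the order indeed does not change the result; a small self-contained check (sample values made up, and this equality is a property of these simple list operations, not a general claim about audio processing):

```python
def repeat_unit(samples, times):
    # Repetition processing: whole unit played back to back `times` times.
    return samples * times

def stretch_unit(samples, factor):
    # Naive stretching: each sample repeated `factor` times.
    return [s for s in samples for _ in range(factor)]

unit = [0.1, 0.2]
repeat_then_stretch = stretch_unit(repeat_unit(unit, 2), 2)
stretch_then_repeat = repeat_unit(stretch_unit(unit, 2), 2)
```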
In step 303, the background music may be used as a basis for frequency processing, and the embodiment of the invention may adjust the frequency of the second audio unit a according to the frequency of the background music, so that the frequency of the second audio unit a matches the frequency of the background music. Alternatively, the second audio unit may correspond to a first frequency sequence and the background music may correspond to a second frequency sequence, the first frequency sequence matching the second frequency sequence.
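Aligning the second audio unit A's frequencies to the background music can be sketched at the level of frequency sequences. This is a simplified stand-in: a real system would pitch-shift the audio samples rather than overwrite frequency values, and all names and numbers here are assumptions.

```python
def frequency_process(unit_freqs, bgm_freqs):
    # Frequency processing sketch: replace each frequency in unit A's
    # sequence with the background music's frequency at the same position
    # (cycling if the background sequence is shorter), so the first
    # frequency sequence matches the second.
    return [bgm_freqs[i % len(bgm_freqs)] for i in range(len(unit_freqs))]

unit_a = [300.0, 310.0, 305.0]        # made-up frequencies of unit A
bgm = [261.5, 329.5, 392.0]           # background music frequency sequence
unit_b = frequency_process(unit_a, bgm)
```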
In summary, the data processing method according to the embodiment of the present invention performs frequency processing on the second audio unit a corresponding to the segmented word by using the tempo and the tone provided by the background music, so that the matching degree between the second audio unit B and the background music can be improved, the interest of the audience can be stimulated, and the atmosphere of the music can be emphasized to a certain extent.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Device embodiment
Referring to fig. 4, a block diagram of a data processing apparatus according to an embodiment of the present invention is shown, which may specifically include:
a word segmentation and attribute determination module 401, configured to determine, for a text corresponding to first audio data, a word segmentation corresponding to the text and an attribute of the word segmentation; the attributes include: a time attribute, a location attribute, and a language attribute;
an audio unit processing module 402, configured to process, according to the attribute of the word segmentation, a first audio unit corresponding to the word segmentation in the first audio data to obtain a second audio unit corresponding to the word segmentation; the processing of the first audio unit comprises: at least one of repetition processing, stretching processing, frequency processing, and channel processing; and
a second audio data determining module 403, configured to obtain second audio data according to the position attribute of the word segmentation and a second audio unit corresponding to the word segmentation.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
An embodiment of the present invention provides an apparatus for data processing, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs configured to be executed by the one or more processors include instructions for: determining, for a text corresponding to first audio data, the word segmentation corresponding to the text and the attribute of the word segmentation; processing a first audio unit corresponding to the word segmentation in the first audio data according to the attribute, so as to obtain a second audio unit corresponding to the word segmentation; and obtaining second audio data according to the position attribute of the word segmentation and the second audio unit corresponding to the word segmentation.
Fig. 5 is a block diagram illustrating an apparatus 800 for data processing in accordance with an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 5, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing elements 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800, the relative positioning of the components, such as a display and keypad of the apparatus 800, the sensor assembly 814 may also detect a change in position of the apparatus 800 or a component of the apparatus 800, the presence or absence of user contact with the apparatus 800, orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 6 is a schematic diagram of a server in some embodiments of the invention. The server 1900 may vary widely by configuration or performance and may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. Memory 1932 and storage medium 1930 can be, among other things, transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Still further, a central processor 1922 may be provided in communication with the storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 1900.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input-output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
A non-transitory computer readable storage medium in which instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform the data processing method shown in fig. 2 or fig. 3 or fig. 4.
A non-transitory computer readable storage medium in which instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform a data processing method, the method comprising: determining the word segmentation corresponding to the text and the attribute of the word segmentation aiming at the text corresponding to the first audio data; the attributes include: a time attribute, a location attribute, and a language attribute; processing a first audio unit corresponding to the word in the first audio data according to the attribute of the word, so as to obtain a second audio unit corresponding to the word; the processing of the first audio unit comprises: at least one of repetition processing, stretching processing, frequency processing, and channel processing; and obtaining second audio data according to the position attribute of the word segmentation and a second audio unit corresponding to the word segmentation.
The embodiment of the invention discloses A1 and a data processing method, wherein the method comprises the following steps:
determining the word segmentation corresponding to the text and the attribute of the word segmentation aiming at the text corresponding to the first audio data; the attributes include: a time attribute, a location attribute, and a language attribute;
processing a first audio unit corresponding to the word in the first audio data according to the attribute of the word, so as to obtain a second audio unit corresponding to the word; the processing of the first audio unit comprises: at least one of repetition processing, stretching processing, frequency processing, and channel processing;
and obtaining second audio data according to the position attribute of the word segmentation and a second audio unit corresponding to the word segmentation.
A2, the method of A1, the language attribute comprising: part of speech, and/or sentence component.
A3, the processing of the corresponding first audio unit of the participle in the first audio data according to the method of A1, comprising:
determining target word segmentation according to the attribute of the word segmentation; the attribute corresponding to the target word segmentation accords with a preset condition;
and processing a first audio unit corresponding to the target word segmentation in the first audio data.
A4, the processing of the corresponding first audio unit of the participle in the first audio data according to the method of A1, comprising:
and processing a first audio unit corresponding to the word in the first audio data according to the attribute of the word and the background music corresponding to the word so as to enable an obtained second audio unit to be matched with the background music.
A5, according to the method of A4, the second audio unit corresponding to a first frequency sequence and the background music corresponding to a second frequency sequence, the first frequency sequence matching the second frequency sequence.
A6, according to the method of A4, obtaining second audio data according to the position attribute of the participle and a second audio unit corresponding to the participle, including:
mixing a second audio unit corresponding to the word segmentation and background music corresponding to the word segmentation to obtain mixed audio data;
and fusing the mixed audio data corresponding to the multiple word segments according to the position attributes of the word segments to obtain second audio data.
A7, according to the method of A4, adjacent segmented words correspond to different channel parameters; and/or the second audio unit and the background music corresponding to the same segmented word correspond to different channel parameters.
The embodiment of the invention discloses B8 and a data processing device, wherein the device comprises:
the word segmentation and attribute determination module is used for determining the word segmentation and the attribute of the word segmentation corresponding to the text aiming at the text corresponding to the first audio data; the attributes include: a time attribute, a location attribute, and a language attribute;
the audio unit processing module is used for processing a first audio unit corresponding to the word in the first audio data according to the attribute of the word, so as to obtain a second audio unit corresponding to the word; the processing of the first audio unit comprises: at least one of repetition processing, stretching processing, frequency processing, and channel processing; and
and the second audio data determining module is used for obtaining second audio data according to the position attribute of the word segmentation and a second audio unit corresponding to the word segmentation.
B9, the apparatus of B8, the language attribute comprising: part of speech, and/or sentence component.
B10, the apparatus of B8, the audio unit processing module comprising:
the target word segmentation determining module is used for determining target word segmentation according to the attribute of the word segmentation; the attribute corresponding to the target word segmentation accords with a preset condition; and
and the target word segmentation audio processing module is used for processing a first audio unit corresponding to the target word segmentation in the first audio data.
B11, the apparatus of B8, the audio unit processing module comprising:
and the audio unit processing module is used for processing a first audio unit corresponding to the word in the first audio data according to the attribute of the word and the background music corresponding to the word, so that an obtained second audio unit is matched with the background music.
B12, the apparatus of B11, the second audio unit corresponding to a first sequence of frequencies and the background music corresponding to a second sequence of frequencies, the first sequence of frequencies matching the second sequence of frequencies.
B13, the apparatus of B11, the second audio data determination module comprising:
the mixing module is used for mixing the second audio unit corresponding to the word segmentation and the background music corresponding to the word segmentation to obtain mixed audio data; and
and the fusion module is used for fusing the mixed audio data corresponding to the multiple word segments according to the position attributes of the word segments so as to obtain second audio data.
B14, according to the device of B11, adjacent segmented words correspond to different channel parameters; and/or the second audio unit and the background music corresponding to the same segmented word correspond to different channel parameters.
The embodiment of the invention discloses C15, an apparatus for data processing, the apparatus is applied to a server, the apparatus comprises a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs configured to be executed by the one or more processors comprise instructions for:
determining the word segmentation corresponding to the text and the attribute of the word segmentation aiming at the text corresponding to the first audio data; the attributes include: a time attribute, a location attribute, and a language attribute;
processing a first audio unit corresponding to the word in the first audio data according to the attribute of the word, so as to obtain a second audio unit corresponding to the word; the processing of the first audio unit comprises: at least one of repetition processing, stretching processing, frequency processing, and channel processing;
and obtaining second audio data according to the position attribute of the word segmentation and a second audio unit corresponding to the word segmentation.
C16, the apparatus of C15, the language attribute comprising: part of speech, and/or sentence component.
C17, the processing the corresponding first audio unit of the participle in the first audio data according to the apparatus of C15, comprising:
determining target word segmentation according to the attribute of the word segmentation; the attribute corresponding to the target word segmentation accords with a preset condition;
and processing a first audio unit corresponding to the target word segmentation in the first audio data.
C18, the processing the corresponding first audio unit of the participle in the first audio data according to the apparatus of C15, comprising:
and processing a first audio unit corresponding to the word in the first audio data according to the attribute of the word and the background music corresponding to the word so as to enable an obtained second audio unit to be matched with the background music.
C19, the apparatus of C18, the second audio unit corresponding to a first sequence of frequencies, the background music corresponding to a second sequence of frequencies, the first sequence of frequencies matching the second sequence of frequencies.
C20, obtaining second audio data according to the position attribute of the word segmentation and the second audio unit corresponding to the word segmentation according to the apparatus of C18, including:
mixing a second audio unit corresponding to the word segmentation and background music corresponding to the word segmentation to obtain mixed audio data;
and fusing the mixed audio data corresponding to the multiple word segments according to the position attributes of the word segments to obtain second audio data.
C21, according to the device of C18, adjacent segmented words correspond to different channel parameters; and/or the second audio unit and the background music corresponding to the same segmented word correspond to different channel parameters.
Embodiments of the present invention disclose D22, a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the data processing method described in one or more of A1-A7.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
The data processing method, the data processing apparatus, and the apparatus for data processing provided by the present invention are described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the present invention, and the description of the above embodiments is intended only to help in understanding the method and its core idea. Meanwhile, a person skilled in the art may, following the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A method of data processing, the method comprising:
for a text corresponding to first audio data, determining word segments corresponding to the text and attributes of the word segments; the attributes comprise: a time attribute, a position attribute, and a language attribute;
processing a first audio unit corresponding to the word segment in the first audio data according to the attributes of the word segment to obtain a second audio unit corresponding to the word segment; the processing of the first audio unit comprises at least one of: repetition processing, stretching processing, frequency processing, and channel processing;
and obtaining second audio data according to the position attribute of the word segment and the second audio unit corresponding to the word segment.
2. The method of claim 1, wherein the language attribute comprises: a part of speech and/or a sentence component.
3. The method of claim 1, wherein processing the first audio unit corresponding to the word segment in the first audio data comprises:
determining a target word segment according to the attributes of the word segments, wherein the attribute corresponding to the target word segment meets a preset condition;
and processing the first audio unit corresponding to the target word segment in the first audio data.
4. The method of claim 1, wherein processing the first audio unit corresponding to the word segment in the first audio data comprises:
processing the first audio unit corresponding to the word segment in the first audio data according to the attribute of the word segment and the background music corresponding to the word segment, so that the obtained second audio unit matches the background music.
5. The method of claim 4, wherein the second audio unit corresponds to a first frequency sequence and the background music corresponds to a second frequency sequence, and wherein the first frequency sequence matches the second frequency sequence.
6. The method of claim 4, wherein obtaining second audio data according to the position attribute of the word segment and the second audio unit corresponding to the word segment comprises:
mixing the second audio unit corresponding to the word segment with the background music corresponding to the word segment to obtain mixed audio data;
and fusing the mixed audio data corresponding to the multiple word segments according to the position attributes of the word segments to obtain the second audio data.
7. The method of claim 4, wherein adjacent word segments correspond to different channel parameters; and/or the second audio unit and the background music corresponding to the same word segment correspond to different channel parameters.
8. A data processing apparatus, characterized in that the apparatus comprises:
a word segment and attribute determination module, configured to, for a text corresponding to first audio data, determine word segments corresponding to the text and attributes of the word segments; the attributes comprise: a time attribute, a position attribute, and a language attribute;
an audio unit processing module, configured to process a first audio unit corresponding to the word segment in the first audio data according to the attributes of the word segment to obtain a second audio unit corresponding to the word segment; the processing of the first audio unit comprises at least one of: repetition processing, stretching processing, frequency processing, and channel processing; and
a second audio data determination module, configured to obtain second audio data according to the position attribute of the word segment and the second audio unit corresponding to the word segment.
9. An apparatus for data processing, applied to a server, the apparatus comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs comprising instructions for:
for a text corresponding to first audio data, determining word segments corresponding to the text and attributes of the word segments; the attributes comprise: a time attribute, a position attribute, and a language attribute;
processing a first audio unit corresponding to the word segment in the first audio data according to the attributes of the word segment to obtain a second audio unit corresponding to the word segment; the processing of the first audio unit comprises at least one of: repetition processing, stretching processing, frequency processing, and channel processing;
and obtaining second audio data according to the position attribute of the word segment and the second audio unit corresponding to the word segment.
10. A machine-readable medium having stored thereon instructions which, when executed by one or more processors, cause an apparatus to perform a data processing method as claimed in one or more of claims 1 to 7.
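The pipeline of claim 1 — determine word segments and their attributes, transform each word's first audio unit according to those attributes, then assemble the second audio units by position attribute — can be sketched as follows. Everything here is illustrative: the attribute keys, the specific repetition and stretching rules, and the list-of-samples audio format are assumptions for the sake of a runnable toy, not the claimed implementation.

```python
# Hypothetical sketch of the claim-1 pipeline (assumed attribute names and
# processing rules): audio units are plain lists of float samples.

def process_unit(unit, attrs):
    """Apply repetition and/or stretching processing per the word's attributes."""
    out = list(unit)
    if attrs.get("part_of_speech") == "interjection":   # repetition processing
        out = out * 2
    if attrs.get("duration", 0) > 1.0:                  # stretching processing:
        out = [s for s in out for _ in range(2)]        # naive 2x sample repeat
    return out

def build_second_audio(units, attrs_list):
    """Process every first audio unit, then concatenate by position attribute."""
    processed = [(a["position"], process_unit(u, a))
                 for u, a in zip(units, attrs_list)]
    second = []
    for _, unit in sorted(processed, key=lambda p: p[0]):
        second.extend(unit)
    return second

# Two word segments with time, position, and language attributes (claim 1).
units = [[0.1, 0.2], [0.3]]
attrs = [{"position": 1, "part_of_speech": "noun", "duration": 0.4},
         {"position": 0, "part_of_speech": "interjection", "duration": 0.2}]
result = build_second_audio(units, attrs)
```

In this toy run the interjection's unit is repeated and placed first because its position attribute is 0, showing how language attributes drive the per-unit processing while position attributes drive the final assembly.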
CN201910365352.XA 2019-04-30 2019-04-30 Data processing method and device and data processing device Pending CN111950266A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910365352.XA CN111950266A (en) 2019-04-30 2019-04-30 Data processing method and device and data processing device

Publications (1)

Publication Number Publication Date
CN111950266A true CN111950266A (en) 2020-11-17

Family

ID=73335591


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113923517A (en) * 2021-09-30 2022-01-11 Beijing Sogou Technology Development Co., Ltd. Background music generation method and device and electronic equipment
CN113923517B (en) * 2021-09-30 2024-05-07 Beijing Sogou Technology Development Co., Ltd. Background music generation method and device and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination