CN108242238B - Audio file generation method and device and terminal equipment - Google Patents

Audio file generation method and device and terminal equipment

Info

Publication number
CN108242238B
Authority
CN
China
Prior art keywords
sound
file
audio file
background music
text content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810028134.2A
Other languages
Chinese (zh)
Other versions
CN108242238A (en)
Inventor
李丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Genius Technology Co Ltd
Original Assignee
Guangdong Genius Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Genius Technology Co Ltd
Priority to CN201810028134.2A
Publication of CN108242238A
Application granted
Publication of CN108242238B
Legal status: Active
Anticipated expiration

Classifications

    • G10L15/26: Speech recognition; speech to text systems
    • G06F40/30: Handling natural language data; semantic analysis
    • G06V40/161: Human faces; detection, localisation, normalisation
    • G06V40/174: Facial expression recognition
    • G10L25/63: Speech or voice analysis for estimating an emotional state
    • G06V40/179: Metadata assisted face recognition
    • G10H2210/021: Background music, e.g. for video sequences, elevator music
    • G10H2210/155: Musical effects

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Engineering & Computer Science (AREA)
  • Child & Adolescent Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention relates to the technical field of audio processing, and discloses an audio file generation method, an audio file generation device and a terminal device, wherein the method comprises the following steps: recording an external human voice to obtain a sound file; converting the sound file into text content; performing semantic analysis on the text content to obtain the emotional characteristics corresponding to the sound file; obtaining background music and sound effects adapted to the sound file according to the emotional characteristics; and adding the background music and sound effects to the sound file to obtain an audio file. By implementing the embodiment of the invention, background music and sound effects are added to a sound file in accordance with its emotional characteristics, which improves the auditory perception of the sound file.

Description

Audio file generation method and device and terminal equipment
Technical Field
The invention relates to the technical field of audio processing, in particular to an audio file generation method and device and terminal equipment.
Background
Recording software available on the market satisfies the musical aspirations of many users: a user can record a sound file with the recording software, polish it, store it as an audio file in a common format such as MP3, and then play it with other audio playing software.
However, recording software on the market only applies simple polishing to recorded sound files, such as noise removal and reverberation; the playing effect of the resulting audio file is monotonous and dry, and the auditory effect is poor.
Disclosure of Invention
The embodiment of the invention discloses an audio file generation method, an audio file generation device and a terminal device, which are used for solving the technical problem that existing recorded audio files sound monotonous and dry.
A first aspect of the invention discloses an audio file generation method, which comprises the following steps:
recording external human voice to obtain a sound file;
converting the sound file into text content;
performing semantic analysis on the text content to obtain emotional characteristics corresponding to the sound file;
obtaining background music and sound effect which are matched with the sound file according to the emotional characteristics;
and adding the background music and the sound effect to the sound file to obtain an audio file.
As an optional implementation manner, in the first aspect of the present invention, the method further includes:
periodically capturing a facial image of the recording user during the process of recording the external human voice to obtain the sound file;
analyzing the facial image to obtain the expression characteristics of the recording user;
acquiring emotion characteristics corresponding to the expression characteristics;
the method for acquiring the background music and the sound effect matched with the sound file by taking the emotional characteristics as the basis comprises the following steps:
and acquiring background music and sound effect which are matched with the sound file according to the emotional characteristics and the emotional characteristics.
As an optional implementation manner, in the first aspect of the present invention, the performing semantic analysis on the text content to obtain an emotion feature corresponding to the sound file includes:
identifying a sentence break point of the text content;
dividing the text content into a plurality of short sentences by taking the sentence break points as the basis;
analyzing the short sentence semantics of the short sentence or extracting short sentence keywords of the short sentence;
identifying the emotional characteristics of each short sentence according to the short sentence semantics or the short sentence keywords of each short sentence;
judging whether at least two continuous short sentences with the same emotional characteristics exist in the short sentences;
if such short sentences exist, treating the at least two short sentences as one short sentence;
sequencing the emotional characteristics of each short sentence according to the position of the short sentence in the text content to obtain the emotional characteristics of the sound file;
the method for acquiring the background music and the sound effect matched with the sound file by taking the emotional characteristics as the basis comprises the following steps:
cutting the sound file into subfiles with a plurality of playing time lengths according to the short sentences of the text contents, wherein the subfiles correspond to the short sentences one by one;
sequentially acquiring background music and sound effects matched with the subfiles;
adding the background music and the sound effect to the sound file to obtain an audio file comprises:
and adding the matched background music and sound effect to each subfile in sequence to obtain the audio file.
As an alternative implementation manner, in the first aspect of the present invention, the converting the sound file into text content includes:
detecting a sound pause position in the sound file;
and converting the sound file into text content according to the sound pause position, wherein the sound pause position is used as a sentence break point of the text content.
As an optional implementation manner, in the first aspect of the present invention, after the adding the background music and the sound effect to the sound file and obtaining an audio file, the method further includes:
detecting whether a saving instruction for the audio file is received;
when the saving instruction is received, detecting whether a saving path aiming at the audio file is received or not;
when the saving path is received, saving the audio file, in association with its generation time, to a storage area corresponding to the saving path;
and when the saving path is not received, saving the audio file, in association with its generation time, to a storage area corresponding to a default path.
A second aspect of the present invention discloses an audio file generating apparatus, which may include:
the recording unit is used for recording external voices to obtain sound files;
a conversion unit for converting the sound file into text content;
the first acquisition unit is used for carrying out semantic analysis on the text content to acquire emotional characteristics corresponding to the sound file;
the second acquisition unit is used for acquiring background music and sound effect which are matched with the sound file according to the emotional characteristics;
and the third acquisition unit is used for adding the background music and the sound effect to the sound file to obtain an audio file.
As an alternative embodiment, in the second aspect of the present invention, the apparatus further comprises:
the shooting unit is used for periodically capturing a facial image of the recording user while the recording unit records the external human voice to obtain the sound file;
the analysis unit is used for analyzing the facial image to obtain the expression characteristics of the recording user;
the emotion acquisition unit is used for acquiring emotion characteristics corresponding to the expression characteristics;
the manner in which the second acquisition unit acquires background music and sound effects adapted to the sound file according to the emotional characteristics is specifically:
the second acquisition unit is used for acquiring background music and sound effects adapted to the sound file according to both the emotional characteristics and the emotion characteristics.
As an optional implementation manner, in the second aspect of the present invention, the manner that the first obtaining unit is configured to perform semantic analysis on the text content to obtain the emotion feature corresponding to the sound file is specifically:
the first acquisition unit is used for identifying the sentence break points of the text content; dividing the text content into a plurality of short sentences by taking the sentence break points as the basis; analyzing the short sentence semantics of the short sentence or extracting short sentence keywords of the short sentence; identifying the emotional characteristics of each short sentence according to the short sentence semantics or the short sentence keywords of each short sentence; judging whether at least two continuous short sentences with the same emotional characteristics exist in the short sentences; and, if present, regarding the at least two phrases as one of the phrases; sequencing the emotional characteristics of each short sentence according to the position of the short sentence in the text content to obtain the emotional characteristics of the sound file;
the manner in which the second obtaining unit obtains background music and sound effects adapted to the sound file according to the emotional characteristics is specifically:
the second obtaining unit is configured to cut the sound file into subfiles with a plurality of playing durations according to the short sentence of the text content, where the subfiles correspond to the short sentence one to one; sequentially acquiring background music and sound effects matched with the subfiles;
the third obtaining unit is configured to add the background music and the sound effect to the sound file, and the manner of obtaining the audio file specifically includes:
and the third acquisition unit is used for sequentially adding the matched background music and sound effect to each subfile to acquire the audio file.
As an alternative embodiment, in the second aspect of the present invention, the conversion unit includes:
a position detection unit for detecting a sound pause position in the sound file;
and the text conversion unit is used for converting the sound file into text contents according to the sound pause positions, and the sound pause positions are used as sentence break points of the text contents.
As an alternative embodiment, in the second aspect of the present invention, the apparatus further comprises:
the instruction detection unit is used for detecting whether a storage instruction for the audio file is received or not after the third acquisition unit adds the background music and the sound effect to the sound file to obtain the audio file;
a path detection unit, configured to detect whether a saving path for the audio file is received when the instruction detection unit receives the saving instruction;
the first saving unit is used for saving the audio file, in association with its generation time, to the storage area corresponding to the saving path when the path detection unit receives the saving path;
and the second saving unit is used for saving the audio file, in association with its generation time, to the storage area corresponding to a default path when the path detection unit does not receive the saving path.
A third aspect of the present invention discloses a terminal device, which may include:
an audio file generating apparatus according to a second aspect of the present invention.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
In the embodiment of the invention, after an external human voice is recorded to obtain a sound file, the sound file is first converted into text content; semantic analysis is then performed on the text content to obtain the emotional characteristics corresponding to the sound file; background music and sound effects adapted to the sound file are obtained according to the emotional characteristics; and the background music and sound effects are added to the sound file to obtain an audio file. It can be seen that, with the embodiment of the invention, a recorded sound file can be converted into text content, background music that enriches and heightens the feeling and sound effects that add colour can then be obtained by analyzing the semantics of the text content, and the sound file can be processed accordingly; the processed audio file has a stronger auditory impact and a better effect.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart illustrating an audio file generating method according to an embodiment of the present invention;
FIG. 2 is another schematic flow chart of an audio file generating method according to an embodiment of the disclosure;
FIG. 3 is another schematic flow chart of a method for generating an audio file according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an audio file generating apparatus according to an embodiment of the disclosure;
FIG. 5 is a schematic structural diagram of an audio file generating apparatus according to an embodiment of the disclosure;
FIG. 6 is a schematic structural diagram of an audio file generating apparatus according to an embodiment of the disclosure;
FIG. 7 is a schematic diagram of another structure of an audio file generating apparatus according to an embodiment of the disclosure;
fig. 8 is a schematic structural diagram of a terminal device disclosed in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", "third", and the like in the description and the claims of the present invention are used for distinguishing different objects, and are not used for describing a specific order. The terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention discloses an audio file generation method, which is used for improving the auditory perception of an audio file so as to obtain a more vivid auditory effect. The embodiment of the invention also correspondingly discloses an audio file generating device and terminal equipment.
The technical solution of the present invention will be described in detail with reference to the specific embodiments.
Example one
Referring to fig. 1, fig. 1 is a schematic flow chart of an audio file generating method according to an embodiment of the present invention; as shown in fig. 1, an audio file generation method may include:
101. an external human voice is recorded to obtain a sound file.
The execution subject of the embodiment of the invention is a terminal device; recording software is installed in the terminal device, and the technical solution of the invention is implemented based on the recording software. Specifically, a microphone built into the terminal device is used to record the external human voice to obtain the sound file.
As an optional implementation manner, recording an external human voice to obtain a sound file specifically includes: recording the external human voice to obtain an initial sound file; and performing noise filtering on the initial sound file to obtain the sound file. It can be understood that, when a human voice is recorded, it is impossible to avoid picking up other noise, such as other people speaking, surrounding traffic, or noise generated by the terminal device itself; in this embodiment, noise filtering is performed on the recorded initial sound file to obtain a sound file with a purer voice.
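For illustration only, the following is a minimal Python sketch of this noise-filtering step, assuming the initial recording is a mono 16-bit WAV file and that a simple high-pass filter stands in for whatever noise-reduction technique an implementation actually uses; the scipy library, cutoff frequency and filter order are illustrative assumptions, not values from this disclosure.

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import butter, lfilter

    def filter_noise(in_path, out_path, cutoff_hz=80.0):
        """Attenuate low-frequency rumble below the human-voice band."""
        rate, samples = wavfile.read(in_path)          # mono, 16-bit assumed
        samples = samples.astype(np.float64)
        # 4th-order Butterworth high-pass; speech energy sits above ~80 Hz.
        b, a = butter(4, cutoff_hz / (rate / 2), btype="highpass")
        cleaned = lfilter(b, a, samples)
        wavfile.write(out_path, rate, cleaned.astype(np.int16))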
102. The sound file is converted into text content.
As an alternative embodiment, the sound file is converted into text sentence by sentence, one spoken sentence at a time.
103. And carrying out semantic analysis on the text content to obtain the emotional characteristics corresponding to the sound file.
Through semantic analysis, the emotion that the sound file is intended to express, such as sadness, happiness or excitement, can be identified.
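As a hedged sketch of what such semantic analysis might look like in Python, the fragment below scores a sentence against small keyword lists; the emotion labels and keywords are illustrative assumptions, since the disclosure does not specify a lexicon or model.

    EMOTION_KEYWORDS = {
        "sad":     ["cry", "lonely", "miss", "tears"],
        "happy":   ["laugh", "sunshine", "smile", "together"],
        "excited": ["win", "amazing", "finally", "hurry"],
    }

    def classify_phrase(phrase):
        """Return the emotion whose keywords occur most often, else 'neutral'."""
        words = phrase.lower().split()
        scores = {emotion: sum(word in keywords for word in words)
                  for emotion, keywords in EMOTION_KEYWORDS.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else "neutral"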
104. And obtaining background music and sound effect which are matched with the sound file according to the emotional characteristics.
The terminal device stores background music and sound effects corresponding to various emotional characteristics, with the background music, the sound effects and the emotional characteristics stored in association. A plurality of background music tracks and a plurality of sound effects associated with an emotional characteristic are found according to this association relation; then at least one background music track is determined from the plurality of background music tracks, and at least one sound effect is determined from the plurality of sound effects.
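The association described above can be pictured as a simple mapping; the sketch below, with hypothetical file names, shows one way to store the relation and pick at least one background music track and one sound effect per emotional characteristic.

    import random

    MUSIC_BY_EMOTION = {
        "sad":   ["rainy_piano.mp3", "slow_strings.mp3"],
        "happy": ["bright_guitar.mp3", "pop_beat.mp3"],
    }
    EFFECTS_BY_EMOTION = {
        "sad":   ["rain.wav", "wind.wav"],
        "happy": ["birds.wav", "applause.wav"],
    }

    def pick_assets(emotion):
        """Determine at least one music track and one effect for the emotion."""
        music = random.choice(MUSIC_BY_EMOTION[emotion])
        effect = random.choice(EFFECTS_BY_EMOTION[emotion])
        return music, effect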
105. And adding the background music and the sound effect to the sound file to obtain an audio file.
The format of the audio file includes but is not limited to digital audio formats such as WAV, MP3 and WMA; the audio file can be exported from the terminal device and then imported into any playing software supporting these formats for playback.
As an optional implementation, adding the background music and the sound effect to the sound file, and obtaining the audio file specifically includes: synthesizing the sound file and the background music into an initial audio file; and adding the sound effect to the initial audio file to obtain the audio file.
Further, synthesizing the sound file and the background music into the initial audio file specifically includes: adding the background music on the playing time track of the sound file so that the starting point and end point of the background music's playing time coincide with those of the sound file.
Or, as another optional implementation, adding the background music and the sound effect to the sound file, and obtaining the audio file specifically includes: the sound effect is added to the sound file to obtain an initial sound file, and then the initial sound file and the background music are synthesized into an audio file.
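A minimal sketch of the first of the two alternatives above, assuming the pydub library (the disclosure names no library): the background music is trimmed to the voice's playing length so that the start and end points coincide, and the sound effect is then layered on.

    from pydub import AudioSegment

    def mix(voice_path, music_path, effect_path, out_path):
        voice = AudioSegment.from_file(voice_path)
        music = AudioSegment.from_file(music_path) - 12   # duck music by 12 dB
        effect = AudioSegment.from_file(effect_path)
        music = music[: len(voice)]        # align start and end with the voice
        initial = voice.overlay(music)     # the initial audio file
        audio = initial.overlay(effect, position=0)   # add the colouring effect
        audio.export(out_path, format="mp3")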
In some implementable modes, after the background music and the sound effect are added to the sound file to obtain the audio file, it is detected whether a saving instruction for the audio file is received; when the saving instruction is received, it is detected whether a saving path for the audio file is received; when the saving path is received, the audio file is saved, in association with its generation time, to the storage area corresponding to the saving path; and when the saving path is not received, the audio file is saved, in association with its generation time, to the storage area corresponding to the default path. In this embodiment, the terminal device sets a storage area by default for saving audio files, and this storage area corresponds to a path referred to in the embodiment of the invention as the default path. The user may also select a storage area: if the user selects one, the audio file is saved into the selected storage area; if the user keeps the default, the audio file is saved into the default storage area. In addition, when an audio file is saved, its generation time is saved as well, and the saving name of the audio file may be chosen by the user.
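The save flow might be sketched as follows, with an illustrative default directory and a sidecar file standing in for the association between the audio file and its generation time; all names here are assumptions, not details from the disclosure.

    import os, shutil, time

    DEFAULT_DIR = os.path.expanduser("~/recordings")   # hypothetical default path

    def save_audio(audio_path, save_dir=None):
        """Save the audio file in association with its generation time."""
        target_dir = save_dir or DEFAULT_DIR           # fall back to default path
        os.makedirs(target_dir, exist_ok=True)
        dest = os.path.join(target_dir, os.path.basename(audio_path))
        shutil.copy2(audio_path, dest)
        with open(dest + ".meta", "w") as meta:        # record generation time
            meta.write(time.strftime("%Y-%m-%d %H:%M:%S"))
        return dest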
In the embodiment of the invention, after an external human voice is recorded to obtain a sound file, the sound file is first converted into text content; semantic analysis is then performed on the text content to obtain the emotional characteristics corresponding to the sound file; background music and sound effects adapted to the sound file are obtained according to the emotional characteristics; and the background music and sound effects are added to the sound file to obtain an audio file. It can be seen that, with the embodiment of the invention, a recorded sound file can be converted into text content, background music that enriches and heightens the feeling and sound effects that add colour can then be obtained by analyzing the semantics of the text content, and the sound file can be processed accordingly; the processed audio file has a stronger auditory impact and a better effect.
Example two
Referring to fig. 2, fig. 2 is another schematic flow chart of an audio file generating method according to an embodiment of the present invention; as shown in fig. 2, an audio file generation method may include:
201. an external human voice is recorded to obtain a sound file.
202. A sound pause position in the sound file is detected.
As an optional implementation manner, detecting the sound pause positions in the sound file specifically includes: detecting a duration without sound in the sound file and judging whether the duration is greater than or equal to a preset duration; when it is, determining the position corresponding to that duration as a sound pause position in the sound file and marking it; when the duration is less than the preset duration, ending the process. With this embodiment, the sound pause positions in the sound file can be determined simply and quickly.
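An energy-based Python sketch of this pause detection: the signal is cut into 20 ms frames, and any quiet run at least as long as the preset duration is reported as a pause. The frame size, RMS threshold and minimum pause length are illustrative assumptions, not figures from this disclosure.

    import numpy as np

    def find_pauses(samples, rate, min_pause_s=0.6, silence_rms=200.0):
        """Return (start_s, end_s) spans where the recording stays quiet."""
        frame = int(rate * 0.02)                      # 20 ms analysis frames
        n = len(samples) // frame
        frames = samples[: n * frame].astype(np.float64).reshape(n, frame)
        rms = np.sqrt(np.mean(frames ** 2, axis=1))
        quiet = rms < silence_rms
        pauses, start = [], None
        for i, is_quiet in enumerate(quiet):
            if is_quiet and start is None:
                start = i
            elif not is_quiet and start is not None:
                if (i - start) * 0.02 >= min_pause_s:
                    pauses.append((start * 0.02, i * 0.02))
                start = None
        if start is not None and (n - start) * 0.02 >= min_pause_s:
            pauses.append((start * 0.02, n * 0.02))
        return pauses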
203. The sound file is converted into text content according to the sound pause positions, and the sound pause positions are used as the sentence break points of the text content.
The sound pause positions in the sound file serve as the break points of the converted text content; then, taking the pause positions as the basis, the speech between two adjacent pause positions is treated as one short sentence, and the sound file is converted into text content sentence by sentence.
204. The sentence break points of the text content are identified.
It can be understood that when the sound file is converted into text content, the break points of the text content are marked, so the break points can be found simply by looking for the marks.
205. And dividing the text content into a plurality of short sentences by taking the sentence break point as a basis.
It will be appreciated that the first short sentence runs from the beginning of the text content to the first break point, the last short sentence runs from the last break point to the end of the text content, and each short sentence in the middle is the text content between two adjacent break points.
206. And analyzing the short sentence semantics of the short sentence or extracting short sentence keywords of the short sentence.
The emotional characteristics of each short sentence are analyzed by analyzing the semantics of the short sentence or extracting keywords.
207. And identifying the emotional characteristics of each short sentence according to the short sentence semantics or the short sentence keywords of each short sentence.
208. Judging whether at least two continuous short sentences with the same emotional characteristics exist in the short sentences; wherein if so, go to step 209; if not, go to 210.
It should be noted that if the emotional characteristics of two or more consecutive short sentences are the same, those short sentences are treated as one short sentence.
209. And taking the at least two short sentences as one short sentence.
Wherein, step 209 is completed and the process goes to step 210.
210. And sequencing the emotional characteristics of each short sentence according to the position of the short sentence in the text content to obtain the emotional characteristics of the sound file.
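Steps 208 to 210 can be pictured with the short sketch below: consecutive short sentences that share an emotional characteristic are merged, and the emotions are kept in the order the sentences occupy in the text (classify_phrase is the hypothetical classifier sketched earlier, not a component named in this disclosure).

    from itertools import groupby

    def merge_and_order(phrases, emotions):
        """phrases and emotions are parallel lists in text order."""
        merged, idx = [], 0
        for emotion, run in groupby(emotions):
            count = len(list(run))
            # Consecutive same-emotion short sentences become one sentence.
            merged.append((" ".join(phrases[idx: idx + count]), emotion))
            idx += count
        return merged

    # e.g. merge_and_order(["a", "b", "c"], ["sad", "sad", "happy"])
    # -> [("a b", "sad"), ("c", "happy")]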
211. The sound file is cut into a plurality of subfiles according to the short sentences of the text content, each subfile with its own playing duration and corresponding to one short sentence.
In the sound file, the sound corresponding to each short sentence occupies a certain playing duration; before background music and sound effects are configured for the sound file, the playing duration of the sound corresponding to each short sentence in the sound file is determined.
212. Background music and sound effects matched with each subfile are acquired in sequence.
Background music and sound effects matched with the emotional characteristics are obtained according to the emotional characteristics of the short sentence corresponding to each subfile.
In the embodiment of the invention, because any two adjacent subfiles have different emotional characteristics, different background music and sound effects can be configured for the different emotional characteristics, so that more suitable background music and sound effects are obtained.
213. And adding the matched background music and sound effect to each subfile in sequence to obtain an audio file.
It should be noted that the playing duration of each subfile is already determined once recording is complete; when adding background music and sound effects to a subfile, the playing durations of the subfile, the background music and the sound effect must be kept consistent. If the background music is too long, a portion of it may be clipped and synthesized with the subfile, i.e., the clipped background music is added on the subfile's playing time track.
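Putting steps 211 to 213 together, a sketch with pydub (again an assumed library choice) slices the voice into one subfile per short sentence, clips each over-long music track to the subfile's playing duration, and concatenates the results; all inputs here are hypothetical.

    from pydub import AudioSegment

    def assemble(voice, spans_ms, music_paths, effect_paths):
        """spans_ms: one (start_ms, end_ms) pair per subfile, in text order."""
        out = AudioSegment.empty()
        for (start, end), m_path, e_path in zip(spans_ms, music_paths, effect_paths):
            sub = voice[start:end]                          # one subfile
            music = AudioSegment.from_file(m_path) - 12
            music = music[: len(sub)]                       # clip over-long music
            effect = AudioSegment.from_file(e_path)[: len(sub)]
            out += sub.overlay(music).overlay(effect)
        return out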
In this embodiment, when an audio file is produced, after the sound file is recorded it can be converted into text content according to the sound pause positions, with the break points in the text content corresponding to the pause positions. The text content is divided into a plurality of short sentences by the break points, the emotional characteristics of each short sentence are analyzed, and the short sentences are merged and ordered by emotional characteristic so that any two adjacent short sentences in the text content have different emotional characteristics. Background music and sound effects are then configured for the subfile corresponding to each short sentence to obtain the audio file: the rich background music enhances the auditory effect of the audio file, and the sound effects add colour, making the audio file more vivid and pleasant.
EXAMPLE III
Referring to fig. 3, fig. 3 is another schematic flow chart of an audio file generating method according to an embodiment of the disclosure; as shown in fig. 3, an audio file generation method may include:
301. an external human voice is recorded to obtain a sound file.
302. The sound file is converted into text content.
303. And carrying out semantic analysis on the text content to obtain the emotional characteristics corresponding to the sound file.
After step 303 is executed, the process goes to step 307.
304. Periodically shooting a face image of a recording user.
In the embodiment of the invention, during recording of the sound file, facial images are also captured periodically with a camera provided on the terminal device. It should be noted that, in order to accurately reflect the user's facial expression, the period set in the embodiment of the invention is very short, so that facial images are captured at short intervals to obtain multiple frames, from which the facial expression is then analyzed. It will be appreciated that a shorter period gives higher accuracy.
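A sketch of such a periodic capture loop, assuming OpenCV and a built-in camera; the half-second period is an illustrative choice, and any expression classifier applied to the returned frames would be a separate, hypothetical component.

    import time
    import cv2

    def capture_faces(duration_s, period_s=0.5):
        """Grab one frame every period_s seconds while recording runs."""
        cap = cv2.VideoCapture(0)      # built-in camera
        frames = []
        deadline = time.time() + duration_s
        while time.time() < deadline:
            ok, frame = cap.read()
            if ok:
                frames.append(frame)
            time.sleep(period_s)       # a smaller period tracks expressions more finely
        cap.release()
        return frames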
305. And analyzing the facial image to obtain the expression characteristics of the recording user.
The user's expression characteristics are further obtained from the facial images. It can be understood that when a user expresses content vocally, a corresponding expression usually appears on the face; in the embodiment of the invention, background music and sound effects are added to the sound file by combining the emotion the voice is meant to express with the user's facial expression, so that more accurate background music and sound effects are obtained.
306. And acquiring the emotion characteristics corresponding to the expression characteristics.
Step 306 is executed and the flow goes to 307.
Furthermore, determining the user's emotion from the facial expression makes the emotion more intuitive.
307. And obtaining background music and sound effects adapted to the sound file according to both the emotional characteristics and the emotion characteristics.
It can be understood that the terminal device pre-stores background music and sound effects associated with combinations of the emotional characteristics and the emotion characteristics, and searches its pre-stored background music and sound effects for a match by combining the two.
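One way to picture this combined lookup: assets pre-stored against (emotional characteristic, emotion characteristic) pairs, with a fallback to the semantic emotion alone when no exact pair is stored. The fallback, keys and file names are illustrative assumptions, not details from this disclosure.

    ASSETS = {
        ("sad", "sad"):     ("slow_strings.mp3", "rain.wav"),
        ("sad", "calm"):    ("soft_piano.mp3", "wind.wav"),
        ("happy", "happy"): ("pop_beat.mp3", "applause.wav"),
    }

    def match_assets(emotional, emotion):
        """Prefer an exact pair; fall back to the semantic emotion alone."""
        key = (emotional, emotion)
        if key in ASSETS:
            return ASSETS[key]
        for (semantic, _), assets in ASSETS.items():
            if semantic == emotional:
                return assets
        return None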
308. And adding background music and sound effects to the sound file to obtain an audio file.
In this embodiment, the terminal device obtains a sound file by recording an external human voice and captures facial images through a camera while recording. The sound file is converted into text content to obtain its emotional characteristics, the facial images are analyzed to obtain the user's emotion characteristics, and more suitable background music and sound effects are added to the sound file by combining the emotional characteristics with the emotion characteristics to obtain an audio file, so that the auditory effect of the final audio file is more vivid and pleasant.
Example four
Referring to fig. 4, fig. 4 is a schematic structural diagram of an audio file generating device according to an embodiment of the present invention; as shown in fig. 4, an audio file generating apparatus may include:
a recording unit 410 for recording an external voice to obtain a sound file;
a conversion unit 420 for converting the sound file into text content;
a first obtaining unit 430, configured to perform semantic analysis on the text content to obtain an emotional feature corresponding to the sound file;
a second obtaining unit 440, configured to obtain background music and sound effects adapted to the sound file according to the emotional features;
and a third obtaining unit 450, configured to add the background music and the sound effect to the sound file, and obtain an audio file.
In the embodiment of the present invention, after the recording unit 410 records an external human voice to obtain a sound file, the converting unit 420 converts the sound file into text content; the first obtaining unit 430 then performs semantic analysis on the text content to obtain the emotional characteristics corresponding to the sound file; the second obtaining unit 440 obtains background music and sound effects adapted to the sound file based on the emotional characteristics; and the third obtaining unit 450 adds the background music and sound effects to the sound file to obtain an audio file. It can be seen that, with the embodiment of the invention, a recorded sound file can be converted into text content, background music that enriches and heightens the feeling and sound effects that add colour can be obtained by analyzing the semantics of the text content, and the sound file can be processed accordingly; the processed audio file has a stronger auditory impact and a better effect.
It will be appreciated that the audio file generation apparatus shown in fig. 4 may be used to perform the audio file generation method shown in steps 101-105.
EXAMPLE five
Referring to fig. 5, fig. 5 is another schematic structural diagram of an audio file generating device according to an embodiment of the present invention; the audio file generating apparatus shown in fig. 5 is obtained by optimizing the audio file generating apparatus shown in fig. 4, and the audio file generating apparatus shown in fig. 5 further includes:
a photographing unit 510 for periodically photographing a face image of a recording user during a process in which the recording unit 410 records an external human voice to obtain a sound file;
an analyzing unit 520, configured to analyze the facial image to obtain an expression feature of the recording user;
an emotion obtaining unit 530, configured to obtain an emotion feature corresponding to the expression feature;
the manner in which the second obtaining unit 440 obtains background music and sound effects adapted to the sound file according to the emotional characteristics is specifically:
the second obtaining unit 440 is configured to obtain background music and sound effects adapted to the sound file according to both the emotional characteristics and the emotion characteristics.
It will be appreciated that the audio file generation apparatus shown in fig. 5 may be used to perform the audio file generation method shown in steps 301-308.
Referring to fig. 4 or fig. 5, in some embodiments, the manner for performing semantic analysis on the text content by the first obtaining unit 430 to obtain the emotion feature corresponding to the sound file is specifically:
the first obtaining unit 430 is configured to identify the sentence break points of the text content; divide the text content into a plurality of short sentences according to the sentence break points; analyze the short sentence semantics of each short sentence or extract its short sentence keywords; identify the emotional characteristics of each short sentence according to its short sentence semantics or short sentence keywords; judge whether at least two consecutive short sentences with the same emotional characteristics exist; and, if such short sentences exist, treat the at least two short sentences as one short sentence; and sort the emotional characteristics of each short sentence according to the position of the short sentence in the text content to obtain the emotional characteristics of the sound file;
the second obtaining unit 440 is configured to obtain the background music and the sound effect adapted to the sound file according to the emotional characteristic, specifically:
the second obtaining unit 440 is configured to cut the sound file into subfiles with a plurality of playing durations according to the short sentence of the text content, where the subfiles correspond to the short sentence one by one; sequentially acquiring background music and sound effects matched with the subfiles;
the third obtaining unit 450 is configured to add background music and sound effects to the sound file, and the manner of obtaining the audio file specifically includes:
the third obtaining unit 450 is configured to add the adapted background music and sound effect to each subfile in sequence to obtain the audio file.
EXAMPLE six
Referring to fig. 6, fig. 6 is another schematic structural diagram of an audio file generating device according to an embodiment of the present invention; the audio file generating apparatus shown in fig. 6 is optimized based on the audio file generating apparatus shown in fig. 4, and in the audio file generating apparatus shown in fig. 6, the converting unit 420 includes:
a position detection unit 610 for detecting a sound pause position in the sound file;
a text converting unit 620, configured to convert the sound file into text content according to the sound pause position, where the sound pause position is used as a period break point of the text content.
The audio file generating apparatus described above in conjunction with fig. 4 and fig. 6 may be used to perform the audio file generation method shown in steps 201-213.
EXAMPLE seven
Referring to fig. 7, fig. 7 is another schematic structural diagram of an audio file generating device according to an embodiment of the present invention; the audio file generating apparatus shown in fig. 7 is obtained by optimizing the audio file generating apparatus shown in fig. 4, and the audio file generating apparatus shown in fig. 7 further includes:
an instruction detecting unit 710, configured to detect whether a saving instruction for the audio file is received after the third obtaining unit 450 adds the background music and the sound effect to the sound file to obtain the audio file;
a path detection unit 720, configured to detect whether a saving path for the audio file is received when the instruction detection unit 710 receives a saving instruction;
a first saving unit 730, configured to, when the path detecting unit 720 receives the saving path, save the audio file, in association with its generation time, to the storage area corresponding to the saving path;
a second saving unit 740, configured to, when the path detecting unit 720 does not receive the saving path, save the audio file, in association with its generation time, to the storage area corresponding to a default path.
Example eight
Referring to fig. 8, fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present invention; the terminal device shown in fig. 8 may include: an audio file generating apparatus as described in any of figures 4 to 7.
Please refer to the detailed description of the above method embodiments and apparatus embodiments, which will not be repeated herein.
It can be seen that, by implementing this terminal device, a recorded sound file can be converted into text content, background music that enriches and heightens the feeling and sound effects that add colour can be obtained by analyzing the semantics of the text content, and the sound file can be processed accordingly; the processed audio file has a stronger auditory impact and a better effect.
It will be understood by those skilled in the art that all or part of the steps in the methods of the embodiments described above may be implemented by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium. The storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
The audio file generation method, the audio file generation device and the terminal device disclosed in the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is only intended to help in understanding the method and its core idea. Meanwhile, for a person skilled in the art, there may be variations in the specific implementation and application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the invention.

Claims (9)

1. An audio file generation method, comprising:
recording external human voice to obtain a sound file;
converting the sound file into text content;
performing semantic analysis on the text content to obtain emotional characteristics corresponding to the sound file;
obtaining background music and sound effect which are matched with the sound file according to the emotional characteristics;
adding the background music and the sound effect to the sound file to obtain an audio file;
the semantic analysis of the text content to obtain the corresponding emotional characteristics of the sound file includes:
identifying a sentence break point of the text content;
dividing the text content into a plurality of short sentences by taking the sentence break points as the basis;
analyzing the short sentence semantics of the short sentence or extracting short sentence keywords of the short sentence;
identifying the emotional characteristics of each short sentence according to the short sentence semantics or the short sentence keywords of each short sentence;
judging whether at least two continuous short sentences with the same emotional characteristics exist in the short sentences;
if such short sentences exist, treating the at least two short sentences as one short sentence;
sequencing the emotional characteristics of each short sentence according to the position of the short sentence in the text content to obtain the emotional characteristics of the sound file;
the method for acquiring the background music and the sound effect matched with the sound file by taking the emotional characteristics as the basis comprises the following steps:
cutting the sound file into subfiles with a plurality of playing time lengths according to the short sentences of the text contents, wherein the subfiles correspond to the short sentences one by one;
sequentially acquiring background music and sound effects matched with the subfiles;
adding the background music and the sound effect to the sound file to obtain an audio file comprises:
and adding the matched background music and sound effect to each subfile in sequence to obtain the audio file.
2. The method of claim 1, further comprising:
periodically capturing a facial image of the recording user during the process of recording the external human voice to obtain the sound file;
analyzing the facial image to obtain the expression characteristics of the recording user;
acquiring emotion characteristics corresponding to the expression characteristics;
the method for acquiring the background music and the sound effect matched with the sound file by taking the emotional characteristics as the basis comprises the following steps:
and acquiring background music and sound effect which are matched with the sound file according to the emotional characteristics and the emotional characteristics.
3. The method of claim 1 or 2, wherein converting the sound file into text content comprises:
detecting a sound pause position in the sound file;
and converting the sound file into text content according to the sound pause position, wherein the sound pause position is used as a sentence break point of the text content.
4. The method according to claim 1 or 2, wherein, after the background music and the sound effect are added to the sound file to obtain an audio file, the method further comprises:
detecting whether a saving instruction for the audio file is received;
when the saving instruction is received, detecting whether a saving path aiming at the audio file is received or not;
when the saving path is received, saving the audio file, in association with its generation time, to a storage area corresponding to the saving path;
and when the saving path is not received, saving the audio file, in association with its generation time, to a storage area corresponding to a default path.
5. An audio file generation apparatus, comprising:
the recording unit is used for recording external voices to obtain sound files;
a conversion unit for converting the sound file into text content;
the first acquisition unit is used for carrying out semantic analysis on the text content to acquire emotional characteristics corresponding to the sound file;
the second acquisition unit is used for acquiring background music and sound effect which are matched with the sound file according to the emotional characteristics;
the third acquisition unit is used for adding the background music and the sound effect to the sound file to obtain an audio file;
the first obtaining unit is configured to perform semantic analysis on the text content to obtain an emotion feature corresponding to the sound file, and the method for obtaining the emotion feature corresponding to the sound file specifically includes:
the first acquisition unit is used for identifying the sentence break points of the text content; dividing the text content into a plurality of short sentences according to the sentence break points; analyzing the short sentence semantics of each short sentence or extracting its short sentence keywords; identifying the emotional characteristics of each short sentence according to its short sentence semantics or short sentence keywords; judging whether at least two consecutive short sentences with the same emotional characteristics exist in the short sentences; and, if such short sentences exist, treating the at least two short sentences as one short sentence; and sorting the emotional characteristics of each short sentence according to the position of the short sentence in the text content to obtain the emotional characteristics of the sound file;
the manner in which the second acquisition unit acquires background music and sound effects adapted to the sound file according to the emotional characteristics is specifically:
the second obtaining unit is configured to cut the sound file into subfiles with a plurality of playing durations according to the short sentence of the text content, where the subfiles correspond to the short sentence one to one; sequentially acquiring background music and sound effects matched with the subfiles;
the third obtaining unit is configured to add the background music and the sound effect to the sound file, and the manner of obtaining the audio file specifically includes:
and the third acquisition unit is used for sequentially adding the matched background music and sound effect to each subfile to acquire the audio file.
6. The apparatus of claim 5, further comprising:
the shooting unit is used for periodically capturing a facial image of the recording user while the recording unit records the external human voice to obtain the sound file;
the analysis unit is used for analyzing the facial image to obtain the expression characteristics of the recording user;
the emotion acquisition unit is used for acquiring emotion characteristics corresponding to the expression characteristics;
the manner in which the second acquisition unit acquires background music and sound effects adapted to the sound file according to the emotional characteristics is specifically:
the second acquisition unit is used for acquiring background music and sound effects adapted to the sound file according to both the emotional characteristics and the emotion characteristics.
7. The apparatus of claim 5 or 6, wherein the conversion unit comprises:
a position detection unit for detecting a sound pause position in the sound file;
and the text conversion unit is used for converting the sound file into text contents according to the sound pause positions, and the sound pause positions are used as sentence break points of the text contents.
8. The apparatus of claim 5 or 6, further comprising:
the instruction detection unit is used for detecting whether a storage instruction for the audio file is received or not after the third acquisition unit adds the background music and the sound effect to the sound file to obtain the audio file;
a path detection unit, configured to detect whether a saving path for the audio file is received when the instruction detection unit receives the saving instruction;
the first saving unit is used for saving the audio file, in association with its generation time, to the storage area corresponding to the saving path when the path detection unit receives the saving path;
and the second saving unit is used for saving the audio file, in association with its generation time, to the storage area corresponding to a default path when the path detection unit does not receive the saving path.
9. A terminal device, comprising:
an audio file generation apparatus as claimed in any one of claims 5 to 8.
CN201810028134.2A · Priority 2018-01-11 · Filed 2018-01-11 · Audio file generation method and device and terminal equipment · Active · Granted as CN108242238B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810028134.2A (CN108242238B) · Priority 2018-01-11 · Filed 2018-01-11 · Audio file generation method and device and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810028134.2A (CN108242238B) · Priority 2018-01-11 · Filed 2018-01-11 · Audio file generation method and device and terminal equipment

Publications (2)

Publication Number Publication Date
CN108242238A (en) · 2018-07-03
CN108242238B (en) · 2019-12-31

Family

ID=62699647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810028134.2A (CN108242238B, Active) · Audio file generation method and device and terminal equipment

Country Status (1)

Country Link
CN (1) CN108242238B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109712604A (en) * 2018-12-26 2019-05-03 广州灵聚信息科技有限公司 A kind of emotional speech synthesis control method and device
WO2020151008A1 (en) * 2019-01-25 2020-07-30 Microsoft Technology Licensing, Llc Automatically adding sound effects into audio files
CN110587621B (en) * 2019-08-30 2023-06-06 深圳智慧林网络科技有限公司 Robot, robot-based patient care method, and readable storage medium
CN110598612B (en) * 2019-08-30 2023-06-09 深圳智慧林网络科技有限公司 Patient nursing method based on mobile terminal, mobile terminal and readable storage medium
CN110598611B (en) * 2019-08-30 2023-06-09 深圳智慧林网络科技有限公司 Nursing system, patient nursing method based on nursing system and readable storage medium
CN110558997A (en) * 2019-08-30 2019-12-13 深圳智慧林网络科技有限公司 Robot-based accompanying method, robot and computer-readable storage medium
CN113596241B (en) * 2021-06-24 2022-09-20 北京荣耀终端有限公司 Sound processing method and device
CN114537271A (en) * 2022-02-09 2022-05-27 广州小鹏汽车科技有限公司 Control method, vehicle, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102402982A (en) * 2010-09-14 2012-04-04 盛乐信息技术(上海)有限公司 Loud reading system with selectable background sounds and realization method of system
CN202855297U (en) * 2012-07-30 2013-04-03 西北工业大学 Background music control device based on expression
CN105070283A (en) * 2015-08-27 2015-11-18 百度在线网络技术(北京)有限公司 Singing voice scoring method and apparatus
CN105812927A (en) * 2014-12-30 2016-07-27 深圳Tcl数字技术有限公司 Method for backing up scene atmosphere and television
CN106125566A (en) * 2016-08-05 2016-11-16 易晓阳 A kind of household background music control system
CN106557298A (en) * 2016-11-08 2017-04-05 北京光年无限科技有限公司 Background towards intelligent robot matches somebody with somebody sound outputting method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201513095A (en) * 2013-09-23 2015-04-01 Hon Hai Prec Ind Co Ltd Audio or video files processing system, device and method
CN104778216B (en) * 2015-03-20 2017-05-17 广东欧珀移动通信有限公司 Method and device for processing songs with preset styles

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102402982A (en) * 2010-09-14 2012-04-04 盛乐信息技术(上海)有限公司 Loud reading system with selectable background sounds and realization method of system
CN202855297U (en) * 2012-07-30 2013-04-03 西北工业大学 Background music control device based on expression
CN105812927A (en) * 2014-12-30 2016-07-27 深圳Tcl数字技术有限公司 Method for backing up scene atmosphere and television
CN105070283A (en) * 2015-08-27 2015-11-18 百度在线网络技术(北京)有限公司 Singing voice scoring method and apparatus
CN106125566A (en) * 2016-08-05 2016-11-16 易晓阳 A kind of household background music control system
CN106557298A (en) * 2016-11-08 2017-04-05 北京光年无限科技有限公司 Background towards intelligent robot matches somebody with somebody sound outputting method and device

Also Published As

Publication number Publication date
CN108242238A (en) 2018-07-03

Similar Documents

Publication Publication Date Title
CN108242238B (en) Audio file generation method and device and terminal equipment
JP4600828B2 (en) Document association apparatus and document association method
US10977299B2 (en) Systems and methods for consolidating recorded content
CN110517689B (en) Voice data processing method, device and storage medium
CN107464555B (en) Method, computing device and medium for enhancing audio data including speech
JP3676969B2 (en) Emotion detection method, emotion detection apparatus, and recording medium
JP6446993B2 (en) Voice control device and program
CN111145777A (en) Virtual image display method and device, electronic equipment and storage medium
JP2018072650A (en) Voice interactive device and voice interactive method
US20210118464A1 (en) Method and apparatus for emotion recognition from speech
CN110750996B (en) Method and device for generating multimedia information and readable storage medium
CN108305611B (en) Text-to-speech method, device, storage medium and computer equipment
JP2005342862A (en) Robot
JP5083033B2 (en) Emotion estimation device and program
CN111415651A (en) Audio information extraction method, terminal and computer readable storage medium
CN114708869A (en) Voice interaction method and device and electric appliance
CN114125506B (en) Voice auditing method and device
WO2022041192A1 (en) Voice message processing method and device, and instant messaging client
CN112885318A (en) Multimedia data generation method and device, electronic equipment and computer storage medium
CN109635151A (en) Establish the method, apparatus and computer equipment of audio retrieval index
CN112235183B (en) Communication message processing method and device and instant communication client
CN115472185A (en) Voice generation method, device, equipment and storage medium
CN113920996A (en) Voice interaction processing method and device, electronic equipment and storage medium
JP6044490B2 (en) Information processing apparatus, speech speed data generation method, and program
CN114514576A (en) Data processing method, device and storage medium

Legal Events

Date Code Title Description
PB01 · Publication
SE01 · Entry into force of request for substantive examination
GR01 · Patent grant