CN109040834A

CN109040834A - A computer-aided short audio production method and system

Info

Publication number: CN109040834A
Application number: CN201810919491.8A
Authority: CN
Inventors: 范晓安; 胡蓓蓓
Original assignee: Archimedes (shanghai) Media Co Ltd
Current assignee: Archimedes (shanghai) Media Co Ltd
Priority date: 2018-08-14
Filing date: 2018-08-14
Publication date: 2018-12-18
Anticipated expiration: 2038-08-14
Also published as: CN109040834B

Abstract

The invention discloses a computer-aided short audio production method and system. The method comprises: computing multi-dimensional program content feature information of an audio program to be processed on time segments of different granularities; fusing the multi-dimensional program content feature information corresponding to each time segment of the audio program; and graphically displaying the audio program to be processed according to the fused multi-dimensional program content feature information and the corresponding time segments, for editing personnel to refer to when clipping, listening to, and confirming short audio. The computer-aided short audio production method and system provided by the invention can help editing personnel quickly produce the required short audio, improve the efficiency of short audio production, and reduce production cost. They can also reduce the probability of missing high-quality content that comes with manual clipping of short audio.

Description

A computer-aided short audio production method and system

Technical field

The invention discloses a computer-aided short audio production method and system, and relates to the field of short audio clipping. With the method and system provided by the invention, audio editing personnel can quickly find audio segments of interest to play and clip, which improves the efficiency of short audio production while reducing both the production cost of manual short audio clipping and the probability of missing high-quality content.

Background art

The audio stream of a complete broadcast program generally contains various types of audio content, such as advertisements, music, and speech. A short audio is usually a segment of a complete program that contains high-quality content. Existing production of broadcast short audio relies mainly on a person listening to the program audio stream, analyzing the program content, clipping several short audio segments from it, and giving each short audio suitable tags, a title, and a summary. The main steps of manual short audio extraction are: program listening, high-quality content discovery, and clipping with tagged description. Program listening means that a person listens to the broadcast program content. High-quality content discovery means determining the time information of the content to be extracted according to predetermined content review rules. Clipping with tagged description means recording the time information of the short audio within the complete program and assigning the corresponding tags and description according to the short audio content.

Manual listening makes short audio production inefficient and low in output. Listening to one complete broadcast program takes time on the order of hours. Faced with a massive volume of broadcast programs, listening by ear cannot meet the need to analyze all broadcast program content. Because a large amount of labor must be invested to analyze and extract short audio, the average production cost of a short audio is high. In addition, audio content cannot be compared intuitively at the same point in time while listening, and operations such as dragging the playback position during listening easily cause program content to be skipped, which increases the probability of missing high-quality content. Existing manual production of broadcast short audio therefore suffers from low efficiency, high cost, and a tendency to miss high-quality content.

Summary of the invention

To address the deficiencies of existing short audio production, the invention provides a computer-aided short audio production method, which specifically comprises: extracting, based on multiple algorithms and parameters, multi-dimensional program content feature information of an audio program to be processed on time segments of different granularities; fusing the multi-dimensional program content feature information corresponding to different time points of the audio program; and graphically displaying the audio program to be processed according to the fused multi-dimensional program content feature information and the corresponding time segments, for editing personnel to refer to when clipping, listening to, and confirming short audio.

Further, the multi-dimensional program content feature information includes: audio type (for example music, pure speech, speech with background music, outside-broadcast speech, and other types) together with further subdivided features such as specific music information; advertisements and repeated segments; the text corresponding to speech segments; keywords extracted from the text corresponding to speech segments; speaker ID; speaker emotion; speaker gender and age identification; topics and text summaries extracted from speech recognition results; and the time points of the audio segments to which each of the above features corresponds.

Further, fusing the multi-dimensional program content feature information corresponding to different time points of the audio program specifically comprises: handling mutually conflicting feature indicators, removing feature indicators that deviate significantly from the broadcasting standards of the broadcast program, and fusing the information of features that corroborate one another logically to obtain the final content features. After the multi-dimensional features are fused, the features at each time point of the audio no longer conflict logically and represent reasonably well the main audio information at that point of the audio program.

Further, the invention also provides a computer-aided short audio production system comprising the following components: a feature analysis layer, for analyzing and extracting, according to multiple algorithms and parameters, multi-dimensional program content feature information of an audio program to be processed on time segments of different granularities; a feature aggregation layer, which fuses the corresponding multi-dimensional program content feature information according to the time points of the audio program and outputs audio time segments and the fused content features; a feature retrieval layer, for building an index structure over the audio content features to support feature retrieval and filtering; and an editing operation interface, for graphically displaying the audio program to be processed according to the fused multi-dimensional program content feature information and the corresponding time segments, for editing personnel to refer to when clipping, filtering, and listening to short audio, with editorial staff confirming the generated short audio and description information. The above components may be implemented on the same computer or on separate computers, and the computers can cooperate over a network.

Further, the computer-aided short audio production system also includes a database service module, for storing the data that each component outputs or needs to use. The database service module may be implemented as a distributed database.

Brief description of the drawings

Fig. 1 is a flow chart of the computer-aided short audio production method provided by the invention;

Fig. 2 is a schematic diagram of the computer-aided short audio production system provided by the invention.

Detailed description of the embodiments

To make the technical problem solved by the invention, its technical solution, and its beneficial effects clearer, the invention is described in further detail below with reference to the drawings. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.

Referring to Fig. 1, the invention provides a computer-aided short audio production method comprising the following steps:

a. based on multiple algorithms and parameters, extracting multi-dimensional program content feature information of an audio program to be processed on time segments of different granularities;

b. fusing the multi-dimensional program content feature information corresponding to different time points of the audio program, and outputting audio time segments and the fused audio content features;

c. graphically displaying the audio program to be processed according to the fused audio content features and the corresponding time segments, for editing personnel to refer to when clipping, filtering, and listening to short audio, with editorial staff confirming the generated short audio and description information.

In step a, extracting the multi-dimensional program content feature information of the audio program to be processed on time segments of different granularities specifically means: for one episode of an audio program, multiple program content features are computed separately on time segments of different granularities. These program content features are used to analyze the program content along different dimensions. The program content features computed in this method include, but are not limited to, the features listed below:

Audio type: audio is divided into types such as music, pure speech, speech with background music, and outside-broadcast speech. An audio type classification algorithm identifies the audio segments of each type contained in the program audio and their corresponding time information.
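As a rough illustration of how typed segments and their time information might be derived, the sketch below merges consecutive classifier outputs that share a label into segments. It assumes a hypothetical upstream classifier that emits one label per fixed-length frame; the function name `labels_to_segments` and the one-second frame length are illustrative and do not come from the patent.

```python
from itertools import groupby


def labels_to_segments(frame_labels, frame_s=1.0):
    """Merge runs of identical per-frame labels into (start_s, end_s, label) segments."""
    segments = []
    position = 0  # index of the first frame of the current run
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((position * frame_s, (position + length) * frame_s, label))
        position += length
    return segments


# Hypothetical classifier output, one label per second of audio.
frames = ["music", "music", "speech", "speech", "speech", "music"]
for start_s, end_s, label in labels_to_segments(frames):
    print(f"{start_s:6.1f}s to {end_s:6.1f}s  {label}")
```

A shorter frame length gives finer time granularity at the cost of more, and possibly noisier, segments.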

Specific music information: the specific information of the music contained in the program audio is identified, such as song information, instrument information, rhythm, and genre, and the information on songs contained in the program audio is output: the start and end time points of each song and the song's own information (singer, genre, release date, rhythm, etc.).

Advertisements and repeated segments: a voiceprint algorithm identifies the time points of advertisements and of repeatedly played program segments in the broadcast program.

Text corresponding to speech segments and keywords of that text: speech in the program audio is recognized and converted to text. The speech recognition information output for the program audio includes the start and end time of the speech, the recognized text, and so on. Keywords are extracted from the text output by the speech recognition algorithm, and a time correspondence between the keywords and the program audio is established.
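The time correspondence between keywords and program audio can be pictured as a mapping from each keyword to the time spans of the speech segments that contain it. The following is a minimal sketch under stated assumptions: the keyword list is taken as already extracted by some unspecified algorithm, the matching is plain substring search, and the name `keyword_timeline` and the sample transcripts are hypothetical.

```python
def keyword_timeline(asr_segments, keywords):
    """Map each keyword to the (start_s, end_s) spans of the speech segments containing it."""
    timeline = {}
    for start_s, end_s, text in asr_segments:
        for keyword in keywords:
            if keyword in text:
                timeline.setdefault(keyword, []).append((start_s, end_s))
    return timeline


# Hypothetical speech recognition output: (start_s, end_s, recognized text).
asr_segments = [
    (500.0, 520.0, "traffic on the ring road is heavy this morning"),
    (520.0, 545.0, "the weather will turn cold tonight"),
    (545.0, 560.0, "expect more traffic after the match"),
]
print(keyword_timeline(asr_segments, ["traffic", "weather"]))
```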

Topic and text summary of speech segments: topic extraction and text summarization are performed on the speech recognition results, and the extracted topic and text summary information are output.

Speaker of speech segments: based on an existing speaker data set and speaker recognition model, the method identifies who is speaking in each time period of the program audio. The speaker recognition result includes the start and end time of the speech, the corresponding speaker ID, and so on.

Speaker gender and age: based on a pre-built speaker gender and age data set and the corresponding recognition algorithms, the current speech period and the predicted gender and age of the speaker are output.

Speaker emotion: a speech emotion recognition algorithm outputs the current speech period and the speaker emotion result.

In step b, fusing the multi-dimensional program content feature information corresponding to different time segments of the audio program specifically comprises: handling mutually conflicting feature indicators, removing feature indicators that deviate significantly from the broadcasting standards of the broadcast program, and fusing the information of features that corroborate one another logically to obtain the final content features.

In actual multi-dimensional audio program feature extraction, the extracted features often conflict logically with one another. To keep the extracted features mutually consistent, conflicting features must be handled first. For example: the audio type is identified as speech (confidence 0.93) while the music recognition result identifies a certain song (confidence 0.55); the music recognition result is discarded. Or: the speaker recognition result is speaker A (confidence 0.80) and the speaker gender recognition result is female (confidence 0.95), while A's actual gender is male; the speaker recognition result is discarded. For such contradictory feature indicators, a list of indicators to be compared is established. When contradictory indicators are detected, the recognition result with the higher confidence is selected and the contradictory indicator with the lower confidence is discarded.
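The two examples above can be reproduced with a small sketch of confidence-based conflict handling. This is only one possible reading of the rule, not the patent's implementation: the `Recognition` record, the `contradicts` predicate standing in for the list of compared indicators, and the `KNOWN_SPEAKER_GENDER` table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recognition:
    feature: str       # e.g. "audio_type", "music", "speaker", "gender"
    value: str
    confidence: float


# Hypothetical reference data: the registered gender of each known speaker.
KNOWN_SPEAKER_GENDER = {"A": "male"}


def contradicts(a, b):
    """Return True if two results for the same time segment cannot both hold."""
    by_feature = {a.feature: a, b.feature: b}
    if "audio_type" in by_feature and "music" in by_feature:
        # A segment classified as speech should not also be matched to a song.
        return by_feature["audio_type"].value == "speech"
    if "speaker" in by_feature and "gender" in by_feature:
        known = KNOWN_SPEAKER_GENDER.get(by_feature["speaker"].value)
        return known is not None and known != by_feature["gender"].value
    return False


def resolve_conflicts(results):
    """For each contradictory pair, discard the result with the lower confidence."""
    discarded = set()
    for i, a in enumerate(results):
        for b in results[i + 1:]:
            if contradicts(a, b):
                discarded.add(a if a.confidence < b.confidence else b)
    return [r for r in results if r not in discarded]


results = [
    Recognition("audio_type", "speech", 0.93),
    Recognition("music", "some song", 0.55),
    Recognition("speaker", "A", 0.80),
    Recognition("gender", "female", 0.95),
]
for kept in resolve_conflicts(results):
    print(kept)
```

With the confidences from the text, the music result and the speaker result are the ones dropped, matching the two examples.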

When fusing the multi-dimensional program content feature information, feature indicators that deviate significantly from the broadcasting standards of the broadcast program must also be removed. For example: the speaking rate computed from a speech recognition result is 20 characters per minute, which lies outside the speaking-rate range of normal broadcasting (or normal speech); this speech recognition result is discarded. Or: the speaker recognition result identifies the speaker as A, but A is not in the list of speakers broadcasting in that period; this speaker recognition result is discarded.
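A minimal sketch of these two plausibility checks follows. The patent gives only 20 characters per minute as an example of an abnormal rate, so the bounds in `NORMAL_RATE_RANGE` are assumed values chosen for illustration, and the function names are hypothetical.

```python
# Assumed bounds for a normal speaking rate, in characters per minute.
NORMAL_RATE_RANGE = (120.0, 400.0)


def speaking_rate(text, start_s, end_s):
    """Characters per minute of a recognized text over its time span."""
    return len(text) / ((end_s - start_s) / 60.0)


def keep_speech_result(text, start_s, end_s, rate_range=NORMAL_RATE_RANGE):
    """Keep a speech recognition result only if its speaking rate looks normal."""
    if end_s <= start_s:
        return False
    low, high = rate_range
    return low <= speaking_rate(text, start_s, end_s) <= high


def keep_speaker_result(speaker_id, scheduled_speakers):
    """Keep a speaker result only if the speaker is scheduled for this period."""
    return speaker_id in scheduled_speakers


# 20 characters recognized over a full minute: far below the assumed range.
print(keep_speech_result("x" * 20, 0.0, 60.0))
# Speaker A is not in the (hypothetical) schedule for this period.
print(keep_speaker_result("A", {"B", "C"}))
```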

Finally, the information of features that corroborate one another logically is fused to obtain the final content features of an audio segment. For example: for the period from 110 s to 300 s of the program, the audio type is identified as music and the music recognition result is a certain song. The final content features recorded for this period are then: music, song title, singer information, genre, and so on. For the period from 500 s to 680 s of the program, the audio type is identified as speech, the speaker is identified as announcer A, the gender is female, the emotion value is 5.0, and so on. The final content features recorded for this period are then the features identified by the algorithms after the logically contradictory results have been excluded.
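The fusion of the two example periods can be sketched as attaching each surviving per-dimension result to the audio-type segment it mostly overlaps. The majority-overlap rule, the tuple layouts, and the name `fuse` are assumptions made for this illustration; the patent does not specify how results are aligned to segments.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time spans."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def fuse(type_segments, other_results):
    """Build one fused record per audio-type segment.

    type_segments: list of (start_s, end_s, audio_type)
    other_results: list of (start_s, end_s, feature, value)
    """
    fused = []
    for start_s, end_s, audio_type in type_segments:
        record = {"start_s": start_s, "end_s": end_s, "audio_type": audio_type}
        for r_start, r_end, feature, value in other_results:
            # Attach a result when more than half of it lies inside this segment.
            if overlap(start_s, end_s, r_start, r_end) > 0.5 * (r_end - r_start):
                record[feature] = value
        fused.append(record)
    return fused


type_segments = [(110.0, 300.0, "music"), (500.0, 680.0, "speech")]
other_results = [
    (112.0, 298.0, "song_title", "Song X"),   # placeholder title
    (500.0, 680.0, "speaker", "announcer A"),
    (500.0, 680.0, "gender", "female"),
    (505.0, 670.0, "emotion", 5.0),
]
for record in fuse(type_segments, other_results):
    print(record)
```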

For the multiple time segments in the program audio generated in step b and their feature descriptions, step c provides short audio editing personnel with a fast way to clip and confirm program audio. It includes:

1. Graphical representation of broadcast program content features. This function draws and displays the content features of program audio lasting from tens of minutes to several hours on a single image. Short audio editing personnel can "browse" the program content features in very little time.

2. Content feature filtering. A filtering method for content features is provided. Short audio editing personnel can define custom feature filter conditions; after the filter is executed, the program content feature display interface shows features only for the program periods that satisfy the filter conditions.

3. Short audio editing, listening, and confirmation. Combining the broadcast program content with the corresponding content features, short audio editorial staff can quickly select the start and end time of a short audio and, with reference to the tagged description provided by the algorithms, edit or directly confirm descriptive content such as the title and summary of the short audio. The listening function offers selection of the playback period, playback speed, and similar controls, so that editorial staff can quickly listen to and confirm the audio content.
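The custom filter conditions described in item 2 can be sketched as a function over the fused segment records. Representing segments as dictionaries and conditions as either exact values or predicates is an assumption of this example, as is the name `filter_segments`.

```python
def filter_segments(segments, **conditions):
    """Return the segments that satisfy every condition.

    Each condition is either an exact value or a callable predicate
    applied to the segment's value for that feature.
    """
    def matches(segment):
        for feature, wanted in conditions.items():
            if feature not in segment:
                return False
            if callable(wanted):
                if not wanted(segment[feature]):
                    return False
            elif segment[feature] != wanted:
                return False
        return True

    return [segment for segment in segments if matches(segment)]


# Hypothetical fused segments produced by step b.
segments = [
    {"start_s": 110.0, "end_s": 300.0, "audio_type": "music"},
    {"start_s": 500.0, "end_s": 680.0, "audio_type": "speech", "gender": "female", "emotion": 5.0},
    {"start_s": 700.0, "end_s": 760.0, "audio_type": "speech", "gender": "male", "emotion": 2.0},
]
# Show only speech periods with an emotion value of at least 4.0.
print(filter_segments(segments, audio_type="speech", emotion=lambda e: e >= 4.0))
```

Only the periods returned by the filter would then be drawn on the feature display interface.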

Alternatively, step c can also be carried out in an automatic production mode. In this mode, the algorithm automatically screens for and generates short audio and description information according to conditions preset by the editorial staff.
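One way to picture the automatic production mode is a pass over the fused segments that keeps those matching the preset conditions and emits a clip descriptor for each. The sketch below assumes each fused segment carries an overall `confidence` score and optional `topic` and `summary` fields; these field names, the threshold value, and the function `auto_produce` are hypothetical.

```python
def auto_produce(segments, preset, min_confidence=0.9):
    """Generate clip descriptors for confident segments that match the preset conditions."""
    clips = []
    for segment in segments:
        if segment.get("confidence", 0.0) < min_confidence:
            continue  # too uncertain to publish without human review
        if all(segment.get(key) == value for key, value in preset.items()):
            clips.append({
                "start_s": segment["start_s"],
                "end_s": segment["end_s"],
                "title": segment.get("topic", ""),
                "summary": segment.get("summary", ""),
            })
    return clips


segments = [
    {"start_s": 500.0, "end_s": 680.0, "audio_type": "speech", "confidence": 0.95,
     "topic": "Morning traffic", "summary": "Ring road congestion report."},
    {"start_s": 700.0, "end_s": 760.0, "audio_type": "speech", "confidence": 0.60},
    {"start_s": 110.0, "end_s": 300.0, "audio_type": "music", "confidence": 0.97},
]
print(auto_produce(segments, preset={"audio_type": "speech"}))
```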

As shown in Fig. 2, corresponding to the above computer-aided short audio production method, the invention also provides a computer-aided short audio production system comprising the following components: a feature analysis layer, for analyzing and extracting, according to multiple algorithms and parameters, multi-dimensional program content feature information of an audio program to be processed on time segments of different granularities; a feature aggregation layer, which fuses the corresponding multi-dimensional program content feature information according to the time points of the audio program and outputs audio time segments and the fused audio content features; a feature retrieval layer, for building an index structure over the fused audio content features to support feature retrieval and filtering, where the index structure allows a single key to map to multiple audio segments with identical or similar features; and an editing operation interface, for graphically displaying the audio program to be processed according to the fused multi-dimensional program content feature information and the corresponding time segments, for editing personnel to refer to when clipping, filtering, and listening to short audio, with editorial staff confirming the generated short audio and description information. The above components may be implemented on the same computer or on separate computers, and the computers can cooperate over a network.
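The index structure of the feature retrieval layer, in which one key maps to multiple audio segments, resembles an inverted index. The following is a minimal in-memory sketch under that assumption; the key format `(feature, value)`, the choice of indexed features, and the name `build_index` are illustrative, and a real system would likely persist the index in the database service.

```python
from collections import defaultdict


def build_index(segments, indexed_features=("audio_type", "speaker", "keywords")):
    """Map each (feature, value) key to the time spans of all segments carrying it."""
    index = defaultdict(list)
    for segment in segments:
        span = (segment["start_s"], segment["end_s"])
        for feature in indexed_features:
            if feature not in segment:
                continue
            values = segment[feature]
            if not isinstance(values, (list, tuple, set)):
                values = [values]  # treat a single value as a one-element list
            for value in values:
                index[(feature, value)].append(span)
    return dict(index)


segments = [
    {"start_s": 0.0, "end_s": 60.0, "audio_type": "speech", "keywords": ["traffic"]},
    {"start_s": 110.0, "end_s": 300.0, "audio_type": "music"},
    {"start_s": 500.0, "end_s": 680.0, "audio_type": "speech",
     "speaker": "A", "keywords": ["traffic", "weather"]},
]
index = build_index(segments)
print(index[("keywords", "traffic")])
```

Looking up a key returns every segment that shares the feature value, which is what the retrieval and filtering functions of the editing interface need.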

Compared with the prior art, the invention has the following advantages:

1. Short audio editorial staff can "browse" the content of an entire broadcast program episode at a single moment. Compared with listening to the program by ear, this greatly improves the efficiency with which editorial staff take in broadcast program content.

2. In editing mode, editorial staff can use multiple operations through a single interface, including feature filtering and retrieval, audio audition, fine adjustment of time points, modification of description information, and short audio confirmation. High-quality audio content can be located proactively, which improves the production efficiency of short audio.

3. The system has a built-in automatic production mode. Once automatic mode is enabled, the system can directly produce short audio and description information with high confidence.

Claims

1. A computer-aided short audio production method, the method comprising:

a. based on multiple algorithms and parameters, analyzing and extracting multi-dimensional program content feature information of an audio program to be processed on time segments of different granularities;

b. fusing the corresponding multi-dimensional program content feature information according to the time points of the audio program, and outputting audio time segments and the fused audio content features;

c. graphically displaying the audio program to be processed according to the fused audio content features and the corresponding time segments, for editing personnel to refer to when clipping, filtering, and listening to short audio, with editorial staff confirming or editing to generate the short audio and description information.

2. The method of claim 1, wherein the multi-dimensional program content feature information includes: audio type; specific music information; advertisements and repeated segments; the text corresponding to speech segments and the keywords of that text; the topic and text summary of speech segments; the speaker of speech segments; speaker emotion; speaker gender and age identification; and the time points of the audio segments to which each of the above features corresponds.

3. The method of claim 1, wherein fusing the multi-dimensional program content feature information corresponding to different time segments of the audio program specifically comprises: handling mutually conflicting feature indicators, removing feature indicators that deviate significantly from the broadcasting standards of the broadcast program, and fusing the information of features that corroborate one another logically.

4. The method of claim 1, wherein graphically displaying the audio program to be processed further comprises: short audio editing personnel can define custom feature filter conditions, and after the filter is executed, the program content feature display interface shows features only for the program periods that satisfy the filter conditions.

5. The method of any one of claims 1-4, wherein graphically displaying the audio program to be processed to assist editing personnel in clipping, listening to, and confirming short audio specifically comprises: giving tagged descriptions of the fused audio content features corresponding to different time periods of the audio program to be processed, and drawing and displaying them on the program content feature display interface; editing personnel can play the corresponding time segment of the audio program to be processed by clicking the corresponding tag, in order to listen to and confirm the audio content; editing personnel clip short audio from the audio program to be processed with the provided editing tools, with reference to the displayed image; and editing personnel can also edit or directly confirm descriptive content such as the title and summary of the short audio.

6. A computer-aided short audio production system, the system comprising the following components:

a feature analysis layer, for analyzing and extracting, according to multiple algorithms and parameters, multi-dimensional program content feature information of an audio program to be processed on time segments of different granularities;

a feature aggregation layer, which fuses the corresponding multi-dimensional program content feature information according to the time points of the audio program and outputs audio time segments and the fused content features;

a feature retrieval layer, for building an index structure over the audio content features to support feature retrieval and filtering;

an editing operation interface, for graphically displaying the audio program to be processed according to the fused audio content features and the corresponding time segments, for editing personnel to refer to when clipping, filtering, and listening to short audio, with editorial staff confirming the generated short audio and description information;

a database service, for storing the data that each of the above components outputs or needs to use.

7. The system of claim 6, wherein the multi-dimensional program content feature information extracted by the feature analysis layer includes: audio type; specific music information; advertisements and repeated segments; the text corresponding to speech segments and the keywords of that text; the topic and text summary of speech segments; the speaker of speech segments; speaker emotion; speaker gender and age identification; and the time points of the audio segments to which each of the above features corresponds.

8. The system of claim 6, wherein the fusing, by the feature aggregation layer, of the corresponding multi-dimensional program content feature information according to the time points of the audio program specifically comprises: handling mutually conflicting feature indicators, removing feature indicators that deviate significantly from the broadcasting standards of the broadcast program, and fusing the information of features that corroborate one another logically.

9. The system of claim 6, wherein the graphical display of the audio program to be processed by the editing operation interface further comprises: short audio editing personnel can define custom feature filter conditions, and after the filter is executed, the program content feature display interface shows features only for the program periods that satisfy the filter conditions.

10. The system of any one of claims 6-9, wherein the graphical display of the audio program to be processed by the editing operation interface, to assist editing personnel in clipping, listening to, and confirming short audio, specifically comprises: giving tagged descriptions of the fused audio content features corresponding to different time periods of the audio program to be processed, and drawing and displaying them on the program content feature display interface; editing personnel can play the corresponding time segment of the audio program to be processed by clicking the corresponding tag, in order to listen to and confirm the audio content; editing personnel clip short audio from the audio program to be processed with the provided editing tools, with reference to the displayed image; and editing personnel can also edit or directly confirm descriptive content such as the title and summary of the short audio.

11. The system of claim 6, wherein the five components (the feature analysis layer, the feature aggregation layer, the feature retrieval layer, the editing operation interface, and the database service) may all be implemented on the same computer or on separate computers, and the computers can cooperate over a network.

12. The system of claim 11, wherein the database service can provide data storage in the form of a distributed database.