CN117041426A

CN117041426A - Video color ring optimization manufacturing method, system, equipment and storage medium

Info

Publication number: CN117041426A
Application number: CN202311216793.6A
Authority: CN
Inventors: 魏颖鹏; 康小刚
Original assignee: iMusic Culture and Technology Co Ltd
Current assignee: iMusic Culture and Technology Co Ltd
Priority date: 2023-09-19
Filing date: 2023-09-19
Publication date: 2023-11-10

Abstract

The invention discloses a video color ring optimization manufacturing method, system, device and storage medium. The method comprises the following steps: acquiring an industry label of a target object; pushing element templates to the target object according to the industry label, and determining a first broadcast dry sound, first background music and a first video selected by the target object; performing tone quality adjustment processing on the first broadcast dry sound and the first background music according to the industry label to obtain a second broadcast dry sound and second background music; performing image quality adjustment processing on the first video according to the industry label to obtain a second video; matching and superposing the second broadcast dry sound and the second background music to obtain color ring audio; and matching and synthesizing the color ring audio with the second video to obtain the target video color ring. In the embodiments of the invention, the elements of the video color ring are adjusted according to the industry label before the video color ring is synthesized, which can improve the customer experience. The invention can be widely applied in the technical field of audio and video processing.

Description

Video color ring optimization manufacturing method, system, equipment and storage medium

Technical Field

The invention relates to the technical field of audio and video processing, in particular to a method, a system, equipment and a storage medium for optimally manufacturing video color ring.

Background

A video color ring is a service in which, when a calling party places a call to a called party who subscribes to the service, the calling party sees a piece of video content while waiting for the call to be connected; it can be used to present the image of the customer. Business color ring is a video color ring service directed to government or enterprise customers, who can make or upload personalized video content or select from an operator's video library. In the related art, the uploaded or selected video is usually used directly as the video color ring, so the playing effect and the customer experience are poor. In view of the foregoing, the technical problems in the related art need to be solved.

Disclosure of Invention

In view of this, the embodiments of the present invention provide a method, a system, a device, and a storage medium for optimizing video color ring, so as to improve the experience of clients.

In one aspect, the invention provides a method for optimally manufacturing video color ring, which comprises the following steps:

acquiring an industry label of a target object;

pushing element templates to the target object according to the industry label, and determining a first broadcast dry sound, first background music and a first video selected by the target object;

performing tone quality adjustment processing on the first broadcast dry sound and the first background music according to the industry label to obtain a second broadcast dry sound and second background music;

performing image quality adjustment processing on the first video according to the industry label to obtain a second video;

matching and superposing the second broadcast dry sound and the second background music to obtain color ring audio;

and carrying out matching synthesis processing on the color ring audio and the second video to obtain the target video color ring.

Optionally, the pushing element templates to the target object according to the industry label and determining a first broadcast dry sound, first background music and a first video selected by the target object includes:

matching the industry labels to obtain element templates from a material library, and recommending the element templates to the target object, wherein the element templates comprise a voice synthesis style, a recorder style, a music template and a video template;

acquiring text content and picture content uploaded by the target object, and responding to the selection operation of the target object;

selecting the voice synthesis style or the recorder style according to the text content to perform dry sound generation processing to obtain a first broadcast dry sound;

determining the music template selected by the target object as first background music;

and synthesizing the video template according to the text content and the picture content to obtain a first video.

Optionally, the matching according to the industry label to obtain the element template from the material library includes:

matching elements in the material library according to the industry label to obtain an element set;

and carrying out recommendation score calculation processing on the element set according to a recommendation calculation formula, and arranging the calculated recommendation scores from high to low to obtain an element template.

Optionally, the performing, according to the industry label, sound quality adjustment processing on the first broadcast dry sound and the first background music to obtain a second broadcast dry sound and a second background music includes:

preprocessing the first broadcast dry sound and the first background music to obtain a preprocessed sound set;

and carrying out fine adjustment processing on the volume of the preprocessed sound set according to the industry label to obtain a second broadcast dry sound and second background music.

Optionally, the performing image quality adjustment processing on the first video according to the industry label to obtain a second video includes:

acquiring an adjustment processing table, wherein the adjustment processing table comprises adjustment processing data obtained by adjusting video image quality based on an experience system;

querying the adjustment processing table according to the industry label to obtain image quality adjustment data, wherein the image quality adjustment data comprises brightness adjustment data, contrast adjustment data, saturation adjustment data and super-resolution processing data;

and adjusting the first video according to the image quality adjustment data to obtain a second video.

Optionally, the matching and superposing the second broadcast dry sound and the second background music to obtain color ring audio includes:

respectively performing beat detection on the second broadcast dry sound and the second background music to obtain a dry sound beat time sequence and a background music beat time sequence;

fixing the background music beat time sequence, adjusting the dry sound beat time sequence according to an adjustment threshold value, and performing audio speed change processing on the second broadcast dry sound according to the adjusted dry sound beat time sequence to obtain a third broadcast dry sound;

and superposing the third broadcast dry sound and the second background music to obtain the color ring audio.

Optionally, the matching and synthesizing the color ring audio and the second video to obtain the target video color ring includes:

performing beat detection processing on the color ring audio to obtain an audio beat sequence;

performing transition detection processing on the second video to obtain a video transition time sequence;

performing text detection processing on the second video to obtain a text presentation time sequence;

performing union processing on the video transition time sequence and the text presentation time sequence to obtain a video beat sequence;

matching and adjusting the video beat sequence according to the audio beat sequence, and performing video variable speed processing on the second video according to the adjusted video beat sequence to obtain a third video;

and synthesizing the third video and the color ring audio to obtain the target video color ring.
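As a non-limiting illustration, the union of the video transition time sequence and the text presentation time sequence can be sketched as follows. The function name `video_beat_sequence` and the tolerance `tol` used to merge near-identical time points are assumptions made for this sketch and are not specified in the application.

```python
def video_beat_sequence(transition_times, text_times, tol=1e-3):
    """Return the sorted union of two time sequences (in seconds).

    Time points closer than `tol` to the previously kept point are treated
    as duplicates; this tolerance is a hypothetical choice for the sketch.
    """
    merged = sorted(set(transition_times) | set(text_times))
    beats = []
    for t in merged:
        if not beats or t - beats[-1] > tol:
            beats.append(t)
    return beats


# Hypothetical transition and text presentation times, in seconds.
transitions = [1.0, 3.0, 5.0]
text_shown = [2.0, 3.0, 4.5]
beats = video_beat_sequence(transitions, text_shown)
```

The resulting sequence is what would then be matched against the audio beat sequence in the subsequent step.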

In another aspect, the embodiment of the invention also provides a video color ring making system, which comprises:

the first module is used for acquiring the industry label of the target object;

the second module is used for pushing element templates to the target object according to the industry label and determining a first broadcast dry sound, first background music and a first video selected by the target object;

the third module is used for carrying out tone quality adjustment processing on the first broadcast dry sound and the first background music according to the industry label to obtain a second broadcast dry sound and second background music;

a fourth module, configured to perform image quality adjustment processing on the first video according to the industry label, to obtain a second video;

a fifth module, configured to perform matching and superposition processing on the second broadcast dry sound and the second background music to obtain color ring audio;

and a sixth module, configured to perform matching synthesis processing on the color ring audio and the second video to obtain a target video color ring.

Optionally, the second module includes:

the first unit is used for matching to obtain an element template from a material library according to the industry label, recommending the element template to the target object, wherein the element template comprises a voice synthesis style, a recorder style, a music template and a video template;

the second unit is used for acquiring the text content and the picture content uploaded by the target object and responding to the selection operation of the target object;

the third unit is used for selecting the voice synthesis style or the recorder style according to the text content to perform dry sound generation processing to obtain a first broadcast dry sound;

a fourth unit, configured to determine that the music template selected by the target object is first background music;

and a fifth unit, configured to perform synthesis processing on the video template according to the text content and the picture content, so as to obtain a first video.

Optionally, the first unit includes:

the first subunit is used for matching the elements in the material library according to the industry label to obtain an element set;

and the second subunit is used for carrying out recommendation score calculation processing on the element set according to a recommendation calculation formula, and obtaining an element template according to the arrangement of the calculated recommendation scores from high to low.

Optionally, the third module includes:

a sixth unit, configured to preprocess the first broadcast dry sound and the first background music to obtain a preprocessed sound set;

and a seventh unit, configured to perform fine adjustment processing on the volume of the preprocessed sound set according to the industry label, so as to obtain a second broadcast dry sound and second background music.

Optionally, the fourth module includes:

an eighth unit, configured to acquire an adjustment processing table, where the adjustment processing table includes adjustment processing data obtained by adjusting video image quality based on an experience system;

a ninth unit, configured to perform query processing on the adjustment processing table according to the industry label to obtain image quality adjustment data, where the image quality adjustment data includes brightness adjustment data, contrast adjustment data, saturation adjustment data, and super-resolution processing data;

and a tenth unit, configured to perform adjustment processing on the first video according to the image quality adjustment data, so as to obtain a second video.

Optionally, the fifth module includes:

an eleventh unit, configured to perform beat detection on the second broadcast dry sound and the second background music respectively, to obtain a dry sound beat time sequence and a background music beat time sequence;

a twelfth unit, configured to fix the background music beat time sequence, adjust the dry sound beat time sequence according to an adjustment threshold, and perform audio speed change processing on the second broadcast dry sound according to the adjusted dry sound beat time sequence, so as to obtain a third broadcast dry sound;

and a thirteenth unit, configured to perform superposition processing on the third broadcast dry sound and the second background music to obtain color ring audio.

Optionally, the sixth module includes:

a fourteenth unit, configured to perform beat detection processing on the color ring audio to obtain an audio beat sequence;

a fifteenth unit, configured to perform a transition detection process on the second video to obtain a video transition time sequence;

a sixteenth unit, configured to perform text detection processing on the second video to obtain a text presentation time sequence;

a seventeenth unit, configured to perform union processing on the video transition time sequence and the text presentation time sequence to obtain a video beat sequence;

an eighteenth unit, configured to perform matching adjustment processing on the video beat sequence according to the audio beat sequence, and perform video variable speed processing on the second video according to the adjusted video beat sequence, so as to obtain a third video;

and the nineteenth unit is used for synthesizing the third video and the color ring audio to obtain the target video color ring.

In another aspect, the embodiment of the invention also discloses an electronic device, which comprises a processor and a memory;

the memory is used for storing programs;

the processor executes the program to implement the method as described above.

In another aspect, embodiments of the present invention also disclose a computer readable storage medium storing a program for execution by a processor to implement a method as described above.

In another aspect, embodiments of the present invention also disclose a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read from a computer-readable storage medium by a processor of a computer device, and executed by the processor, to cause the computer device to perform the foregoing method.

Compared with the prior art, the technical scheme provided by the application has the following technical effects: according to the embodiment of the application, the element template is pushed to the target object according to the industry label by acquiring the industry label, so that the selection burden of producing audio and video can be reduced, and the customer experience is improved; in addition, the embodiment of the application adjusts the tone quality and the image quality through the industry label, and optimizes the video color ring; and then, the three elements of the dry sound, the background music and the video picture of the video color ring are matched and adjusted, so that the playing effect of the video color ring is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic structural diagram of a service system according to an embodiment of the present application;

Fig. 2 is a flowchart of a video color ring optimization method according to an embodiment of the present application;

Fig. 3 is a flow chart of beat detection provided by an embodiment of the present application;

Fig. 4 is a schematic diagram of an algorithm for matching a dry sound to a background sound according to an embodiment of the present application;

Fig. 5 is a schematic diagram of a neural network for transition detection according to an embodiment of the present application;

Fig. 6 is a schematic flow chart of video text recognition according to an embodiment of the present application;

Fig. 7 is a schematic diagram of an algorithm for audio and video matching according to an embodiment of the present application;

Fig. 8 is a flowchart of an embodiment of generating a video color ring according to an embodiment of the present application;

Fig. 9 is a schematic structural diagram of a video color ring making system according to an embodiment of the present application;

Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

In the field of video color ring, business color ring is a video color ring service for enterprise customers. The customer uploads a self-provided promotional video or synthesizes one from a template, and sets the video as the ring tone for the enterprise landline phones or employee work mobile phones belonging to the customer. When a third party dials the enterprise landline or an employee number, the corresponding promotional video ring tone is played on the calling party's mobile phone during ringing. In the related art, the promotional video, whether self-provided or synthesized from a template, is used directly as the video color ring, and the playing and display effects are not ideal.

In view of this, the embodiment of the application provides a method for optimizing and making video color ring, which can be applied to a terminal, a server, or software running in a terminal or a server. The terminal may be, but is not limited to, a tablet computer, a notebook computer, a desktop computer, etc. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.

Referring to Fig. 1, the embodiment of the application applies the video color ring making method to a unified backend for the business color ring service. The unified backend interacts with customers through websites or application programs; the interaction system comprises a business color ring portal, a mini program and an interface server, and stores the corresponding data in a physical file repository and a database. The unified backend is connected with a label system, a recommendation system, a content making system, an audio processing server and a video processing server. The label system identifies the industry label corresponding to each customer; customers can be tagged with industry labels through data collection or artificial intelligence analysis. The recommendation system uses the industry labels and a custom algorithm to recommend different speech synthesis styles, recorders, background music and the like, which reduces the burden of template selection. The content making system performs speech synthesis (TTS), professional recording and video template making; specifically, the broadcast dry sound is obtained through speech synthesis or recording, and the video template is made from text and pictures. The audio processing server performs tone quality enhancement, beat detection, dry sound and background sound matching, and audio speed change processing on the audio, where the audio comprises the dry sound and the background sound. The video processing server performs image quality adjustment, transition detection, text detection, audio and video matching, and video variable speed processing on the video based on artificial intelligence (AI) capability, and generates the video color ring.

Referring to fig. 2, an embodiment of the present invention provides a method for optimally manufacturing a video color ring, where the method includes:

S101, acquiring an industry label of a target object;

S102, pushing element templates to the target object according to the industry label, and determining a first broadcast dry sound, first background music and a first video selected by the target object;

S103, performing tone quality adjustment processing on the first broadcast dry sound and the first background music according to the industry label to obtain a second broadcast dry sound and second background music;

S104, performing image quality adjustment processing on the first video according to the industry label to obtain a second video;

S105, performing matching and superposition processing on the second broadcast dry sound and the second background music to obtain color ring audio;

S106, performing matching synthesis processing on the color ring audio and the second video to obtain the target video color ring.

In the embodiment of the invention, the industry label of the target object is obtained through data collection or artificial intelligence analysis; the industry label identifies the industry of the target object, such as the catering industry or the education industry. Element templates are then screened from a material library according to the industry label and pushed to the target object, and the target object selects among them to obtain the first broadcast dry sound, the first background music and the first video. The target object may also upload its own material to determine the first broadcast dry sound, the first background music and the first video without selecting the corresponding element templates. The broadcast dry sound can be generated by text-to-speech conversion, or recorded manually by a professional recorder. Then, according to the characteristics of the industry label, the tone quality of the first broadcast dry sound and the first background music is adjusted, and the image quality of the first video is adjusted, so as to improve the viewing experience. Next, the beat points of the dry sound and the background music are identified, the speed and start time of the dry sound or the background music are adjusted so that the dry sound matches the background music, and the two are superposed to generate the color ring audio. Finally, the beat points of the audio, the transition times of the video and the text display times are identified, and the speed and start time of the audio and the video are fine-tuned to generate the video color ring, so that the audio and the video are better synchronized on the beat and the overall viewing effect of the video color ring is improved.

It should be noted that, in each specific embodiment of the present application, when related processing is required to be performed according to data related to the identity or characteristics of the target object, such as information of the target object, behavior data of the target object, history data of the target object, and position information of the target object, permission or consent of the target object is obtained first, and the collection, use, processing, etc. of the data complies with related laws and regulations and standards. In addition, when the embodiment of the application needs to acquire the sensitive information of the target object, the independent permission or independent consent of the target object is acquired through a popup window or a jump to a confirmation page or the like, and after the independent permission or independent consent of the target object is explicitly acquired, the necessary target object related data for enabling the embodiment of the application to normally operate is acquired.

Further as an optional implementation manner, in step S102, the step of pushing, according to the industry label, the element template to the target object, and determining the first broadcast dry sound, the first background music, and the first video selected by the target object includes:

In the embodiment of the invention, the material library contains a plurality of material templates corresponding to industry labels: 132 subdivided industries are defined, and there are about 20 styles of TTS machine-synthesized voice, about 10 professional recorders, about 50 pieces of background music and about 900 video templates in use. Element templates are matched from the material library according to the industry label and recommended to the target object. The interaction system interacts with the target object: the target object can select elements of the element templates or choose to upload its own material by clicking buttons on the interaction interface, and the interaction system responds to the selection operation. The text content and picture content uploaded by the target object are then acquired; the text content is used for generating the broadcast dry sound and for video production, and the picture content is used for video production. In response to the selection operation, it is determined whether the target object selects a speech synthesis style or a recorder style for dry sound generation, and the first broadcast dry sound is obtained; the music template selected by the target object is determined as the first background music; and the video template is synthesized with the text content and the picture content to obtain the first video.

Further as an optional implementation manner, the matching according to the industry label to obtain the element template from the material library includes:

In the embodiment of the invention, the elements in the material library are matched according to the industry label. Each element already carries its corresponding labels, so the corresponding elements can be obtained by matching the labels, which yields the element set. Recommendation score calculation is then performed on the element set according to the recommendation calculation formula. The embodiment is described taking the recommendation of a speech synthesis style as an example; the algorithms for recommending recorders, background music and video templates are similar and differ only in parameter values, so they are not described again here. The recommendation calculation formula for the TTS of the i-th style in a specified industry is as follows:

$V_i = V_{i0} + a \cdot A_i + b \cdot B_i$;

where $V_{i0}$ is an initially set value, and the initial values generally differ between styles; $a$ is a coefficient with value 0.4; $A_i$ is the cumulative number of times the system recommended this speech synthesis style and a customer in the industry ultimately chose to use it; $b$ is a coefficient with value 1; $B_i$ is the cumulative number of times the system did not recommend this speech synthesis style but a customer in the industry ultimately chose to use it. That is, the value of $V_i$ depends on both the initial preset value and the historical cumulative usage. When a customer uses a speech synthesis style that was not recommended, the score of that style grows faster ($b > a$). Finally, the calculated recommendation scores are arranged from high to low, and the TTS styles with the highest $V_i$ values are recommended to customers for TTS production.
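As a non-limiting illustration, the recommendation score and the high-to-low ranking can be sketched as follows. The coefficients 0.4 and 1 are taken from the description; the class `StyleStats`, the function names, the style names and the usage counts are hypothetical and serve only this example.

```python
from dataclasses import dataclass

A_COEFF = 0.4  # coefficient a from the description
B_COEFF = 1.0  # coefficient b from the description


@dataclass
class StyleStats:
    name: str                # hypothetical style identifier
    initial_score: float     # V_i0, the initially set value
    recommended_uses: int    # A_i, uses after a system recommendation
    unrecommended_uses: int  # B_i, uses without a system recommendation


def recommendation_score(style):
    """Compute V_i = V_i0 + a * A_i + b * B_i for one style."""
    return (style.initial_score
            + A_COEFF * style.recommended_uses
            + B_COEFF * style.unrecommended_uses)


def top_styles(styles, k):
    """Return the k styles with the highest scores, ordered high to low."""
    return sorted(styles, key=recommendation_score, reverse=True)[:k]


# Hypothetical statistics for one industry label.
styles = [
    StyleStats("warm_female", 10.0, 50, 2),   # 10 + 20 + 2
    StyleStats("news_male", 12.0, 10, 30),    # 12 + 4 + 30
    StyleStats("lively_child", 8.0, 0, 0),    # 8
]
ranked = top_styles(styles, k=2)
```

Because $b > a$, a style that customers pick without a recommendation overtakes a style that is picked mainly when recommended, as the second entry does in this example.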

Further as an optional implementation manner, in step S103, the performing, according to the industry label, sound quality adjustment processing on the first broadcast dry sound and the first background music to obtain a second broadcast dry sound and a second background music includes:

In the embodiment of the invention, the volumes of the dry sound and the background music are kept within a certain range: given the service characteristics of the color ring, the volume of the broadcast dry sound should not be too low, and the volume of the background music should not be too high. The volumes of the dry sound and the background music are then fine-tuned according to the industry label. The first broadcast dry sound and the first background music are first preprocessed, where the preprocessing includes, for example, reverberation for the broadcast dry sound and fade-in and fade-out for the background music; this yields the preprocessed sound set, that is, the preprocessed first broadcast dry sound and first background music. The volume of the preprocessed sound set is then fine-tuned according to the industry label to obtain the second broadcast dry sound and the second background music. In one possible embodiment, for industry labels such as "entertainment" and "sports", the background music can be emphasized; for labels such as "legal services" and "hospitals", the human voice can be emphasized by modifying the amplitude of the sound. In addition, special processing of the dry sound and the background music can be applied for some industry labels, such as adding a surround effect to the background music for the "tourist attractions" label.
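As a non-limiting illustration, the label-dependent volume fine-tuning can be sketched as a lookup of gains followed by amplitude scaling. The gain values in decibels, the table `GAIN_TABLE_DB` and the function names are assumptions made for this sketch; the application does not specify concrete gain values.

```python
# Hypothetical gains in dB per industry label: (dry sound, background music).
GAIN_TABLE_DB = {
    "entertainment": (0.0, 3.0),     # emphasize background music
    "sports": (0.0, 3.0),
    "legal services": (3.0, -3.0),   # emphasize the human voice
    "hospitals": (3.0, -3.0),
}
DEFAULT_GAIN_DB = (0.0, 0.0)  # labels without an entry are left unchanged


def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Scale samples in [-1.0, 1.0] and clip to that range."""
    factor = db_to_linear(gain_db)
    return [max(-1.0, min(1.0, s * factor)) for s in samples]


def fine_tune_volume(dry, music, industry_label):
    """Return the (dry sound, background music) pair adjusted for the label."""
    dry_db, music_db = GAIN_TABLE_DB.get(industry_label, DEFAULT_GAIN_DB)
    return apply_gain(dry, dry_db), apply_gain(music, music_db)


# Hypothetical preprocessed sample values.
dry2, music2 = fine_tune_volume([0.5, -0.5], [0.2, -0.2], "hospitals")
```

Scaling amplitudes is only one way to emphasize the voice or the music; effects such as the surround effect mentioned for "tourist attractions" would need separate processing that this sketch does not cover.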

Further as an optional implementation manner, in step S104, the performing, according to the industry label, image quality adjustment processing on the first video to obtain a second video includes:

In the embodiment of the invention, an adjustment processing table is generated inside the system; the table contains adjustment data obtained by adjusting video image quality based on an experience system. Because how the image quality of a video picture should be adjusted is a subjective judgment, the embodiment selects specific video ring tones for each industry and gradually adjusts brightness, contrast and saturation; the results are evaluated and analyzed by the experience system, and can also be evaluated by experienced practitioners such as art designers and video editors, as well as by customers. The adjustment processing table is then queried according to the industry label to obtain image quality adjustment data, which comprises brightness adjustment data, contrast adjustment data, saturation adjustment data and super-resolution processing data, and the first video is adjusted accordingly to obtain the second video. In one possible embodiment, for industry labels such as "entertainment" and "sports", brightness and contrast can be increased slightly. In addition, for some industry labels, super-resolution can be used to increase the resolution of the video.
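As a non-limiting illustration, querying the adjustment processing table and applying brightness, contrast and saturation adjustments to a single pixel can be sketched as follows. The table contents, the `QualityAdjust` fields and the specific adjustment formulas are assumptions made for this sketch; the application obtains its adjustment data empirically and does not disclose concrete values.

```python
import colorsys
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityAdjust:
    brightness: float = 0.0         # additive offset on a 0..1 scale
    contrast: float = 1.0           # multiplier around mid-grey 0.5
    saturation: float = 1.0         # multiplier on HSV saturation
    super_resolution: bool = False  # whether to run super-resolution


# Hypothetical adjustment processing table keyed by industry label.
ADJUST_TABLE = {
    "entertainment": QualityAdjust(brightness=0.05, contrast=1.1, saturation=1.1),
    "sports": QualityAdjust(brightness=0.05, contrast=1.1),
}


def lookup_adjustment(industry_label):
    """Query the table; unknown labels get a neutral adjustment."""
    return ADJUST_TABLE.get(industry_label, QualityAdjust())


def clamp(x):
    return max(0.0, min(1.0, x))


def adjust_pixel(rgb, adj):
    """Apply contrast and brightness per channel, then scale saturation."""
    r, g, b = (clamp((c - 0.5) * adj.contrast + 0.5 + adj.brightness) for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return colorsys.hsv_to_rgb(h, clamp(s * adj.saturation), v)


grey = adjust_pixel((0.5, 0.5, 0.5), lookup_adjustment("sports"))
```

A real implementation would apply the same adjustment to every frame of the first video and, where the table requests it, run a super-resolution model, which is outside the scope of this sketch.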

Further as an optional implementation manner, in step S105, the matching and superimposing processing is performed on the second broadcast dry sound and the second background music to obtain the color ring audio, including:

In the embodiment of the invention, referring to fig. 3, beat detection processing is performed on the audio files, which comprise the second broadcast dry sound and the second background music, to obtain audio beat sequences. Beat detection may proceed through preprocessing such as energy calculation and downsampling, followed by envelope summation and peak detection, from which the beat time points are calculated. Beat detection on the dry sound in effect detects the time at which each word is spoken; beat detection on the background music detects the beats of the music. This yields a dry sound beat time sequence $G_0$ and a background music beat time sequence $B_0$, where the audio end time is appended as the last time point of each sequence: $G_0 = \{t_1, t_2, t_3, \ldots, t_m\}$; $B_0 = \{T_1, T_2, T_3, \ldots, T_n\}$. Because a video color ring has a standard length, the sequences need to be cyclically completed to that length. In one embodiment, the standard color ring length is 48 seconds, so $G_0$ and $B_0$ are cyclically completed until 48 seconds is reached, and the completed sequences are expressed as:

$G = \{t_1, t_2, \ldots, t_m, t_m + t_1, t_m + t_2, \ldots, 2t_m, 2t_m + t_1, 2t_m + t_2, \ldots\} = \{t_1, t_2, t_3, \ldots, t_p\}$;

$B = \{T_1, T_2, \ldots, T_n, T_n + T_1, T_n + T_2, \ldots, 2T_n, 2T_n + T_1, 2T_n + T_2, \ldots\} = \{T_1, T_2, T_3, \ldots, T_q\}$;
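The cyclic completion of a beat sequence to the standard length can be sketched in Python as follows. The function name `cyclic_complete` is hypothetical, and the sketch assumes that time points beyond the standard length are simply dropped, which the text does not state explicitly.

```python
def cyclic_complete(base, standard_length=48.0):
    """Repeat a beat-time sequence until the standard length is reached.

    `base` holds ascending time points in seconds, and its last entry is the
    audio end time, so each repetition is shifted by a multiple of base[-1].
    """
    if not base or base[-1] <= 0:
        raise ValueError("base must be non-empty and end at a positive time")
    period = base[-1]
    completed = []
    cycle = 0
    while True:
        offset = cycle * period
        for t in base:
            value = offset + t
            if value > standard_length:
                return completed  # assumption: drop points past the limit
            completed.append(value)
        cycle += 1


g = cyclic_complete([5.0, 10.0, 20.0], 48.0)
# g == [5.0, 10.0, 20.0, 25.0, 30.0, 40.0, 45.0]
```

With a 20-second source, the sequence is repeated with offsets 0, 20 and 40 seconds, matching the pattern $t_m + t_1$, $2t_m + t_1$ in the expressions above.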

In general, a dry sound broadcast contains about 80-200 words in 48 seconds, that is, p is about 80-200; the tempo of music is typically about 0.75-3 Hz, that is, q is about 36-150. The background music beat time sequence is then held fixed, the dry sound beat time sequence is adjusted according to an adjustment threshold, and audio speed change processing is performed on the second broadcast dry sound according to the adjusted dry sound beat time sequence to obtain a third broadcast dry sound.

Specifically, referring to fig. 4, the G sequence is fine-tuned preferentially while the B sequence is left unchanged. For the time point $t_i$ with sequence number i in G, the closest time point $T_h$ in B is found; if $t_i$ were changed to $T_h$, the expected adjustment time would be $d_i = T_h - t_i$. When the absolute value $|d_i|$ of the expected adjustment time is too large, the point is not adjusted; it is adjusted only when $|d_i| < (t_{i+1} - t_{i-1})/y$, where y is the adjustment threshold coefficient, taken as the empirical value 15. The adjustment should not be excessive: because $t_{i+1} - t_{i-1}$ spans two beat intervals, y = 15 limits the shift to roughly 2/15, or 13.3%, of an average interval. In particular, when i = 1 the point is adjusted only when $|d_1| < 2(t_2 - t_1)/y$, and when i = m only when $|d_m| < 2(t_m - t_{m-1})/y$. Only the sequence number i and the adjustment time $d_i$ of the points to be adjusted in G are recorded, and (0, 0) is added at the initial position, giving the adjustment execution sequence $M = \{(0, 0), \ldots, (i, d_i), (j, d_j), \ldots, (k, d_k)\}$. According to M, the voice is divided into k segments for adjustment: the voice content originally between $t_i$ and $t_j$ is distributed uniformly between $t_i + d_i$ and $t_j + d_j$. Audio speed change processing is then performed on the second broadcast dry sound according to the adjusted dry sound beat time sequence to obtain the third broadcast dry sound; the speed change may use the LSEE-MSTFTM algorithm.

Finally, the third broadcast dry sound and the second background music are superimposed and fused to obtain the color ring audio.
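The threshold-based matching of the dry sound sequence G against the fixed sequence B can be sketched in Python as below. The function names are hypothetical, indices in M are 1-based as in the text, and the handling of audio after the last adjusted point is left out because the text does not describe it. The time stretching itself (for example with LSEE-MSTFTM) is not implemented here; the sketch only produces the source and target time ranges of each segment.

```python
from bisect import bisect_left


def nearest(sorted_times, t):
    """Return the element of an ascending list that is closest to t."""
    i = bisect_left(sorted_times, t)
    candidates = sorted_times[max(0, i - 1):i + 1]
    return min(candidates, key=lambda x: abs(x - t))


def build_adjustments(g, b, y=15.0):
    """Build M = [(0, 0.0), (i, d_i), ...] for the points of g worth moving."""
    if len(g) < 2 or not b:
        raise ValueError("need at least two points in g and one in b")
    m = [(0, 0.0)]
    last = len(g) - 1
    for k, t in enumerate(g):
        d = nearest(b, t) - t  # expected adjustment d_i = T_h - t_i
        if k == 0:
            limit = 2 * (g[1] - g[0]) / y
        elif k == last:
            limit = 2 * (g[last] - g[last - 1]) / y
        else:
            limit = (g[k + 1] - g[k - 1]) / y
        if abs(d) < limit:  # skip points that would need too large a shift
            m.append((k + 1, d))
    return m


def warp_segments(g, m):
    """Return (src_start, src_end, dst_start, dst_end) for each segment."""
    anchors = [(0.0, 0.0)] + [(g[i - 1], g[i - 1] + d) for i, d in m[1:]]
    return [(s0, s1, d0, d1)
            for (s0, d0), (s1, d1) in zip(anchors, anchors[1:])]


g = [1.0, 2.0, 3.0, 4.0]
b = [1.05, 2.5, 3.0, 4.1]
m = build_adjustments(g, b)     # the point at 2.0 s is skipped (shift too large)
segments = warp_segments(g, m)  # three segments between the kept anchors
```

Each segment's speed factor is the ratio of its target duration to its source duration, which is what a time-stretching routine would consume.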

Further as an optional implementation manner, in step S106, the matching and synthesizing the color ring audio and the second video to obtain the target video color ring includes:

In the embodiment of the invention, beat detection processing is performed on the color ring audio by the beat detection method described above to obtain an audio beat sequence. A video transition time sequence is then obtained by detecting differences in attributes between video frames, where the attributes may be pixels, histograms, $\chi^2$ histograms, edge contours and the like. Because transition effects such as fade-in and fade-out and dissolve are common in video color rings, referring to fig. 5, the embodiment uses the open-source TransNetV2 boundary detection method. Meanwhile, referring to fig. 6, the embodiment uses a text detection (OCR) algorithm to obtain the times at which text is presented, giving a text presentation time sequence. The union of the video transition time sequence and the text presentation time sequence is the sequence $V_0$, and cyclically completing $V_0$ to 48 seconds gives $V = \{t_1, t_2, t_3, \ldots, t_r\}$. The audio beat sequence may be expressed as $A = \{T_1, T_2, T_3, \ldots, T_s\}$; r is generally smaller than s. Referring to fig. 7, using the algorithm of the above steps, the embodiment fine-tunes the V sequence preferentially and keeps the A sequence unchanged. With the adjustment threshold coefficient y taken as the empirical value 25, the video beat sequence is matched and adjusted against the audio beat sequence, and video speed change processing is performed on the second video according to the adjusted video beat sequence to obtain a third video; the video speed change uses the FFMPEG tool to change the presentation time stamp (PTS) and performs the speed change at a fixed frame rate. Finally, the third video and the color ring audio are synthesized to obtain the target video color ring.
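As a rough Python sketch of two pieces of this step, the code below forms the union of transition times and text presentation times, and builds an FFmpeg video filter string that retimes one segment by rescaling its PTS at a fixed frame rate. The function names, the merge tolerance `tol`, and the default frame rate are assumptions; the filter string relies on FFmpeg's `trim`, `setpts` and `fps` filters, and how the embodiment actually invokes FFMPEG is not described in the source.

```python
def merge_events(transitions, text_times, tol=1e-3):
    """Union of two time lists, sorted, with near-duplicate times merged."""
    merged = []
    for t in sorted(list(transitions) + list(text_times)):
        if not merged or t - merged[-1] > tol:
            merged.append(t)
    return merged


def segment_filter(src_start, src_end, dst_start, dst_end, fps=25):
    """Build a filter chain that stretches one source segment to its target length."""
    if src_end <= src_start or dst_end <= dst_start:
        raise ValueError("segments must have positive duration")
    # A factor below 1 speeds the segment up, above 1 slows it down.
    factor = (dst_end - dst_start) / (src_end - src_start)
    return (f"trim=start={src_start:.3f}:end={src_end:.3f},"
            f"setpts={factor:.6f}*(PTS-STARTPTS),fps={fps}")


v0 = merge_events([1.0, 5.0], [5.0, 2.5])
vf = segment_filter(0.0, 2.0, 0.0, 1.0)
```

The resulting string could be passed to `ffmpeg -filter:v` for a single segment; joining several retimed segments would need an additional concatenation step that is outside this sketch.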

Referring to fig. 8, the process of one possible embodiment of the invention is as follows. According to the industry label, the system platform recommends TTS styles, recorders, background music and video templates. The platform has twenty TTS voices of different styles; according to the merchant's industry label, the system recommends suitable TTS voice styles, used to generate machine-synthesized broadcast dry sound, and suitable recorders, used to generate broadcast dry sound recorded by a real person. The client chooses whether the broadcast dry sound is synthesized by TTS or recorded by a recorder. If the former, the client provides the text content to be broadcast and the TTS style, and the TTS system generates the dry sound; if the latter, the client provides the text content and selects a recorder, who records it. The client then chooses whether to use background music provided by the system or the client's own background music; if the latter, a copyright statement is provided and the music is uploaded. The client then chooses whether to compose the video from a template provided by the system or to use the client's own video. If the former, the client selects a template and uploads several text passages, pictures or short clips that meet the template's requirements; if the latter, the client uploads the video.

Next, the broadcast dry sound is preprocessed, for example with reverberation, and the background music is preprocessed, for example with fade-in and fade-out; the relative volume of the dry sound and the background music is adjusted according to the industry label; and the color, brightness, contrast and so on of the video are adjusted according to the industry label. By identifying the beat points of the dry sound and the background music, the speed and start time of the dry sound or the background music are adjusted so that the dry sound matches the background music, and the dry sound is then superimposed on the background music to generate the audio. By identifying the beat points of the audio, the transition times of the video and the text display times, the speed and start time of the audio and the video are fine-tuned so that they are better synchronized and the overall viewing effect of the video improves. Finally, the audio is placed in the audio track of the video to generate the final video, which serves as the merchant's video color ring.

Referring to fig. 9, the embodiment of the invention further provides a video color ring making system, which includes:

a first module 901, configured to obtain an industry label of a target object;

a second module 902, configured to push element templates to the target object according to the industry label, and determine the first broadcast dry sound, the first background music and the first video selected by the target object;

a third module 903, configured to perform, according to the industry label, sound quality adjustment processing on the first broadcast dry sound and the first background music, to obtain a second broadcast dry sound and a second background music;

a fourth module 904, configured to perform image quality adjustment processing on the first video according to the industry label, to obtain a second video;

a fifth module 905, configured to perform matching and superposition processing on the second broadcast dry sound and the second background music to obtain the color ring audio;

and a sixth module 906, configured to perform a matching synthesis process on the color ring audio and the second video to obtain a target video color ring.

It can be understood that the content in the above method embodiment is applicable to the system embodiment, and the functions specifically implemented by the system embodiment are the same as those of the above method embodiment, and the achieved beneficial effects are the same as those of the above method embodiment.

Referring to fig. 10, an embodiment of the present invention further provides an electronic device including a processor 1001 and a memory 1002; the memory is used for storing programs; the processor executes the program to implement the method as described above.

Corresponding to the method of fig. 1, an embodiment of the present invention also provides a computer-readable storage medium storing a program to be executed by a processor to implement the method as described above.

Embodiments of the present invention also disclose a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read from a computer-readable storage medium by a processor of a computer device, and executed by the processor, to cause the computer device to perform the method shown in fig. 1.

In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.

Furthermore, while the invention is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the described functions and/or features may be integrated in a single physical device and/or software module or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Accordingly, one of ordinary skill in the art can implement the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the invention, which is to be defined in the appended claims and their full scope of equivalents.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.

It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, Programmable Gate Arrays (PGAs), Field Programmable Gate Arrays (FPGAs), and the like.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the present application have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the application, the scope of which is defined by the claims and their equivalents.

While the preferred embodiment of the present application has been described in detail, the present application is not limited to the embodiments described above, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present application, and these equivalent modifications or substitutions are included in the scope of the present application as defined in the appended claims.

Claims

1. The method for optimally manufacturing the video color ring is characterized by comprising the following steps:

acquiring an industry label of a target object;

2. The method of claim 1, wherein the pushing element templates to the target object according to the industry label, determining the first broadcast dry sound, the first background music, and the first video selected by the target object, comprises:

3. The method according to claim 2, wherein the matching the industry label to obtain the element template from the material library includes:

4. The method of claim 1, wherein the performing, according to the industry label, sound quality adjustment processing on the first broadcast dry sound and the first background music to obtain a second broadcast dry sound and a second background music includes:

5. The method of claim 1, wherein the performing image quality adjustment processing on the first video according to the industry label to obtain a second video comprises:

6. The method of claim 1, wherein the matching and superimposing the second broadcast dry sound and the second background music to obtain the color ring audio includes:

7. The method of claim 1, wherein the matching and synthesizing the color ring audio and the second video to obtain the target video color ring comprises:

8. A video color ring making system, the system comprising:

the first module is used for acquiring the industry label of the target object;

9. An electronic device comprising a memory and a processor;

the memory is used for storing programs;

the processor executing the program implements the method of any one of claims 1 to 7.

10. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 7.