CN114282941A - Method, device and equipment for determining advertisement insertion position and storage medium

Info

Publication number: CN114282941A
Application number: CN202111567793.1A
Authority: CN
Inventors: 陈聪; 唐玏; 张卓鹏; 蒋亚军
Original assignee: Migu Cultural Technology Co Ltd; China Mobile Communications Group Co Ltd; MIGU Music Co Ltd
Current assignee: Migu Cultural Technology Co Ltd; China Mobile Communications Group Co Ltd; MIGU Music Co Ltd
Priority date: 2021-12-20
Filing date: 2021-12-20
Publication date: 2022-04-05

Abstract

The invention discloses a method, a device, equipment and a storage medium for determining an advertisement insertion position, wherein the method for determining the advertisement insertion position comprises the following steps: acquiring the similarity between the audio frames of the candidate audios and the audio frames of the advertisement audios; determining a weight value corresponding to each frame of audio frame according to a memory forgetting function, wherein the memory forgetting function represents a corresponding relation between an advertisement insertion position and a memory forgetting degree; determining a target similarity between the candidate audio and the advertisement audio according to the similarity and the weight value; and determining the insertion position of the advertising word corresponding to the advertising audio according to the target similarity. In this way, by determining the advertisement insertion position according to the target similarity between the candidate audio and the advertisement audio, the rationality of the determination of the advertisement insertion position can be improved.

Description

Method, device and equipment for determining advertisement insertion position and storage medium

Technical Field

The present invention relates to the field of advertisement technologies, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for determining an advertisement insertion position.

Background

With the development of internet technology, advertisements have penetrated deeply into various industries. For example, in the music industry, advertisements may be inserted while a song is played.

However, the current way of inserting advertisements into audio is typically to insert the advertisement directly at a set time point during audio playback. As a result, the advertisement insertion position may be determined unreasonably.

Disclosure of Invention

The invention mainly aims to provide a method, a device and equipment for determining an advertisement insertion position and a computer readable storage medium, aiming at improving the rationality of determining the advertisement insertion position.

In order to achieve the above object, the present invention provides a method for determining an advertisement insertion position, comprising the steps of:

acquiring the similarity between the audio frames of the candidate audios and the audio frames of the advertisement audios;

determining a weight value corresponding to each frame of audio frame according to a memory forgetting function, wherein the memory forgetting function represents a corresponding relation between an advertisement insertion position and a memory forgetting degree;

determining a target similarity between the candidate audio and the advertisement audio according to the similarity and the weight value;

and determining the insertion position of the advertising word corresponding to the advertising audio according to the target similarity.

Optionally, the step of determining an insertion position of an advertisement word corresponding to the advertisement audio according to the target similarity includes:

determining a target candidate audio corresponding to the maximum target similarity;

and determining the advertisement insertion position according to the relevant information of the target candidate audio.

Optionally, the step of obtaining the similarity between the audio frame of the candidate audio and the audio frame of the advertisement audio includes:

acquiring a first Mel frequency cepstrum matrix formed by Mel frequency cepstrum coefficients of the audio frames of the candidate audios and a second Mel frequency cepstrum matrix formed by Mel frequency cepstrum coefficients of the audio frames of the advertisement audios;

and performing cosine similarity calculation on the first Mel frequency cepstrum matrix and the second Mel frequency cepstrum matrix to obtain the similarity between the audio frames of the candidate audio and the audio frames of the advertisement audio.

Optionally, the step of determining a weight value corresponding to each frame of audio frame according to a memory forgetting function includes:

converting the memory forgetting function so that it lies in the first quadrant of a coordinate system, to obtain a converted forgetting function;

normalizing the converted forgetting function to obtain a normalized forgetting function;

and acquiring the weight value according to the normalized forgetting function.

Optionally, before the step of obtaining the similarity between the audio frame of the candidate audio and the audio frame of the advertisement audio, the method further includes:

acquiring a first syllable number of the audio to be selected and a second syllable number of the advertisement audio;

determining a target syllable number range according to the first syllable number and the second syllable number;

and determining the audio to be selected with the first syllable number within the target syllable number range as candidate audio.

Optionally, the step of determining a target syllable number range according to the first syllable number and the second syllable number comprises:

acquiring the maximum syllable number and the minimum syllable number among the first syllable numbers;

when the second syllable number is less than or equal to the minimum syllable number, determining an upper limit value of the target syllable number range according to the second syllable number and the minimum syllable number, and taking the second syllable number as a lower limit value of the target syllable number range;

when the second syllable number is greater than or equal to the maximum syllable number, determining a lower limit value of the target syllable number range according to the maximum syllable number and the second syllable number, and taking the second syllable number as an upper limit value of the target syllable number range;

and when the second syllable number is greater than the minimum syllable number and less than the maximum syllable number, determining a lower limit value of the target syllable number range according to the minimum syllable number and the second syllable number, and determining an upper limit value of the target syllable number range according to the maximum syllable number and the second syllable number.

Optionally, the step of obtaining a first syllable number of the audio to be selected and a second syllable number of the advertisement audio includes:

reading aloud the audio to be selected using the voice in the audio to be selected to obtain a reading audio, and acquiring the syllable number of the reading audio as the first syllable number of the audio to be selected;

and reading aloud the advertisement words corresponding to the advertisement audio using the voice in the audio to be selected to obtain an advertisement reading audio, and acquiring the syllable number of the advertisement reading audio as the second syllable number of the advertisement audio.

In addition, in order to achieve the above object, the present invention further provides a device for determining an advertisement insertion position, which includes a memory, a processor, and a program for determining an advertisement insertion position that is stored in the memory and executable on the processor, wherein the processor implements the steps of the method for determining an advertisement insertion position when executing the program.

Further, to achieve the above object, the present invention also provides an apparatus for determining an advertisement insertion position, including an obtaining module, a first determining module, a second determining module, and a third determining module, wherein:

the obtaining module is configured to obtain the similarity between the audio frames of the candidate audio and the audio frames of the advertisement audio;

the first determining module is configured to determine a weight value corresponding to each audio frame according to a memory forgetting function, wherein the memory forgetting function represents a correspondence between an advertisement insertion position and a memory forgetting degree;

the second determining module is configured to determine a target similarity between the candidate audio and the advertisement audio according to the similarity and the weight value;

and the third determining module is configured to determine the insertion position of the advertisement word corresponding to the advertisement audio according to the target similarity.

Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a program for determining an advertisement insertion position, which when executed by a processor implements the steps of the method for determining an advertisement insertion position as described above.

In the embodiments of the invention, the similarity between the audio frames of the candidate audio and the audio frames of the advertisement audio is obtained, and the weight value corresponding to each audio frame is determined according to a memory forgetting function that represents the correspondence between the advertisement insertion position and the memory forgetting degree. The target similarity between the candidate audio and the advertisement audio is then determined from the obtained similarity and the determined weight value, and the insertion position of the advertisement word corresponding to the advertisement audio is determined from the target similarity. The advertisement audio is thereby associated with the candidate audio when the advertisement insertion position is determined, which avoids the unreasonable insertion positions that result from inserting advertisements directly at a preset time point. That is, determining the advertisement insertion position from the target similarity between the candidate audio and the advertisement audio improves the rationality of the determination.

Drawings

FIG. 1 is a schematic structural diagram of an advertisement insertion position determining apparatus according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a first embodiment of a method for determining an advertisement insertion position according to the present invention;

FIG. 3 is a graphical illustration of a normalized forgetting function in an exemplary embodiment;

FIG. 4 is a flowchart illustrating a method for determining an advertisement insertion position according to a second embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an apparatus for determining an advertisement insertion position according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The main solution of the invention is: acquiring the similarity between the audio frames of the candidate audios and the audio frames of the advertisement audios; determining a weight value corresponding to each frame of audio frame according to a memory forgetting function, wherein the memory forgetting function represents a corresponding relation between an advertisement insertion position and a memory forgetting degree; determining a target similarity between the candidate audio and the advertisement audio according to the similarity and the weight value; and determining the insertion position of the advertising word corresponding to the advertising audio according to the target similarity.

At present, when an advertisement is inserted into audio, the insertion is usually performed at a set time, so the insertion position is not reasonable enough and the effect of the advertisement suffers. The solution proposed by the present invention therefore aims to make the determination of the advertisement insertion position more reasonable and thereby improve the effect of the inserted advertisement.

Referring to fig. 1, fig. 1 is a schematic structural diagram of a device for determining an advertisement insertion position in a hardware operating environment according to an embodiment of the present invention.

As shown in fig. 1, the device for determining an advertisement insertion position may include a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display screen and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). Optionally, the memory 1005 may also be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the configuration shown in fig. 1 does not limit the device for determining an advertisement insertion position, which may include more or fewer components than those shown, combine some components, or arrange the components differently.

In the device for determining an advertisement insertion position shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to call the advertisement insertion position determination program stored in the memory 1005 and perform the following steps associated with the embodiments of the advertisement insertion position determination method.

It should be noted that the method for determining an advertisement insertion position proposed in the following embodiments may be executed by an apparatus for determining an advertisement insertion position or by a device for determining an advertisement insertion position, for example a server, a terminal device with an audio output function (e.g., a television or a mobile phone), or a control device of such a terminal device (e.g., a remote controller of a television); this is not specifically limited herein. Optionally, the apparatus may be provided in the device or independently of the device. When the apparatus and the device are provided independently, they are communicably connected.

Referring to fig. 2, fig. 2 is a flowchart of a method for determining an advertisement insertion position according to a first embodiment of the present invention, in this embodiment, the method for determining an advertisement insertion position includes the following steps:

Step S10: acquiring the similarity between the audio frames of the candidate audios and the audio frames of the advertisement audios;

It should be noted that this embodiment applies to scenarios with audio output, such as music playback or video playback. The candidate audio refers to audio sentences into which advertisements can be inserted, and each audio sentence comprises at least one audio frame. The advertisement audio refers to the audio of the advertisement words to be inserted into the audio, such as a read-aloud recording of the advertisement words, and also comprises at least one audio frame.

Optionally, the candidate audio may be an audio sentence stored in advance in a preset database, or an audio sentence determined in real time from network resources based on at least one of copyright information, popularity, the number of syllables, the number of audio text words, and the number of audio frames. For example, an audio sentence whose number of audio text words (e.g., the number of words of lyrics) is greater than or equal to the number of advertisement words in the advertisement audio may be taken as candidate audio; or an audio sentence whose number of syllables is greater than or equal to that of the advertisement audio may be taken as candidate audio; or an audio sentence whose number of audio frames is greater than or equal to that of the advertisement audio may be taken as candidate audio; this is not specifically limited herein. To keep the inserted advertisement intact and avoid breaking sentences, it may be preferable to take as candidate audio an audio sentence whose number of audio text words equals the number of advertisement words in the advertisement audio, and/or whose number of syllables equals that of the advertisement audio, and/or whose number of audio frames equals that of the advertisement audio.
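The candidate selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `AudioSentence` record and its field names are hypothetical, and the sketch requires all three counts to satisfy the condition at once, whereas the text allows any combination of them.

```python
from dataclasses import dataclass


@dataclass
class AudioSentence:
    # Hypothetical record; the field names are illustrative only.
    text_words: int  # number of words in the audio text (e.g. lyrics)
    syllables: int   # number of syllables
    frames: int      # number of audio frames


def select_candidates(sentences, ad, exact=True):
    """Keep sentences whose counts equal the ad's counts (exact=True)
    or are greater than or equal to them (exact=False)."""
    def qualifies(s):
        pairs = [
            (s.text_words, ad.text_words),
            (s.syllables, ad.syllables),
            (s.frames, ad.frames),
        ]
        if exact:
            return all(a == b for a, b in pairs)
        return all(a >= b for a, b in pairs)

    return [s for s in sentences if qualifies(s)]
```

With `exact=True` the sketch follows the preferred case (equal counts, so no sentence has to be broken); with `exact=False` it follows the looser "greater than or equal to" case.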

Optionally, after the candidate audio and the advertisement audio are determined, the audio frame features (e.g., short-time energy, spectral entropy, etc.) of the candidate audio may be compared with the audio frame features of the advertisement audio to determine the similarity between the audio frames of the candidate audio and the audio frames of the advertisement audio. For example, the closer the short-time energy and/or spectral entropy, the higher the similarity between the audio frames of the candidate audio and the audio frames of the advertisement audio.

In one embodiment, in consideration of the relationship between the output audio and the human auditory characteristics, a mel-frequency cepstrum coefficient (denoted as a first mel-frequency cepstrum coefficient) corresponding to the audio frame of the candidate audio and a mel-frequency cepstrum coefficient (denoted as a second mel-frequency cepstrum coefficient) corresponding to the audio frame of the advertisement audio may be obtained, so that the similarity between the audio frame of the candidate audio and the audio frame of the advertisement audio may be determined according to the first mel-frequency cepstrum coefficient and the second mel-frequency cepstrum coefficient, so as to fully consider the human auditory characteristics, and facilitate more accurate calculation of the similarity between the audio frame of the candidate audio and the audio frame of the advertisement audio. Specifically, a first mel-frequency cepstrum matrix formed by mel-frequency cepstrum coefficients of the audio frames of the candidate audios and a second mel-frequency cepstrum matrix formed by mel-frequency cepstrum coefficients of the audio frames of the advertisement audios can be obtained, and then cosine similarity calculation is performed on the first mel-frequency cepstrum matrix and the second mel-frequency cepstrum matrix to obtain similarity between the audio frames of the candidate audios and the audio frames of the advertisement audios.

The first mel-frequency cepstrum matrix is a matrix in which each row holds the first mel-frequency cepstrum coefficients of one audio frame and the number of rows equals the number of audio frames of the candidate audio; that is, the first row of the first mel-frequency cepstrum matrix holds the mel-frequency cepstrum coefficients of the first audio frame of the candidate audio, the second row holds those of the second audio frame, and so on. Likewise, the second mel-frequency cepstrum matrix is a matrix in which each row holds the second mel-frequency cepstrum coefficients of one audio frame and the number of rows equals the number of audio frames of the advertisement audio; that is, the first row of the second mel-frequency cepstrum matrix holds the mel-frequency cepstrum coefficients of the first audio frame of the advertisement audio, the second row holds those of the second audio frame, and so on.

Then, cosine similarity is calculated row by row on the first mel-frequency cepstrum matrix and the second mel-frequency cepstrum matrix, which gives the cosine similarity between each audio frame of the candidate audio and the corresponding audio frame of the advertisement audio. The specific calculation formula is as follows:

$$\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$

where $n$ is the number of mel-frequency cepstrum coefficients per audio frame. That is, the cosine similarity between the first audio frame of the candidate audio and the first audio frame of the advertisement audio can be calculated by taking the first row of the first mel-frequency cepstrum matrix as vector A and the first row of the second mel-frequency cepstrum matrix as vector B; then the second row of the first mel-frequency cepstrum matrix is taken as vector A and the second row of the second mel-frequency cepstrum matrix as vector B to obtain the cosine similarity between the second audio frame of the candidate audio and the second audio frame of the advertisement audio; and so on, until the last row of the first mel-frequency cepstrum matrix or the last row of the second mel-frequency cepstrum matrix has been processed.
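The row-by-row comparison can be sketched in Python as follows. This is a minimal illustration that takes the two mel-frequency cepstrum matrices as already computed (one list of coefficients per audio frame); the function names are hypothetical, and returning 0.0 for an all-zero row is an assumption made here to avoid division by zero, not something the text specifies.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length coefficient vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero frame counts as dissimilar
    return dot / (norm_a * norm_b)


def framewise_similarity(candidate_mfcc, ad_mfcc):
    """Compare row i of the first matrix with row i of the second.

    zip() stops at the shorter matrix, matching the rule that the
    calculation ends at the last row of either matrix.
    """
    return [cosine_similarity(a, b) for a, b in zip(candidate_mfcc, ad_mfcc)]
```

The result is one similarity value per frame pair, which is the per-frame input that the weighting step then consumes.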

Step S20: determining a weight value corresponding to each frame of audio frame according to a memory forgetting function, wherein the memory forgetting function represents a corresponding relation between an advertisement insertion position and a memory forgetting degree;

Considering that different audio frames have different audio characteristics, a weight value corresponding to each audio frame can be determined after the similarity between the audio frames of the candidate audio and the audio frames of the advertisement audio is obtained. Calculating the target similarity between the candidate audio and the advertisement audio from the per-frame similarity and the corresponding weight values then improves the accuracy of the resulting similarity between the candidate audio and the advertisement audio.

Optionally, the steps of obtaining the first mel-frequency cepstrum coefficient and the second mel-frequency cepstrum coefficient and determining the weight value corresponding to each frame of audio frame may be executed simultaneously, or may be executed sequentially according to a preset sequence, which is not specifically limited herein.

Optionally, in order to facilitate that a deeper impression can be left for a user after the advertisement insertion so as to improve the advertisement insertion effect, a corresponding memory forgetting function can be selected according to the memory forgetting characteristic, so as to determine a weight value corresponding to each frame of audio frame according to the memory forgetting function; in order to comprehensively consider different audio characteristics of different audio frames, the weight value corresponding to the audio frame of the candidate audio can be determined according to the audio frame characteristics (such as the number of syllables, the number of words of an audio text, an audio frequency band and the like) of the candidate audio; other ways are of course possible and are not specifically limited herein. In this embodiment, it may be preferable to determine a weight value corresponding to each frame of audio frame according to a memory forgetting function; the memory forgetting function is used for representing the corresponding relation between the advertisement insertion position and the memory forgetting degree, namely, the corresponding memory forgetting degree when different audio frames in the candidate audio are taken as the advertisement insertion position. The higher the memory forgetting degree is, the smaller the corresponding weight value is, and the lower the memory forgetting degree is, the larger the corresponding weight value is.

Optionally, in order to ensure that the determined weight value is a positive number, the memory forgetting function may be converted to convert the memory forgetting function into a first quadrant of a coordinate system, so as to obtain a converted forgetting function; then, normalizing the converted forgetting function to obtain a normalized forgetting function; and acquiring a weight value corresponding to each frame of audio frame according to the normalized forgetting function.

For example, in a specific application example, according to the memory forgetting characteristics that the memory effect of the head and tail parts is good and the forgetting of the middle part is more, the following function can be used as the memory forgetting function:

Optionally, in the above memory forgetting function, μ = X/2 and σ = X/6.

To better reflect the similarity between the audio frames of the candidate audio and the audio frames of the advertisement audio, the memory forgetting function may be inverted to obtain the following inverted forgetting function:

In this way, an audio frame that is more easily forgotten receives a lower weight value; that is, the forgetting degree of an audio frame is negatively correlated with its weight value. Here μ = X/2 indicates that the forgetting degree is highest in the middle. According to the 3σ principle of the normal distribution, the probability that a value falls within (μ - 3σ, μ + 3σ) is approximately 0.9973. To ensure that all frames are covered within 3σ, let μ - 3σ = 0, where x = 0 corresponds to the 1st frame, which gives σ = X/6.

Because the weight value given by the inverted forgetting function may be negative, the inverted forgetting function can be translated upward to ensure that the weight value is positive. Translating the inverted forgetting function into the first quadrant of the coordinate system gives the following translated forgetting function:

The translated forgetting function is then normalized to obtain the normalized forgetting function. The weight value corresponding to each audio frame can thus be obtained from the normalized forgetting function as $q(a) = f(a-1)/s$, where $s = \sum_{a=1}^{X} f(a-1)$ and $a = 1, 2, \ldots, X$.
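The invert, translate, and normalize steps can be sketched in Python as follows. Three assumptions are made for illustration only and are not confirmed by the text: the memory forgetting function is taken to be the normal density with μ = X/2 and σ = X/6, "inverting" is taken to mean negating it, and the upward translation is taken to equal the density's peak value 1/(√(2π)σ). X is taken to be the number of audio frames, and the function name is hypothetical.

```python
import math


def forgetting_weights(num_frames):
    """Weights q(a) = f(a-1)/s for a = 1..X under the assumptions above."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    mu = num_frames / 2.0
    sigma = num_frames / 6.0
    # Peak height of the assumed normal density, reached at x = mu.
    peak = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)

    def f(x):
        gauss = peak * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
        # Negate the density, then shift it up by the (assumed) peak
        # height so that every value is non-negative.
        return peak - gauss

    values = [f(a - 1) for a in range(1, num_frames + 1)]
    s = sum(values)  # normalization constant
    return [v / s for v in values]
```

Under these assumptions the weights sum to 1, are largest for the first and last frames, and fall to zero at x = μ, which matches the stated behaviour that the head and tail are remembered best and the middle is forgotten most. A different translation offset would lift the minimum above zero without changing the overall shape.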

Alternatively, when X is 50, μ is 25, and σ is 9, a graph of the corresponding normalized forgetting function is shown in fig. 3.

Step S30: determining a target similarity between the candidate audio and the advertisement audio according to the similarity and the weight value;

After the similarity between the audio frames of the candidate audio and the audio frames of the advertisement audio is obtained and the weight value corresponding to each audio frame is determined, the target similarity between the candidate audio and the advertisement audio can be determined according to the obtained similarity and the weight value.

Optionally, the similarities between the audio frames of the candidate audio and the audio frames of the advertisement audio may be weighted by the weight values corresponding to the audio frames and summed, and the resulting sum is used as the target similarity between the candidate audio and the advertisement audio. In this weighted summation, the similarity between the first audio frame of the candidate audio and the first audio frame of the advertisement audio is multiplied by the weight value corresponding to the first audio frame, the similarity between the second audio frame of the candidate audio and the second audio frame of the advertisement audio is multiplied by the weight value corresponding to the second audio frame, and so on, until the last audio frame of the candidate audio and/or the last audio frame of the advertisement audio has been processed; all the products are then added to obtain the weighted sum. Optionally, when there are multiple candidate audios, the target similarity between each candidate audio and the advertisement audio may be calculated. Optionally, it may be preferable to take an audio sentence whose number of audio frames is equal to that of the advertisement audio as the candidate audio.

Of course, in some other embodiments, the target similarity between the candidate audio and the advertisement audio may also be obtained by performing the weighted summation of the per-frame similarities and the corresponding weight values and then dividing the resulting sum by the total number of audio frames of the candidate audio, and so on; this is not specifically limited herein.
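Both variants of the target similarity can be sketched in Python as follows. This is a minimal illustration; the function name and the optional `candidate_frame_count` parameter are hypothetical, and the per-frame similarities and weights are taken as given lists.

```python
def target_similarity(similarities, weights, candidate_frame_count=None):
    """Weighted sum of per-frame similarities.

    If candidate_frame_count is given, the sum is divided by it, which
    corresponds to the alternative embodiment described above.
    """
    # zip() stops at the shorter list, so frames without a partner
    # in the other audio are not counted.
    total = sum(s * w for s, w in zip(similarities, weights))
    if candidate_frame_count:
        return total / candidate_frame_count
    return total
```

For example, similarities `[1.0, 0.5]` with weights `[0.25, 0.75]` give 1.0 × 0.25 + 0.5 × 0.75 = 0.625, or 0.3125 when divided by a frame count of 2.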

Step S40: determining the insertion position of an advertising word corresponding to the advertising audio according to the target similarity;

When the advertisement insertion position is determined according to the target similarity, the insertion position of the advertisement word corresponding to the advertisement audio (hereinafter referred to as the advertisement insertion position) can be determined according to the candidate audio corresponding to the maximum target similarity. Specifically, the target candidate audio corresponding to the maximum target similarity may be determined from the candidate audios, and the advertisement insertion position may then be determined according to the related information of the target candidate audio. For example, the position or play time node of the target candidate audio can be used as the advertisement insertion position.
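Selecting the target candidate audio can be sketched in Python as follows. This is a minimal illustration; the function name, the pairing of each candidate's related information with its target similarity, and the dictionary keys in the usage note are all hypothetical.

```python
def pick_target_candidate(scored_candidates):
    """Return the related information of the candidate with the
    largest target similarity, or None if there are no candidates.

    scored_candidates is a list of (info, target_similarity) pairs.
    """
    if not scored_candidates:
        return None
    best_info, _ = max(scored_candidates, key=lambda item: item[1])
    return best_info
```

For instance, given `[({"song": "A", "start_s": 12.0}, 0.41), ({"song": "B", "start_s": 47.5}, 0.77)]`, the sketch returns the information for song "B", whose start time could then serve as the advertisement insertion position.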

Optionally, the related information of the target candidate audio may include feature information corresponding to the target candidate audio, such as the number of audio text words, the number of syllables, the number of audio frames, and the like corresponding to the target candidate audio; attribute information corresponding to the target candidate audio, such as copyright attributes (e.g., song title and singer title, etc.) and play attributes, such as start time node and end time node of the target candidate audio in the audio playing process, playback times and playback time node of the target candidate audio, etc., may also be included. Further, an advertisement insertion position may be determined according to the feature information and the attribute information of the target candidate audio.

For example, if it is determined from the feature information of the target candidate audio that the number of words in the audio text of the target candidate audio is equal to the number of words in the target advertisement words, the advertisement may be inserted with the start time node of the target candidate audio, obtained from the attribute information, as the start position of advertisement insertion; if the number of words in the audio text of the target candidate audio is larger than the number of words in the target advertisement words, the advertisement insertion time point can be determined according to the difference between the two numbers. For example, when the target candidate audio contains 8 words and the target advertisement words contain 5 words, the advertisement insertion position should be before the 3rd word of the target candidate audio is played. These examples are given by way of illustration only and not by way of limitation. In a specific application example, the advertisement insertion position can be characterized by the name of the singer corresponding to the target candidate audio, the name of the song, and the playing time of the target candidate audio.

Optionally, the advertisement insertion position may also be determined according to the target similarity as follows: each candidate audio whose similarity with the advertisement audio exceeds a preset similarity is taken as a target candidate audio. In this case there may be one, two or more target candidate audios. Correspondingly, advertisements may be inserted at multiple positions in a piece of audio, so that advertisements can be played at multiple different positions in that piece of audio.
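The two selection strategies (taking the candidate with the maximum target similarity, or taking every candidate above a preset similarity) can be sketched as follows. This is only an illustration: the dictionary layout mapping a candidate identifier to its target similarity, and the names `pick_best` and `pick_above`, are assumptions not found in the description.

```python
from typing import Dict, List


def pick_best(scores: Dict[str, float]) -> str:
    """Return the candidate with the maximum target similarity.

    Raises ValueError if there are no candidates.
    """
    return max(scores, key=scores.get)


def pick_above(scores: Dict[str, float], preset: float) -> List[str]:
    """Return every candidate whose target similarity exceeds the preset value."""
    return [name for name, score in scores.items() if score > preset]


# Hypothetical target similarities for three candidate audio sentences.
scores = {"sentence_1": 0.4, "sentence_2": 0.9, "sentence_3": 0.7}
best = pick_best(scores)          # single insertion position
several = pick_above(scores, 0.6)  # possibly several insertion positions
```

The threshold variant may return an empty list, in which case no insertion position would be selected for that advertisement.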

Optionally, after the advertisement insertion position is determined, the conventional way to insert the advertisement is to insert the pre-recorded advertisement audio directly at the corresponding advertisement insertion position in the currently played audio. With this approach, however, the pre-recorded advertisement audio is unrelated to the currently played audio, so the insertion is abrupt and tends to annoy the user; in addition, the currently played audio is usually interrupted, or its volume is lowered so that it serves as background sound, which degrades its playing effect. Therefore, in order to make the insertion mode more reasonable and to improve the audio playing effect during advertisement insertion, the corresponding audio text in the currently played audio can be replaced with the target advertisement words corresponding to the advertisement audio while the rhythm of the original audio is kept. This reduces the influence of advertisement insertion on the playing of the original audio, so that the advertisement playing effect is improved while the playing effect of the original audio is maintained.

Specifically, after the audio corresponding to the target candidate audio is detected to be played, when the audio corresponding to the advertisement insertion position is played, the audio text corresponding to the advertisement insertion position is replaced with the target advertisement word, and then the target advertisement word is played as the audio text corresponding to the advertisement insertion position without changing the playing of the original audio. Optionally, since a certain processing time is required to replace the audio text corresponding to the advertisement insertion position with the target advertisement word, if the audio text replacement is performed when it is detected that the advertisement insertion position is played, audio playing delay and the like may be caused. Therefore, in order to improve the continuity of audio playing, the audio text corresponding to the advertisement insertion position can be replaced by the target advertisement word in advance, so that when the audio corresponding to the target candidate audio is played, the advertisement playing can be directly realized without performing audio text replacement processing in real time; or, the audio segment corresponding to the advertisement insertion position can be extracted in advance, the audio text corresponding to the audio segment is replaced by the target advertisement word, and then the audio segment after replacing the target advertisement word is replaced by the audio segment in the audio corresponding to the target candidate audio, so that when the audio corresponding to the target candidate audio is played, the advertisement playing can be directly realized.
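The second pre-processing option above (extract the segment, replace it, and put it back before playback) can be sketched as a simple splice over a list of audio samples. This is a simplified illustration rather than the method of the description: the function name `splice_segment`, the use of plain sample lists, and the requirement that the replacement has exactly the same length as the original segment (as one way of keeping the original timing) are all assumptions.

```python
from typing import List, Sequence


def splice_segment(track: List[float], start: int, end: int,
                   replacement: Sequence[float]) -> List[float]:
    """Return a copy of `track` with samples [start, end) replaced.

    Assumption: the replacement must have the same length as the segment it
    replaces, so that everything after the segment keeps its original timing.
    """
    if not 0 <= start <= end <= len(track):
        raise ValueError("segment bounds are outside the track")
    if len(replacement) != end - start:
        raise ValueError("replacement must match the segment length")
    # The original track is left untouched; a new list is built in advance,
    # so nothing needs to be replaced in real time during playback.
    return track[:start] + list(replacement) + track[end:]
```

Because the spliced track is built ahead of time, playback can proceed without any real-time replacement step.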

According to the embodiment, the similarity between the audio frame of the candidate audio and the audio frame of the advertisement audio is obtained, the weight value corresponding to each audio frame is determined according to the memory forgetting function, the memory forgetting function represents the corresponding relation between the advertisement insertion position and the memory forgetting degree, the target similarity between the candidate audio and the advertisement audio is determined according to the similarity and the weight value, and the insertion position of the advertisement word corresponding to the advertisement audio is determined according to the target similarity, so that the advertisement audio and the candidate audio can be associated to determine the advertisement insertion position, the rationality of the advertisement insertion position determination is improved, and the phenomenon that the advertisement is directly inserted at the set time point to cause the user dislike is avoided.

Based on the above embodiments, a second embodiment of the advertisement insertion position determination method of the present invention is proposed. Referring to fig. 4, in this embodiment, before step S10, the method further includes:

step S01: acquiring a first syllable number of the audio to be selected and a second syllable number of the advertisement audio;

step S02: determining a target syllable number range according to the first syllable number and the second syllable number;

step S03: and determining the audio to be selected with the first syllable number within the target syllable number range as candidate audio.

Before the similarity between the audio frames of the candidate audio and the audio frames of the advertisement audio is obtained, the audio sentences in the audio to be selected may first be screened to select suitable audio sentences as the candidate audio, and the target candidate audio is then selected from the candidate audio. This avoids directly using all the audio sentences in the audio to be selected as candidate audio, which would reduce the efficiency of determining the target similarity and, in turn, the efficiency of determining the advertisement insertion position. The audio to be selected refers to an audio sentence which can be acquired by the device for determining the advertisement insertion position, such as an audio sentence in a copyrighted song.

Specifically, the audio sentences in the audio to be selected can be screened according to the syllable number of the audio to be selected (denoted the first syllable number) and the syllable number of the advertisement audio (denoted the second syllable number), so that audio sentences whose syllable number is comparable to that of the advertisement audio are selected as candidate audio. This avoids the situation in which the syllable number of a candidate audio differs greatly from that of the advertisement audio, so that the candidate audio does not suit the advertisement audio and the finally determined advertisement insertion position is not reasonable enough. For example, an audio sentence for which the difference between the first syllable number and the second syllable number is within a preset syllable number range may be determined as the candidate audio; the preset syllable number range can be set according to the actual situation and is not specifically limited here.

In this embodiment, the target syllable number range may be determined from the first syllable number and the second syllable number, and then the audio sentence whose first syllable number is within the target syllable number range may be determined as the candidate audio. For example, in order to screen out an appropriate number of audio sentences as candidate audio, the target syllable number range may be determined according to the distribution of the first syllable numbers and the size of the second syllable number. When the first syllable numbers close to the second syllable number are densely distributed, the target syllable number range can be appropriately narrowed; when they are sparsely distributed, the target syllable number range can be appropriately widened.

In one embodiment, in order to determine the target syllable number range according to the first syllable number and the second syllable number, the maximum syllable number and the minimum syllable number among the first syllable numbers may be determined, the second syllable number may then be compared with the maximum syllable number and the minimum syllable number, and the target syllable number range may be determined according to the comparison result.

The following three conditions can be specifically classified according to the comparison result:

(1) when the second syllable number is less than or equal to the minimum syllable number, the upper limit of the syllable number may be determined based on the second syllable number and the minimum syllable number, for example, the maximum N (N is a positive integer) quantile of the syllable number range formed by the second syllable number and the minimum syllable number may be taken as the upper limit of the syllable number; the second syllable number or the minimum syllable number is taken as the lower limit of the syllable number; in this way, the audio sentences whose first syllable number is between the lower limit of the syllable number and the upper limit of the syllable number can be screened out as candidate audio;

(2) when the second syllable number is greater than or equal to the maximum syllable number, the lower limit of the syllable number may be determined based on the maximum syllable number and the second syllable number, for example, the minimum N (N is a positive integer) quantile of the syllable number range formed by the maximum syllable number and the second syllable number may be taken as the lower limit of the syllable number; the second syllable number or the maximum syllable number is taken as the upper limit of the syllable number; in this way, the audio sentences whose first syllable number is between the lower limit of the syllable number and the second syllable number can be screened out as candidate audio;

(3) when the second syllable number is greater than the minimum syllable number and less than the maximum syllable number, the lower limit of the syllable number may be determined according to the minimum syllable number and the second syllable number, for example, the minimum N (N is a positive integer) quantile of the syllable number range formed by the minimum syllable number and the second syllable number may be taken as the lower limit of the syllable number; the upper limit of the syllable number may be determined based on the maximum syllable number and the second syllable number, for example, the maximum N (N is a positive integer) quantile of the syllable number range formed by the second syllable number and the maximum syllable number may be taken as the upper limit of the syllable number; in this way, the audio sentences whose first syllable number is between the lower limit of the syllable number and the upper limit of the syllable number can be taken as the candidate audio.

Alternatively, N may preferably be 4.
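The quantile-based screening can be sketched in Python as below. This is one possible reading, not a definitive implementation: the description does not say how a quantile of a "syllable number range" is computed, so the sketch assumes linear-interpolation quantiles over the observed syllable numbers on each side of the advertisement's syllable number. The names `quantile` and `filter_candidates` are hypothetical, and the quantile fractions are left as parameters (`lower_q`, `upper_q`) because the text admits more than one choice; the defaults of 0.75 and 0.25 are the values used in the application scenario of this description for N = 4.

```python
from typing import List, Sequence, Tuple


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of an ascending sequence, 0 <= q <= 1."""
    if not sorted_vals:
        raise ValueError("empty sequence")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def filter_candidates(sentences: List[Tuple[str, int]], ad_syllables: int,
                      lower_q: float = 0.75,
                      upper_q: float = 0.25) -> List[Tuple[str, int]]:
    """Keep sentences whose syllable number lies in the target range.

    sentences: (identifier, first syllable number) pairs.
    ad_syllables: second syllable number (of the advertisement audio).
    """
    counts = [n for _, n in sentences]
    # Syllable numbers from the minimum up to the advertisement's number ...
    below = sorted([n for n in counts if n <= ad_syllables] + [ad_syllables])
    # ... and from the advertisement's number up to the maximum.
    above = sorted([ad_syllables] + [n for n in counts if n >= ad_syllables])
    low = quantile(below, lower_q)
    high = quantile(above, upper_q)
    return [(s, n) for s, n in sentences if low <= n <= high]
```

When the advertisement's syllable number is at or below the minimum, the lower side contains only that number, so the lower limit falls back to the second syllable number; the symmetric fallback applies at or above the maximum.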

Since the audio is usually accompanied by noise such as background sound during playing, and the advertisement words are usually presented in a reading form, in order to improve the accuracy of the adaptation of the candidate audio and the advertisement audio, in an embodiment, before the first number of the pitches of the audio to be selected and the second number of the pitches of the advertisement audio are acquired, the audio to be selected may be converted into the reading audio, so that the influence on the acquisition of the number of pitches is reduced while the noise interference is reduced. Specifically, the audio sounds in the audio to be selected may be extracted first, then the audio to be selected is read aloud by using the extracted audio sounds to obtain the read-aloud audio, and then the number of the syllables of the read-aloud audio may be obtained as the first number of the syllables of the audio to be selected. Correspondingly, the target advertisement words can be read aloud by the audio sound in the candidate audio to obtain advertisement reading audio, and the advertisement reading audio is used as the advertisement audio. Therefore, the audio sentences and the target advertisement words in the audio to be selected are read aloud simultaneously through the original audio sound in the audio to be selected, the consistency of the audio sentences and the aloud sound source of the advertisement audio can be kept, and the audio sentences and the advertisement audio can be matched more favorably.

In this embodiment, the first syllable number of the audio to be selected and the second syllable number of the advertisement audio are obtained, the target syllable number range is determined according to the first syllable number and the second syllable number, and the audio to be selected whose first syllable number is within the target syllable number range is determined as the candidate audio. This avoids directly using all the audio to be selected as candidate audio, which would consume a large amount of time in the similarity calculation and the target similarity calculation and would reduce the efficiency of determining the target similarity and the advertisement insertion position.

In a specific application scenario, advertisements may be inserted while music is being played. The specific advertisement insertion process comprises the following steps:

1. The copyrighted songs in the database are taken as the audio to be selected, and the singer voices V1, V2, ..., Vn in the copyrighted songs in the database are extracted.

2. The audio sentences in all songs of the m-th singer are converted into reading audio read aloud in the singer voice Vm, and the reading audio is recorded as N11, ..., Nnn.

3. The target advertisement words are read aloud in the singer voice Vk (k = 1, ..., n), so as to convert the target advertisement words into advertisement reading audio read aloud in the singer voice Vk, thereby obtaining the advertisement audio corresponding to the singer voice Vk.

4. The syllable number of the advertisement audio is extracted and recorded as J0; then the syllable number of each reading audio N11, ..., Nnn is extracted, and the corresponding syllable numbers are recorded as J11, ..., Jnn respectively.

5. J11, ..., Jnn and J0 are sorted in ascending order, resulting in the ordering Jmin, ..., J0, ..., Jmax.

6. When J0 is between Jmin and Jmax, the maximum quartile J75% can be taken for the syllable number range formed by Jmin and J0, and the minimum quartile J25% can be taken for the syllable number range formed by J0 and Jmax. The audio sentences whose syllable number is between J75% and J25% are then taken out as candidate audio. Subsequent processing operations are performed on the eligible candidate audio.

7. The Mel-frequency cepstral coefficients of the advertisement audio (the second Mel-frequency cepstral coefficients) are extracted to obtain a coefficient matrix T (T is a matrix with X rows and Y columns, where X is the number of frames of the speech segment).

8. The Mel-frequency cepstral coefficients of the reading audio Nkk corresponding to the audio sentence (the first Mel-frequency cepstral coefficients) are extracted to obtain a coefficient matrix Tkk (Tkk is a matrix with X rows and Y columns, where X is the number of frames of the speech; the first Mel-frequency cepstral coefficients are obtained by sampling according to the number of frames of T).

9. The cosine similarity between Tkk and T is calculated row by row to obtain a column vector P with X rows; the vector P represents the similarity between the audio frames of the candidate audio and the audio frames of the advertisement audio.

10. According to cognitive psychology, memory is subject to proactive inhibition and retroactive inhibition. Proactive inhibition is the interfering effect of previously learned material on the retention and recall of material learned later. Retroactive inhibition is the interfering effect of later learned material on the retention and recall of previously learned material. As a result, the head and tail parts are remembered well, while the middle part is forgotten more. The memory forgetting curve satisfies the following normal distribution function (i.e. the preset forgetting function):

$$\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$

where μ = X/2 and σ = X/6.

Note that μ = X/2 represents the portion with the highest forgetting degree in the entire memory. According to the 3σ rule of the normal distribution, the probability that a value falls within (μ − 3σ, μ + 3σ) is approximately 0.9973. To ensure that all frames are covered within 3σ, let μ − 3σ = 0, where x = 0 represents the 1st frame, which gives σ = X/6.

11. Each sentence of advertisement words or lyrics, when read aloud, is composed of a plurality of frames, and the similarity of corresponding frames is calculated as above. To better express the similarity between the advertisement audio and the audio sentences, the more a part of the advertisement words or lyrics is forgotten, the lower its weight should be, i.e. the weight is negatively correlated with the forgetting degree. The normal distribution function is therefore negated, and the following negated forgetting function is obtained:

$$-\varphi(x)$$

where φ(x) denotes the normal distribution function above.

12. At this point the value of the weight function is negative. To ensure that the weights are positive, a uniform offset is added to the negated forgetting function, so that the function is translated upwards.

The function becomes a translated forgetting function as follows:

13. The weight value corresponding to each audio frame is acquired according to the function value and normalized to obtain the final weight value of each frame: the value f(a − 1) of the translated forgetting function is calculated for each frame a = 1, ..., X, and these values are summed to obtain s. The weight of each frame is q(a) = f(a − 1)/s.

14. The similarity tkk is the sum over all frames a = 1, ..., X of p(a)·q(a), where p(a) is the similarity value calculated in step 9 for frame a.

15. All similarities tkk are calculated in sequence in the manner described above.

16. With n singers in total, all tkk values for each copyrighted singer can be calculated in the above manner. According to the maximum tkk value, it can be determined which words in which song of which singer can be replaced by the target advertisement words.

17. The target advertisement words are converted into an audio sentence in the corresponding singer's voice and inserted into the song.
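Steps 9 to 14 above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the description: the coefficient matrices are taken as plain lists of rows (one row per frame) that have already been extracted and resampled to the same number of frames, and all function names are hypothetical. In particular, the description does not give the size of the upward translation in step 12, so the sketch assumes an offset of twice the peak of the normal distribution function, which keeps every weight strictly positive.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two coefficient rows (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def frame_similarities(t_kk, t) -> List[float]:
    """Step 9: per-frame cosine similarity, giving the vector P."""
    return [cosine(r1, r2) for r1, r2 in zip(t_kk, t)]


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Step 10: normal distribution function used as the forgetting curve."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def forgetting_weights(num_frames: int) -> List[float]:
    """Steps 10-13: negate, translate upwards, and normalize to weights q(a)."""
    if num_frames < 1:
        raise ValueError("at least one frame is required")
    mu, sigma = num_frames / 2, num_frames / 6
    # ASSUMPTION: the upward offset is not given in the source; twice the
    # peak value is used here so that all translated values are positive.
    shift = 2 * normal_pdf(mu, mu, sigma)
    raw = [shift - normal_pdf(a - 1, mu, sigma) for a in range(1, num_frames + 1)]
    s = sum(raw)
    return [r / s for r in raw]


def weighted_similarity(t_kk, t) -> float:
    """Step 14: sum of p(a) * q(a) over all frames."""
    p = frame_similarities(t_kk, t)
    q = forgetting_weights(len(p))
    return sum(pa * qa for pa, qa in zip(p, q))
```

With these weights, frames near the start and end of a sentence contribute more to the score than frames in the middle, and two identical coefficient matrices score approximately 1 because the weights sum to 1.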

In addition, the present application also provides an apparatus for determining an advertisement insertion position, the apparatus for determining an advertisement insertion position includes a memory, a processor and a program stored on the memory and running on the processor for determining an advertisement insertion position, and the processor implements the steps of the method for determining an advertisement insertion position as described above when executing the program for determining an advertisement insertion position.

In addition, the application also provides a device for determining the position of the advertisement insertion. Alternatively, the apparatus for determining an advertisement insertion position may comprise or be externally connected to the means for determining an advertisement insertion position.

In one embodiment, referring to fig. 5, the apparatus 100 for determining an advertisement insertion position may include: an obtaining module 10, a first determining module 20, a second determining module 30, and a third determining module 40, wherein:

the obtaining module 10: configured to obtain the similarity between the audio frames of the candidate audio and the audio frames of the advertisement audio;

the first determining module 20: configured to determine a weight value corresponding to each audio frame according to a memory forgetting function, wherein the memory forgetting function represents a correspondence between an advertisement insertion position and a memory forgetting degree;

the second determining module 30: configured to determine a target similarity between the candidate audio and the advertisement audio according to the similarity and the weight value;

the third determining module 40: configured to determine the insertion position of the advertisement word corresponding to the advertisement audio according to the target similarity.

It should be noted that the embodiments of the device 100 for determining an advertisement insertion position are substantially the same as the embodiments of the method for determining an advertisement insertion position, and are not described herein again.

Furthermore, the embodiment of the present invention further provides a computer-readable storage medium, on which a program for determining an advertisement insertion position is stored, and when executed by a processor, the program for determining an advertisement insertion position implements the steps of the method for determining an advertisement insertion position as described above.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, a television, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A method for determining an advertisement insertion position, comprising the steps of:

2. The method for determining an insertion position of an advertisement according to claim 1, wherein the step of determining an insertion position of an advertisement word corresponding to the advertisement audio according to the target similarity includes:

and determining the insertion position of the advertising word corresponding to the advertising audio according to the relevant information of the target candidate audio.

3. The method of claim 1, wherein the step of obtaining the similarity between the audio frame of the candidate audio and the audio frame of the advertisement audio comprises:

4. The method for determining advertisement insertion position according to claim 1, wherein the step of determining the weight value corresponding to each audio frame according to the memory forgetting function comprises:

and acquiring the weight value according to the normalized forgetting function.

5. The method for determining an advertisement insertion position according to claim 1, wherein the step of obtaining the similarity between the audio frame of the candidate audio and the audio frame of the advertisement audio is preceded by the step of:

6. The method of determining an advertisement insertion position according to claim 5, wherein the step of determining a target number range of the number of the pitches based on the first number of the pitches and the second number of the pitches comprises:

7. The method for determining an advertisement insertion position according to claim 5, wherein the step of acquiring a first number of pitches of the audio to be selected and a second number of pitches of the advertisement audio comprises:

8. An apparatus for determining an advertisement insertion position, wherein the apparatus comprises a memory, a processor and a program for determining an advertisement insertion position, the program being stored in the memory and being executable on the processor, and the processor implements the steps of the method for determining an advertisement insertion position according to any one of claims 1 to 7 when executing the program for determining an advertisement insertion position.

9. An advertisement insertion position determining apparatus, characterized in that the advertisement insertion position determining apparatus comprises: an obtaining module, a first determining module, a second determining module, and a third determining module, wherein,

10. A computer-readable storage medium, on which an advertisement insertion position determination program is stored, the advertisement insertion position determination program, when executed by a processor, implementing the steps of the advertisement insertion position determination method according to any one of claims 1 to 7.