CN111261167A - Automatic tag generation method for audio hot content - Google Patents

Automatic tag generation method for audio hot content

Info

Publication number
CN111261167A
Authority
CN
China
Prior art keywords
audio
article
hot
word
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010046698.6A
Other languages
Chinese (zh)
Other versions
CN111261167B (en)
Inventor
吴海旭
丁宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Lizhi Network Technology Co ltd
Original Assignee
Guangzhou Lizhi Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Lizhi Network Technology Co ltd filed Critical Guangzhou Lizhi Network Technology Co ltd
Priority to CN202010046698.6A priority Critical patent/CN111261167B/en
Publication of CN111261167A publication Critical patent/CN111261167A/en
Application granted granted Critical
Publication of CN111261167B publication Critical patent/CN111261167B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/685 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/686 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an automatic tag generation method for audio hot content, which comprises the following steps: S1: constructing a hot content portrait system; S2: constructing an audio content portrait system; S3: calculating the similarity between the hot content and the audio content; S4: comparing the calculated similarity with at least one reference value, and determining, based on the comparison result, whether to set a corresponding tag for the audio content; S5: storing the audio content provided with the corresponding tag in an audio hot content database. With this method, audio content is tagged automatically and objectively on the basis of the hot content portrait system and big data, which effectively improves the accuracy of audio content tags while helping to reduce labor cost.

Description

Automatic tag generation method for audio hot content
Technical Field
The invention relates to the technical field of big data and artificial intelligence, and in particular to a method for automatically generating tags for audio hot content based on a hot content portrait system.
Background
With the vigorous development of the mobile internet, the channels through which people obtain information have become increasingly diverse, and voice-related information is one of the main channels. Digital life leads people to produce and consume ever more audio content, and audio recommendation technology has emerged so that people can find the audio content they want more conveniently. For audio recommendation, hot content is one of the most popular categories, and identifying the hot spots related to this category is an important basis for a recommendation system.
Due to the particularity of audio platforms, the content concerned is audio, and hot-spot-related tags are generally marked manually on the basis of titles, specifically as follows: manually collecting external hot spot information; manually extracting keywords related to the hot spot information; extracting keywords from the related titles on the audio platform; and matching the audio platform keywords against the hot keywords to select related audio. However, this way of working has the following problems:
1. an audio title does not necessarily reflect the audio content well; for a large part of audio content the title is not a description of the content, which makes matching difficult and costs accuracy;
2. it consumes a large amount of human resources, and whether content belongs to hot spot information is strongly affected by individual subjective judgment.
Therefore, in view of the particularity of audio content, a method for automatically generating tags for audio hot content based on a hot content portrait system is of real significance.
Disclosure of Invention
The invention aims to solve, at least to some extent, one of the problems in the prior art. To this end, the invention provides an automatic tag generation method for audio hot content, which realizes automatic tagging of audio content based on a hot content portrait system, effectively improves the accuracy of audio content tags, and helps to reduce labor cost.
The automatic tag generation method for audio hot content is realized by the following technical scheme:
An automatic tag generation method for audio hot content comprises the following steps:
S1: capturing hot search articles and their related information, and constructing a hot content portrait system;
S2: constructing an audio content portrait system;
S3: calculating the similarity between the hot content and the audio content;
S4: comparing the calculated similarity with at least one reference value, and determining, based on the comparison result, whether to set a corresponding tag for the audio content;
S5: storing the audio content provided with the corresponding tag in an audio hot content database.
In some embodiments, in step S1, capturing hot search articles and their related information and constructing the hot content portrait system includes:
S11: capturing hot search articles and their related information, and constructing a hot text database;
S12: performing article word segmentation on each article in the hot text database, and storing the article word segmentation result in an article word segmentation table;
S13: converting the article word segmentation of each hot text into an article word frequency vector, and storing the article word frequency vector result in an article word frequency vector table;
S14: with the article ID as the primary key, associating the article word segmentation table, the article word frequency vector table, and the read count, like count, forward count, comment count, and release time of each hot search article, to obtain the hot content portrait system.
In some embodiments, in step S11, the hot search articles are obtained from at least one of the Baidu hot search list, the microblog hot search list, and the WeChat search hot word list, and the read count, like count, forward count, comment count, and release time of each hot search article are captured.
In some embodiments, in step S12, performing article word segmentation on each article in the hot text database includes performing a word segmentation operation on the title and body of each article using the forward maximum matching method, to obtain the article title segmentation and the article body segmentation.
In some embodiments, in step S13, converting the article word segmentation into an article word frequency vector includes: merging related words in the article title segmentation with related words in the article body segmentation; calculating the word frequency of each word in the merged article title segmentation and article body segmentation, respectively; and converting the word frequencies of the article title segmentation and of the article body segmentation into an article title word frequency vector and an article body word frequency vector, respectively.
In some embodiments, in step S2, constructing the audio content portrait system includes:
S21: converting the audio into audio text, and storing the audio text in an audio text database;
S22: performing audio word segmentation on the audio text, and storing the audio word segmentation result in an audio word segmentation table;
S23: converting the audio word segmentation of the audio text into an audio word frequency vector, and storing the audio word frequency vector result in an audio word frequency vector table;
S24: with the audio ID as the primary key, constructing the audio content portrait system based on the audio word segmentation table and the audio word frequency vector table.
In some embodiments, in step S22, the audio word segmentation performs a word segmentation operation on the related title and body of each audio text in the audio text database using the forward maximum matching method, to obtain the audio title segmentation and the audio body segmentation.
In some embodiments, in step S23, converting the audio word segmentation of the audio text into an audio word frequency vector includes: merging related words in the audio title segmentation with related words in the audio body segmentation; calculating the word frequency of each word in the merged audio title segmentation and audio body segmentation, respectively; and converting the word frequencies of the audio title segmentation and of the audio body segmentation into an audio title word frequency vector and an audio body word frequency vector, respectively.
In some embodiments, in step S3, calculating the similarity between the hot content and the audio content includes:
S31: extracting an audio word frequency vector from the audio content portrait system;
S32: extracting an article word frequency vector from the hot content portrait system;
S33: calculating the similarity between the extracted article word frequency vector and the extracted audio word frequency vector.
In some embodiments, in step S33, the similarity is calculated by the formula
F(x) = Σᵢ (Aᵢ × Bᵢ) / (√(Σᵢ Aᵢ²) × √(Σᵢ Bᵢ²))
where F(x) is the similarity, Aᵢ is the i-th component of the extracted audio word frequency vector, and Bᵢ is the i-th component of the extracted article word frequency vector.
Compared with the prior art, the invention provides at least the following beneficial effects:
According to the automatic tag generation method for audio hot content, a hot content portrait system and an audio content portrait system are constructed, the similarity between the audio content and the hot content is calculated, and whether to tag the audio content automatically is determined from the comparison of the similarity with a reference value. The audio content is thus tagged objectively and automatically on the basis of the hot content portrait system and big data, which effectively improves the accuracy of audio content tagging while helping to reduce labor cost.
Drawings
FIG. 1 is a flowchart of an automatic tag generation method for audio hot content in an embodiment of the present invention;
FIG. 2 is a flowchart of constructing a hot content portrait system according to an embodiment of the present invention;
FIG. 3 is a flowchart of constructing an audio content portrait system according to an embodiment of the present invention;
FIG. 4 is a flowchart of calculating the similarity between hot content and audio content according to an embodiment of the present invention.
Detailed Description
The present invention is illustrated by the following examples, but is not limited to them. Modifications to the embodiments, or equivalent substitutions of individual technical features, that do not depart from the spirit of the invention are intended to fall within the scope of the claims of the invention.
As shown in FIG. 1, an automatic tag generation method for audio hot content includes the following steps:
S1: capturing hot search articles and their related information, and constructing a hot content portrait system;
Specifically, the hot content portrait system comprises, for each hot search article, an article word segmentation table, an article word frequency vector table, and related information such as read count, like count, forward count, comment count, and release time. Hot search articles and their related information (read count, like count, forward count, comment count, release time, and so on) are captured with an information capture technique; the hot search articles are converted into an article word segmentation table and an article word frequency vector table; and, with the article ID as the primary key, the article word segmentation table, the article word frequency vector table, and the related information of each hot search article are associated, completing the construction of the hot content portrait system.
S2: constructing an audio content portrait system;
Specifically, the audio content portrait system comprises, with the audio ID as the primary key, audio-related information such as the audio word segmentation table and the audio word frequency vector table.
S3: calculating the similarity between the hot content and the audio content;
specifically, similarity of the article word frequency vector and the audio word frequency vector is calculated through a similarity calculation formula by extracting the article word frequency vector representing hot content from the hot content representation system and extracting the audio word frequency vector representing the audio content from the audio content representation system.
S4: comparing the calculated similarity with at least one reference value, and determining whether to set a corresponding tag for the audio content based on the comparison result;
specifically, the at least one reference value includes a preset threshold used for determining whether to perform automation tagging in the audio, and the preset threshold is set by a user based on business requirements, so as to improve adaptability and flexibility of use of the audio hotspot content automation tagging. Comparing the similarity calculated in step S3 with a preset threshold, and if the calculated similarity is greater than the preset threshold, indicating that the audio content belongs to a hot category with a high recommendation value, at this time, setting a corresponding tag for the audio content; if the calculated similarity is smaller than or equal to the preset threshold, the fact that the audio content does not belong to the hotspot category and does not have recommendation value is indicated, and at the moment, setting of a corresponding label for the audio content is abandoned. Therefore, whether the audio content is subjected to automatic labeling or not is objectively judged according to the big data, the accuracy of the audio content labeling is improved, and the accuracy of audio recommendation and the maximum profit are improved.
S5: storing the audio content provided with the corresponding tag in an audio hot content database.
According to this automatic tag generation method for audio hot content, a hot content portrait system and an audio content portrait system are constructed, the similarity between the audio content and the hot content is calculated, and whether to tag the audio content automatically is determined from the comparison of the similarity with the reference value. The audio content is thus tagged objectively on the basis of the hot content portrait system and big data, which effectively improves the accuracy of audio content tagging, improves the accuracy of audio recommendation, helps to maximize the revenue of the audio platform, and reduces labor cost.
As shown in FIG. 2, preferably, in step S1, capturing hot search articles and their related information and constructing the hot content portrait system includes:
s11: capturing hot search articles and related information thereof, and constructing a hot text database;
specifically, hot search articles are obtained from at least one hot search platform in a Baidu hot search list, a microblog hot search list and a WeChat search hot word list, article related information such as reading amount, praise amount, forwarding amount, appraisal amount and release time of each hot search article is captured from the hot search platforms through an information capture technology, and the hot search articles and the captured article related information are stored to form a hot text database. Therefore, the hot search articles and the related information of the articles are directly acquired from the hot search platform, so that the articles in the hot text database have the advantages of high heat point and high recommendation value, and the representativeness and the accuracy of data in the hot content portrait system are improved.
S12: performing article word segmentation on each article in the hot text database, and storing the article word segmentation result in an article word segmentation table;
specifically, by using a forward maximum matching method (FMM), segmenting the title and the text of each article in a hot spot text database respectively to obtain article title segmentation and article text segmentation; and then, with the article ID as a main key, respectively storing the article title participles and the article text participles in an article title participle table and an article text participle table of a database. In this embodiment, the word segmentation result of the text of the article is as follows: document _ segment { word1, word2, word3, word4, word5 … wordn }; the article title word segmentation results are: title _ segment { word1, word2, word3, word4, word5 … word }. Therefore, word segmentation operation is respectively carried out on the article title and the article text of each hot search article, so that the accuracy of a data source is improved, and the matching of audio content and hot content is improved.
S13: converting the article word segmentation of the hot text into an article word frequency vector, and storing the article word frequency vector result in an article word frequency vector table;
specifically, the article body word frequency vector and the article title word frequency vector of the hot text are respectively converted into a corresponding article body word frequency vector and a corresponding article title word frequency vector, and the article body word frequency vector and the article title word frequency vector are respectively stored in an article body word frequency vector table and an article title word frequency vector table of a database by taking an article ID as a main key.
S14: and taking the article ID as a main key, and associating an article word segmentation table, an article word frequency vector table, reading quantity, praise quantity, forwarding quantity, comment quantity and release time related to the article to obtain the hot content representation system.
The method has the advantages that the article title and the article text of each hot-search article are subjected to word segmentation operation respectively, and the article text word segmentation and the article title word segmentation are converted into the corresponding article text word frequency vector and the corresponding article title word frequency vector respectively, so that the representativeness and the accuracy of data in the hot content portrait system are improved, and the matching difficulty of the audio content and the hot content in the hot content portrait system is effectively reduced.
More preferably, in step S13, the converting the article participles into article word frequency vectors includes: combining related words in the article title participles and related words in the article text participles; respectively calculating the word frequency of each word of the merged article title participles and article text participles; and respectively converting the word frequency of the article title participle and the word frequency of the article text participle into an article title word frequency vector and an article text word frequency vector. Therefore, by respectively counting the word frequency of each word in the article title participle and the article text participle, converting the word frequency of the article title participle into an article title word frequency vector and converting the word frequency of the article text participle into an article text word frequency vector, the interest point of the user can be accurately and objectively judged according to the lengths of the article title word frequency vector and the article text word frequency vector.
In this embodiment, the word frequency of the article title participle is: title _ word _ count ═ { word1count1_ t1, word2 count2_ t2, word3 count3_ t3 … }, where count _ d #, represents the number of times the word appears in article # s. The word frequency vector of the title of the article is as follows: title _ word _ vector ═ count1_ t1, count2_ t2, count3_ t3, count4_ t4 …. The word frequency of the text word segmentation of the article is as follows: document _ word _ count [ { word1count1_ d1, word2 count2_ d2, and word3 count3_ d3 … }, where count _ d #, indicates the number of times the word occurs in article # s. The article text word frequency vector is: document _ word _ vector ═ count1_ d1, count2_ d2, count3_ d3, and count4_ d4 ….
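As an illustrative sketch, this conversion can be pictured as counting each segmented word and laying the counts out along a shared vocabulary; the helper name and the vocabulary handling below are assumptions:

```python
from collections import Counter

def word_frequency_vector(segments: list, vocabulary: list) -> list:
    """Count each segmented word, then arrange the counts along a fixed
    vocabulary so any two vectors are comparable component by component."""
    counts = Counter(segments)
    return [counts.get(word, 0) for word in vocabulary]

title_segments = ["word1", "word2", "word1"]
body_segments = ["word1", "word3", "word3", "word4"]
# Shared vocabulary built from the merged title and body words.
vocabulary = sorted(set(title_segments) | set(body_segments))
title_word_vector = word_frequency_vector(title_segments, vocabulary)    # [2, 1, 0, 0]
document_word_vector = word_frequency_vector(body_segments, vocabulary)  # [1, 0, 2, 1]
```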
As shown in FIG. 3, preferably, in step S2, constructing the audio content portrait system includes:
s21: converting the audio into an audio text, and storing the audio text in an audio text database;
specifically, the audio content is converted into audio text by an automatic speech recognition technique (ASR), and the associated title of the audio is extracted, and the audio text and the extracted associated title are stored in an audio text database with the audio ID as a primary key.
S22: performing audio word segmentation on the audio text, and storing the audio word segmentation result in an audio word segmentation table;
Specifically, the related title and body of each audio text in the audio text database are segmented separately using the forward maximum matching method, to obtain the audio title segmentation and the audio body segmentation; then, with the audio ID as the primary key, the audio title segmentation and the audio body segmentation are stored in an audio title segmentation table and an audio body segmentation table of the database, respectively.
S23: converting the audio word segmentation of the audio text into an audio word frequency vector, and storing the audio word frequency vector result in an audio word frequency vector table;
specifically, relevant words in the audio title participles and relevant words in the audio text participles are respectively merged, the word frequency of each word of the merged audio title participles and audio text participles is respectively calculated, and the word frequency of the audio title participles and the word frequency of the audio text participles are respectively converted into audio title word frequency vectors and audio text word frequency vectors; and with the audio ID as a main key, respectively storing the audio title word frequency vector and the audio text word frequency vector in an audio title word frequency vector table and an audio text word frequency vector table in a database.
S24: and constructing an audio content representation system by taking the audio ID as a main key and based on the audio participle table and the audio word frequency vector table.
Therefore, the word segmentation operation is respectively carried out on the related title and the text of the audio text, and the audio title word segmentation and the audio text word segmentation are respectively converted into the corresponding audio title word audio vector and audio text word audio vector, so that the similarity calculation of the audio content and the hot content is facilitated, and the matching accuracy of the audio content and the hot content in the hot content portrait system is improved.
As shown in FIG. 4, preferably, in step S3, calculating the similarity between the hot content and the audio content includes:
S31: extracting an audio word frequency vector from the audio content portrait system;
Specifically, the audio body word frequency vector and/or the audio title word frequency vector are extracted from the audio content portrait system according to business requirements and overhead costs.
S32: extracting an article word frequency vector from the hot content portrait system;
Specifically, and correspondingly, the article body word frequency vector and/or the article title word frequency vector are extracted from the hot content portrait system, so that the data extracted from the two portrait systems meet the comparison requirement, which helps to improve the accuracy of the similarity calculation result.
S33: calculating the similarity between the extracted article word frequency vector and the extracted audio word frequency vector.
Specifically, the similarity is calculated by the formula
F(x) = Σᵢ (Aᵢ × Bᵢ) / (√(Σᵢ Aᵢ²) × √(Σᵢ Bᵢ²))
where F(x) is the similarity, and a larger value of F(x) represents a higher similarity; Aᵢ is the i-th component of the extracted audio word frequency vector, and Bᵢ is the i-th component of the extracted article word frequency vector.
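A minimal sketch of this calculation, reading the formula as cosine similarity between the two word frequency vectors:

```python
import math

def similarity(audio_vector: list, article_vector: list) -> float:
    """F(x) = sum(Ai*Bi) / (sqrt(sum(Ai^2)) * sqrt(sum(Bi^2)))."""
    dot = sum(a * b for a, b in zip(audio_vector, article_vector))
    norm_a = math.sqrt(sum(a * a for a in audio_vector))
    norm_b = math.sqrt(sum(b * b for b in article_vector))
    if norm_a == 0.0 or norm_b == 0.0:   # guard against empty texts
        return 0.0
    return dot / (norm_a * norm_b)

print(similarity([2, 1, 0, 0], [1, 0, 2, 1]))  # ≈ 0.365
```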
In this embodiment, if the extracted article word frequency vector and audio word frequency vector are the article body word frequency vector and the audio body word frequency vector, respectively, the calculated result is the body similarity F(document), which is compared with the preset threshold representing the body.
If the extracted article word frequency vector and audio word frequency vector are the article title word frequency vector and the audio title word frequency vector, respectively, the calculated result is the title similarity F(title), which is compared with the preset threshold representing the title.
If the extracted audio word frequency vectors are the audio body word frequency vector and the audio title word frequency vector, and the extracted article word frequency vectors are, correspondingly, the article body word frequency vector and the article title word frequency vector, the title similarity F(title) and the body similarity F(document) are calculated separately by the above formula. The calculated title similarity F(title) is compared with the preset threshold representing the title, the body similarity F(document) is compared with the preset threshold representing the body, and whether to set a corresponding tag for the audio is determined by combining the title comparison result and the body comparison result. In this way, whether to tag the audio content automatically is judged objectively from big data, which improves the accuracy of audio content tagging and facilitates accurate audio recommendation.
Further, the calculated title similarity F(title) and body similarity F(document) may be combined by the formula μ(x) = αF(title) + βF(document), where μ(x) represents the similarity of the title and the body taken together, and a larger value of μ(x) represents a higher similarity; α is a title hyperparameter adjustable according to business needs, and β is a body hyperparameter adjustable according to business needs. The combined μ(x) is compared with a preset threshold representing both the title and the body: if μ(x) is greater than the preset threshold, a corresponding tag is set for the audio content; if μ(x) is not greater than the preset threshold, setting a corresponding tag is abandoned. Weighting the title similarity F(title) and body similarity F(document) through the formula μ(x) = αF(title) + βF(document) thus improves the efficiency and accuracy of judging whether audio content should be tagged.
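A short sketch of this weighted combination; α, β, and the threshold below are illustrative values to be tuned to business needs, not values given by the patent:

```python
def combined_similarity(f_title: float, f_document: float,
                        alpha: float = 0.4, beta: float = 0.6) -> float:
    """mu(x) = alpha * F(title) + beta * F(document); larger means more similar."""
    return alpha * f_title + beta * f_document

mu = combined_similarity(f_title=0.8, f_document=0.6)  # 0.68
if mu > 0.65:                                          # assumed preset threshold
    print("set the corresponding tag for this audio")
else:
    print("abandon setting the tag")
```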
What has been described above are merely some embodiments of the present invention. It will be apparent to those skilled in the art that various changes and modifications can be made without departing from the inventive concept, and such changes and modifications also fall within the scope of protection of the invention.

Claims (10)

1. An automatic tag generation method for audio hot content, characterized by comprising the following steps:
S1: capturing hot search articles and their related information, and constructing a hot content portrait system;
S2: constructing an audio content portrait system;
S3: calculating the similarity between the hot content and the audio content;
S4: comparing the calculated similarity with at least one reference value, and determining, based on the comparison result, whether to set a corresponding tag for the audio content;
S5: storing the audio content provided with the corresponding tag in an audio hot content database.
2. The method according to claim 1, wherein in step S1, capturing hot search articles and their related information and constructing the hot content portrait system includes:
S11: capturing hot search articles and their related information, and constructing a hot text database;
S12: performing article word segmentation on each article in the hot text database, and storing the article word segmentation result in an article word segmentation table;
S13: converting the article word segmentation of each hot text into an article word frequency vector, and storing the article word frequency vector result in an article word frequency vector table;
S14: with the article ID as the primary key, associating the article word segmentation table, the article word frequency vector table, and the read count, like count, forward count, comment count, and release time of each hot search article, to obtain the hot content portrait system.
3. The method according to claim 2, wherein in step S11, the hot search articles are obtained from at least one of the Baidu hot search list, the microblog hot search list, and the WeChat search hot word list, and the read count, like count, forward count, comment count, and release time of each hot search article are captured.
4. The method according to claim 2, wherein in step S12, performing article word segmentation on each article in the hot text database includes performing a word segmentation operation on the title and body of each article using the forward maximum matching method, to obtain the article title segmentation and the article body segmentation.
5. The method according to claim 4, wherein in step S13, converting the article word segmentation into an article word frequency vector includes: merging related words in the article title segmentation with related words in the article body segmentation; calculating the word frequency of each word in the merged article title segmentation and article body segmentation, respectively; and converting the word frequencies of the article title segmentation and of the article body segmentation into an article title word frequency vector and an article body word frequency vector, respectively.
6. The method according to any one of claims 2-5, wherein in step S2, constructing the audio content portrait system includes:
S21: converting the audio into audio text, and storing the audio text in an audio text database;
S22: performing audio word segmentation on the audio text, and storing the audio word segmentation result in an audio word segmentation table;
S23: converting the audio word segmentation of the audio text into an audio word frequency vector, and storing the audio word frequency vector result in an audio word frequency vector table;
S24: with the audio ID as the primary key, constructing the audio content portrait system based on the audio word segmentation table and the audio word frequency vector table.
7. The method according to claim 6, wherein in step S22, the audio word segmentation performs a word segmentation operation on the related title and body of each audio text in the audio text database using the forward maximum matching method, to obtain the audio title segmentation and the audio body segmentation.
8. The method according to claim 7, wherein in step S23, converting the audio word segmentation of the audio text into an audio word frequency vector includes: merging related words in the audio title segmentation with related words in the audio body segmentation; calculating the word frequency of each word in the merged audio title segmentation and audio body segmentation, respectively; and converting the word frequencies of the audio title segmentation and of the audio body segmentation into an audio title word frequency vector and an audio body word frequency vector, respectively.
9. The method according to claim 6, wherein in step S3, calculating the similarity between the hot content and the audio content includes:
S31: extracting an audio word frequency vector from the audio content portrait system;
S32: extracting an article word frequency vector from the hot content portrait system;
S33: calculating the similarity between the extracted article word frequency vector and the extracted audio word frequency vector.
10. The method according to claim 9, wherein in step S33, the similarity is calculated by the formula
F(x) = Σᵢ (Aᵢ × Bᵢ) / (√(Σᵢ Aᵢ²) × √(Σᵢ Bᵢ²))
where F(x) is the similarity, Aᵢ is the i-th component of the extracted audio word frequency vector, and Bᵢ is the i-th component of the extracted article word frequency vector.
CN202010046698.6A 2020-01-16 2020-01-16 Automatic label generation method for audio hot content Active CN111261167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010046698.6A CN111261167B (en) 2020-01-16 2020-01-16 Automatic label generation method for audio hot content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010046698.6A CN111261167B (en) 2020-01-16 2020-01-16 Automatic label generation method for audio hot content

Publications (2)

Publication Number Publication Date
CN111261167A true CN111261167A (en) 2020-06-09
CN111261167B CN111261167B (en) 2023-05-30

Family

ID=70947104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010046698.6A Active CN111261167B (en) 2020-01-16 2020-01-16 Automatic label generation method for audio hot content

Country Status (1)

Country Link
CN (1) CN111261167B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004272352A (en) * 2003-03-05 2004-09-30 Nippon Telegr & Teleph Corp <Ntt> Similarity calculation method, similarity calculation device, similarity calculation program, and recording medium stored with the program
CN104090955A (en) * 2014-07-07 2014-10-08 科大讯飞股份有限公司 Automatic audio/video label labeling method and system
CN105279208A (en) * 2014-07-25 2016-01-27 北京龙源创新信息技术有限公司 Data marking method and management system
CN104598532A (en) * 2014-12-29 2015-05-06 中国联合网络通信有限公司广东省分公司 Information processing method and device
CN108831476A (en) * 2018-05-31 2018-11-16 平安科技(深圳)有限公司 Voice acquisition method, device, computer equipment and storage medium
CN110519654A (en) * 2019-09-11 2019-11-29 广州荔支网络技术有限公司 A kind of label determines method and device
CN110598011A (en) * 2019-09-27 2019-12-20 腾讯科技(深圳)有限公司 Data processing method, data processing device, computer equipment and readable storage medium

Also Published As

Publication number Publication date
CN111261167B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
CN109117777B (en) Method and device for generating information
CN110427623B (en) Semi-structured document knowledge extraction method and device, electronic equipment and storage medium
CN108628828B (en) Combined extraction method based on self-attention viewpoint and holder thereof
CN101021855B (en) Video searching system based on content
CN107590224B (en) Big data based user preference analysis method and device
CN109271539B (en) Image automatic labeling method and device based on deep learning
CN112818906A (en) Intelligent full-media news cataloging method based on multi-mode information fusion understanding
US11120268B2 (en) Automatically evaluating caption quality of rich media using context learning
CN114419387A (en) Cross-modal retrieval system and method based on pre-training model and recall ranking
CN111368138A (en) Method and device for sorting video category labels, electronic equipment and storage medium
CN116562270A (en) Natural language processing system supporting multi-mode input and method thereof
CN114691869A (en) User label generation method and system
CN114547373A (en) Method for intelligently identifying and searching programs based on audio
CN106372083B (en) A kind of method and system that controversial news clue is found automatically
CN116186266A (en) BERT (binary image analysis) and NER (New image analysis) entity extraction and knowledge graph material classification optimization method and system
CN111261167A (en) Automatic tag generation method for audio hot content
CN114218437A (en) Adaptive picture clipping and fusing method, system, computer device and medium
CN114490993A (en) Small sample intention recognition method, system, equipment and storage medium
CN114242047A (en) Voice processing method and device, electronic equipment and storage medium
CN114610744A (en) Data query method and device and computer readable storage medium
CN114090777A (en) Text data processing method and device
CN113704549A (en) Method and device for determining video tag
CN117193889B (en) Construction method of code example library and use method of code example library
CN114328902A (en) Text labeling model construction method and device
CN110837735A (en) Intelligent data analysis and identification method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant