CN118261650A

CN118261650A - Advertisement demand determining system and method based on voice analysis

Info

Publication number: CN118261650A
Application number: CN202410685999.1A
Authority: CN
Inventors: 余毅; 冉凡高; 陈德来; 董绘军; 许彪
Original assignee: Shanghai Tontron Information Technology Co ltd
Current assignee: Shanghai Tontron Information Technology Co ltd
Priority date: 2024-05-30
Filing date: 2024-05-30
Publication date: 2024-06-28
Anticipated expiration: 2044-05-30
Also published as: CN118261650B

Abstract

The invention belongs to the field of data monitoring, and particularly relates to an advertisement demand determining system and method based on voice analysis, wherein the system firstly converts user voice into standard text in real time through voice acquisition, noise reduction pretreatment and multi-source voice conversion, secondly, a text recognition unit further recognizes entities and constructs a user knowledge graph to realize structured storage of user preference, meanwhile, an emotion recognition unit recognizes user voice emotion types and quantifies intensity by combining a multidimensional emotion analysis technology, and provides accurate emotion labels for advertisement pushing, and fourthly, a data mining and user portrait construction unit analyzes user behaviors and interests to construct accurate portraits, determines advertisement pushing objects and generates pushing contents according to portraits and emotion analysis, and finally, the system collects user feedback, evaluates advertisement effects and dynamically adjusts and determines strategies; the invention realizes accurate advertisement pushing based on voice analysis and emotion analysis, and further improves the effectiveness of advertisement pushing.

Description

Advertisement demand determining system and method based on voice analysis

Technical Field

The invention belongs to the field of data monitoring, and particularly relates to an advertisement demand determining system and method based on voice analysis.

Background

With the rapid development of technology, the advertising industry is continually seeking innovations to better capture and meet consumer needs. In the digital marketing era, an advertisement demand analysis system based on advanced technologies such as big data, artificial intelligence and the like has become an important tool for enterprises to get insight into markets and to formulate accurate marketing strategies. The voice analysis is used as an emerging technical means, and is increasingly focused and applied by the advertising industry in a direct and natural man-machine interaction mode. However, the existing advertisement demand determining system and method based on voice analysis still have some limitations and disadvantages in practical application, especially in the aspect of identifying and utilizing emotion information in voice.

A system and method for determining advertisement demand by voice analysis is disclosed in the patent publication CN110264261a, which comprises: an advertisement platform: for delivering advertisements; and the voice extraction module is used for: for speech extraction; and the voice analysis module is used for: the voice recognition module is used for analyzing the voice information extracted by the voice extraction module, and carrying out recognition and key content extraction; a main control module; receiving voice information and extracted key content identified by a voice analysis module, analyzing actual demands of clients, formulating advertisement promotion strategies and issuing commands; an advertisement promotion module: an advertisement library is built in the advertisement platform, a command issued by the main control module is received, and advertisements in the advertisement library are pushed to the advertisement platform for delivery according to the command.

The patent with publication number CN110827074A discloses a method for advertisement delivery evaluation by adopting video voice analysis, which comprises the following steps: step one, acquiring video data and extracting voice audio from the video data; step two, noise reduction processing is carried out on the voice audio to obtain theme audio; step three, performing voice recognition on the theme audio, extracting the feature description of the keyword pair theme audio, and performing advertisement delivery evaluation according to the feature description; the method for extracting the voice audio from the video data comprises the following steps: firstly extracting audio data from video data, and then extracting human voice audio from the audio data, wherein the human voice audio noise reduction processing method comprises the following steps: and weakening the frequency of the human voice audio outside the human voice audio to obtain the theme audio.

The above prior art has the following problems: 1) in the complex situations of noisy environment, dialect accent, too fast or too slow speech speed, etc., the accuracy of speech recognition is poor, 2) most of the existing systems focus on recognizing keywords and phrases in speech, but the processing capacity of complex language phenomena with emotion colors such as metaphors, irony, bilingual, etc. is limited for the deep understanding of semantic level, and 3) when the existing systems perform speech analysis, the emotion information contained in the contexts is often not fully considered, so that the recognition result is opposite to the original meaning; in order to solve the above problems, the present invention provides a system and a method for determining advertisement demand based on voice analysis.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides an advertisement demand determining system and method based on voice analysis, firstly, the system converts user voice into standard text in real time through voice acquisition, noise reduction pretreatment and multi-source voice conversion, secondly, a text recognition unit further recognizes entities and constructs a user knowledge graph to realize structured storage of user preference, meanwhile, an emotion recognition unit recognizes the emotion type and the quantization intensity of the user voice by combining a multi-dimensional emotion analysis technology, and provides an accurate emotion tag for advertisement pushing, fourthly, a data mining and user portrait construction unit constructs an accurate portrait according to the voice text, emotion and the association score between user behaviors and interests, determines advertisement pushing objects and generates pushing contents according to portrait and emotion analysis, and finally, the system also collects user feedback, evaluates advertisement effects and dynamically adjusts strategies; the invention realizes accurate advertisement pushing based on voice analysis and emotion analysis, and further improves the effectiveness of advertisement pushing.

In order to achieve the above purpose, the present invention provides the following technical solutions:

an advertising demand determination system based on speech analysis, comprising: the system comprises a voice recognition module, a natural language processing module, a voice emotion analysis module, a user portrait module and an advertisement demand determining and pushing module;

The voice recognition module is used for collecting multi-source voice information of the user and converting the collected voice of the user into standard text information; the voice recognition module comprises a multi-source voice conversion unit; the multi-source voice conversion unit is used for converting multi-source voice information into text information in real time by using a deep learning algorithm;

The natural language processing module is used for carrying out semantic analysis on the text information output by the voice recognition module and constructing a user voice information knowledge graph base according to the analysis result;

The voice emotion analysis module is used for analyzing emotion information in the voice of the user; the voice emotion analysis module comprises an emotion recognition unit and an emotion strength evaluation unit; the emotion recognition unit is used for recognizing emotion types in the multi-source voice information of the user through a multi-dimensional emotion recognition technology; the emotion intensity evaluation unit is used for performing deep learning quantitative evaluation on the identified emotion types to obtain intensities of the corresponding emotion types;

the user portrait module is used for mining user data information by utilizing a data mining technology according to the acquired emotion information and voice information base to construct a user portrait; the user portrait module comprises a data mining unit and a user portrait construction unit; the data mining unit is used for calculating the comprehensive interest score between the user behavior and the interest according to the data in the user voice information knowledge graph base by using the association rule mining technology; the user portrait construction unit is used for constructing a user portrait according to the acquired user voice information knowledge graph base, emotion type and strength and comprehensive interest scores between user behaviors and interests;

The advertisement demand determining and pushing module is used for determining the advertisement demand of the user according to the emotion type and the intensity score obtained by the user image and voice emotion analysis module and pushing the corresponding advertisement content; the advertisement demand determining and pushing module comprises a demand determining unit; the demand determining unit is used for dynamically adjusting the advertisement demand determining strategy through emotion type and intensity scores obtained by the constructed user portrait and voice emotion analyzing module, and determining the advertisement demand of the user by utilizing the adjusted determining strategy.

Specifically, the multi-source voice conversion unit adopts a multi-source voice recognition strategy, and the specific steps include:

A1, crawling dialect long voice data of different areas and public Chinese and English long voice data by utilizing a crawler technology, editing the crawled voice data of different areas by utilizing voice editing software, framing the edited dialect voice data of different areas and Chinese and English voice data sets, acquiring voice segments with the length of 25 milliseconds, and performing text labeling, scattering and denoising on all the acquired voice segments; further, in this embodiment, the labels of the dialect voice data of different areas include corresponding dialect text information and standard mandarin text information corresponding to the dialect; in this embodiment, the breaking means that the voice segments belonging to the same section of dialect long voice audio and the voice segments belonging to other dialect long voice audio are subjected to random mixing processing;

A2, preprocessing the marked and scattered voice segments by utilizing an acoustic feature processing technology, acquiring initial features of a spectrogram, initial features of a Mel filter bank and initial features of Mel frequency cepstrum coefficients corresponding to the voice segments, and utilizing original long voice data, the voice segments and the initial features of the corresponding spectrogram, the initial features of the Mel filter bank and the initial features of the Mel frequency cepstrum coefficients according to 8:2, constructing a voice conversion training set and a verification set according to the proportion;

Specifically, the multi-source voice recognition strategy further comprises the specific steps of:

A3, constructing a multi-source voice recognition conversion model, inputting the original long voice data in a training set and a verification set, corresponding voice segments, initial features of a spectrogram, initial features of a Mel filter bank and initial features of Mel frequency cepstrum coefficients into an initial feature extraction layer in the multi-source voice recognition conversion model, and outputting 1 long voice context feature and 4 voice segment initial voice features;

a4, inputting the 4 initial voice features of the voice segments into a fusion attention perception layer for fusion operation, obtaining high-dimensional fusion text features of the voice segments, inputting the obtained high-dimensional fusion text features of the voice segments into a full-connection layer, and obtaining text conversion data of the corresponding voice segments;

And A5, inputting the long voice context characteristics and the acquired corresponding voice segment text conversion data into a conversion former combination sorting layer, and combining and sorting the voice segment text conversion data belonging to the same original long voice data to acquire final long voice text conversion data.

Specifically, the emotion recognition unit and the emotion intensity evaluation unit adopt a multidimensional voice emotion recognition and evaluation strategy at the same time, and the method specifically comprises the following steps:

b1, constructing a multi-dimensional voice emotion recognition model and an evaluation model, and performing preliminary pre-training on the constructed multi-dimensional voice emotion recognition model and the evaluation model by utilizing a public voice emotion data set to acquire a multi-dimensional voice emotion recognition model and an evaluation model which are completed through pre-training;

B2, marking emotion types and emotion intensities of the voice segment obtained in the A2 and corresponding initial features of the spectrogram, the initial features of the Mel filter bank and the initial features of the Mel frequency cepstrum coefficient to obtain a multi-dimensional emotion recognition fine adjustment data set; the emotion type label is specifically as follows: marking emotion types with counter with a question voice by using-1, and marking emotion types without counter with a question voice by using +1; the emotion strength label is specifically as follows: marking the emotion intensity in each voice segment by utilizing numerical scores through expert experience;

b3, inputting the acquired multi-dimensional emotion recognition fine adjustment data set into the pre-trained multi-dimensional voice emotion recognition model and the pre-trained evaluation model for fine adjustment, and acquiring the multi-dimensional voice emotion recognition model with fine adjustment;

Specifically, the multidimensional speech emotion recognition and assessment strategy comprises the following specific steps:

The trained multidimensional voice emotion recognition model is deployed into an emotion recognition unit, real-time voice data crawled by a crawler are input into the emotion recognition unit, and emotion type and emotion intensity scores corresponding to the long voice data of the user are obtained; the emotion intensity score calculation method comprises the specific steps of firstly obtaining corresponding voice segments according to received voices, calculating the emotion intensity score of each voice segment by using a multidimensional voice emotion recognition model and an evaluation model, and obtaining the emotion intensity score corresponding to long voice data of a user by using the obtained emotion intensity score of each voice segment and adopting weighted fusion calculation;

And B5, constructing a triplet by using the long-voice text conversion data obtained in the A5 and the emotion type and emotion intensity score corresponding to the long-voice text obtained in the B4, and inputting the triplet into a knowledge graph unit for storage.

Specifically, the data mining unit calculates a comprehensive interest score between a user behavior and an interest, which includes:

C1, counting the frequency of various product keywords mentioned in corresponding user voice text information and the corresponding voice emotion intensity score of the product keywords from a constructed user voice information knowledge graph library by using a statistical method;

C2, calculating to obtain the comprehensive interest score of the user on the mentioned product at the current moment by using the obtained frequency number of the product keywords and the corresponding text information voice emotion intensity score of the product keywords and using a weighting strategy;

And C3, constructing an interest threshold section, classifying the association degree of the user and the mentioned products according to the calculated comprehensive interest score, and obtaining the comprehensive interest level of the user on different products.

Specifically, the specific steps of dynamically adjusting the advertisement demand determination policy include:

D1, setting a push score threshold, comparing the comprehensive interest score and the relevance classification result obtained by the calculation of C2 with the push score threshold, if the comprehensive interest score is larger than the push score threshold and the relevance classification result meets the push score threshold setting, generating corresponding advertisements according to user information and interested contents stored in a user voice information knowledge graph library, and pushing the generated advertisements to corresponding users by using a matching algorithm;

Setting a user satisfaction score threshold, monitoring and collecting voice data of the user after advertisement pushing in real time, calculating to obtain a satisfaction score of the user after advertisement viewing by utilizing the voice data collected in real time, and if the satisfaction score of the user after advertisement viewing is higher than or equal to the satisfaction score threshold, continuing to execute the advertisement pushing of the current content type;

If the current time integrated interest score and the relevance classification meet D1, updating the corresponding pushed advertisement content by using text information obtained by the current time voice analysis result, so that the user satisfaction score of each push advertisement is higher than the satisfaction score threshold in real time;

And D4, storing the push advertisement content before and after updating and the calculated comprehensive interest score and association degree classification into a knowledge graph, and directly calling and updating the existing advertisement to push to the corresponding user when the advertisement is pushed to the same user again.

An advertisement demand determining method based on voice analysis, comprising:

S1, preprocessing the collected voice signals by using a filtering noise reduction technology on the user voice signals collected by different platforms and areas, converting the preprocessed voice signals into standard text information in real time by using a multi-source voice recognition conversion model, processing the obtained standard text information by using an entity recognition and extraction technology to obtain corresponding keywords and keyword attributes, and inputting the obtained keywords and keyword attributes into a graph database to construct a user voice information knowledge graph base;

S2, inputting the collected voice signals and the obtained standard text information into a multidimensional voice emotion recognition model and an evaluation model, recognizing emotion types in user voices, quantitatively evaluating the recognized emotion intensities, and simultaneously feeding the obtained emotion types and emotion intensities back to a constructed user voice information knowledge graph base for updating and saving;

s3, inputting the acquired text information and voice emotion information into a data mining model, acquiring a comprehensive interest score between the user behavior and the interest, and inputting the acquired comprehensive interest score between the user behavior and the interest and data stored in a knowledge graph base into a user portrait construction unit to construct an accurate user portrait;

S4, determining a specific advertisement pushing object according to the constructed user portrait and voice emotion analysis result by the dynamic adjustment strategy, generating a corresponding advertisement according to the determined pushing object, pushing the advertisement to the determined user object, simultaneously monitoring and analyzing the pushed advertisement in real time, acquiring an advertisement pushing evaluation result, feeding the evaluation result back to the requirement determination unit, and dynamically adjusting and storing the advertisement requirement determination strategy.

A computer readable storage medium having stored thereon computer instructions which when executed perform a method of advertisement demand determination based on speech analysis.

An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of advertisement demand determination based on speech analysis when executing the computer program.

Compared with the prior art, the invention has the beneficial effects that:

1. Aiming at the recognition accuracy problem caused by noisy environments, dialects, accents, speed change and the like, the multi-source voice recognition strategy remarkably improves the recognition accuracy under complex conditions by fusing voice data of different sources and combining with advanced acoustic feature processing technology; secondly, for the deep understanding problem of the semantic level, the multidimensional voice emotion recognition and assessment strategy focuses on keywords and phrases in voice, and deep mines complex language phenomena with emotion colors, such as metaphors, ironics and the like, so that the deep understanding of the semantic level is realized, the intelligent level of an advertisement pushing system is improved, and the pertinence and the effectiveness of advertisement content are enhanced through the construction and the use of the strategy;

2. according to the invention, through statistics and analysis of the frequency and emotion intensity scores of the product keywords in the user voice text information, not only is the surface interest of the user captured, but also the emotion tendency of the user to the product is deeply mined, so that a comprehensive interest score which is more comprehensive and accurate is obtained. And secondly, by constructing an interest threshold interval, the association degree between the user and the product is finely classified, and a more accurate basis is provided for advertisement pushing. In addition, the application of the dynamic adjustment strategy ensures that the determination and pushing of the advertisements can be adjusted in real time according to the feedback of the user, thereby not only improving the effect of pushing the advertisements, but also enhancing the user experience.

Drawings

FIG. 1 is a block diagram of an advertisement demand determination system based on voice analysis according to embodiment 1 of the present invention;

FIG. 2 is a diagram illustrating a speech feature generation architecture in a multi-source speech recognition strategy according to embodiment 1 of the present invention;

FIG. 3 is a diagram showing a multi-input speech conversion time-series attention model according to embodiment 1 of the present invention;

FIG. 4 is a diagram showing a fused attention perception layer in a multi-input speech conversion time-series attention model according to embodiment 1 of the present invention;

FIG. 5 is a diagram showing the structure of a multidimensional speech emotion recognition model and an evaluation model according to embodiment 1 of the present invention;

FIG. 6 is a diagram showing the operation of a system unit for determining advertisement demand based on voice analysis according to embodiment 2 of the present invention;

fig. 7 is a flowchart of an advertisement demand determining method based on voice analysis according to embodiment 3 of the present invention.

Detailed Description

Example 1

Referring to fig. 1, an embodiment of the present invention is provided: an advertising demand determination system based on speech analysis, comprising: the system comprises a voice recognition module, a natural language processing module, a voice emotion analysis module, a user portrait module, an advertisement demand determining and pushing module, a user feedback and behavior analysis module and a privacy protection and data security module;

The voice recognition module is used for collecting multi-source voice information of the user and converting the collected voice of the user into standard text information; the natural language processing module is used for carrying out semantic analysis on the text output by the voice recognition module and constructing a user voice information knowledge graph base according to the analysis result; the voice emotion analysis module is used for analyzing emotion information in the voice of the user; the user portrait module is used for mining user data information by utilizing a data mining technology according to the acquired emotion information and voice information base to construct a user portrait; the advertisement demand determining and pushing module is used for determining the advertisement demand of the user according to the emotion type and the intensity score obtained by the user image and voice emotion analysis module and pushing the corresponding advertisement content; the user feedback and behavior analysis module is used for analyzing feedback and behavior data of the user for the advertisement demand determination and pushing module to push the advertisement, and optimizing the advertisement demand determination and pushing module according to the analysis result; and the privacy protection and data security module is used for protecting the privacy of users and the security of data and ensuring the compliance and the credibility of the system.

The voice recognition module comprises a voice acquisition unit and a multi-source voice conversion unit; the natural language processing module comprises a text recognition unit and a knowledge graph unit; the voice emotion analysis module comprises an emotion recognition unit and an emotion strength evaluation unit; the user portrait module comprises a data mining unit and a user portrait construction unit;

The voice acquisition unit is used for acquiring voice signal information of users in different platforms and areas and preprocessing the acquired voice signal information by adopting a filtering noise reduction technology; the voice signal information of the users in different platforms and areas comprises dialect voice information and standard mandarin voice information of different areas; the embodiment adopts a filtering noise reduction technology to process the collected voice, wherein the noise reduction technology comprises an adaptive filtering method, a spectral subtraction method, a low-pass filter, a high-pass filter and the like;

The multi-source voice conversion unit is used for converting the filtered and noise-reduced multi-source voice signal information into text information in real time by using a deep learning algorithm; further, referring to fig. 3, the multi-source speech conversion unit of the present embodiment configures a multi-source speech recognition strategy, which specifically includes the following steps:

A2, preprocessing the marked and scattered voice segments by utilizing an acoustic feature processing technology, acquiring initial features of a spectrogram, initial features of a Mel filter bank and initial features of Mel frequency cepstrum coefficients corresponding to the voice segments, and utilizing original long voice data, the voice segments and the initial features of the corresponding spectrogram, the initial features of the Mel filter bank and the initial features of the Mel frequency cepstrum coefficients according to 8:2, constructing a voice conversion training set and a verification set according to the proportion; further, referring to fig. 2, in the embodiment, the initial feature of the spectrogram is obtained by pre-emphasizing, framing, windowing and fourier transforming the preprocessed speech segment to obtain a corresponding speech spectrum, and then logarithmically taking the obtained speech spectrum to obtain the initial feature of the spectrogram; the method comprises the steps of obtaining initial characteristics of a Mel filter bank, namely squaring an obtained voice frequency spectrum to obtain a power spectrum, filtering the power spectrum by using M Mel band-pass filters, superposing energy in each filtering frequency band after filtering, and obtaining logarithm of the energy, thereby obtaining the initial characteristics of the corresponding Mel filter bank; in this embodiment, the procedure of obtaining the initial characteristic of the mel-frequency cepstrum coefficient includes performing discrete cosine transform on the initial characteristic of the mel-filter bank to obtain a discrete cepstrum, i.e., the initial characteristic of the mel-frequency cepstrum coefficient;

A3, constructing a multi-source voice recognition conversion model, inputting the original long voice data in a training set and a verification set, corresponding voice segments, initial features of a spectrogram, initial features of a Mel filter bank and initial features of Mel frequency cepstrum coefficients into an initial feature extraction layer in the multi-source voice recognition conversion model, and outputting 1 long voice context feature and 4 voice segment initial voice features; further, the multi-source speech recognition conversion model in the embodiment comprises an initial feature extraction layer, a fusion attention perception layer and a full connection layer; the initial feature extraction layer consists of 5 parallel convolutional neural networks; referring to fig. 4, the attention sensing layer is fused, and the attention sensing layer comprises a first TCN network, a second Sigmoid function, a third TCN network, a third Sigmoid function, a fourth TCN network and a fourth Sigmoid function;

A4, inputting the 4 initial voice features of the voice segments into a fusion attention perception layer for fusion operation, obtaining high-dimensional fusion text features of the voice segments, inputting the obtained high-dimensional fusion text features of the voice segments into a full-connection layer, and obtaining text conversion data of the corresponding voice segments; referring to fig. 4, the specific workflow of the fused attention sensing layer is as follows: firstly, simultaneously acquiring long voice context characteristics, initial voice characteristics I, initial voice characteristics II, initial voice characteristics III and initial voice characteristics IV through 5 parallel convolutional neural networks in an initial characteristic extraction layer; then inputting the obtained initial voice feature I, initial voice feature II, initial voice feature III and initial voice feature IV into a fused attention sensing layer, cascading the initial voice feature II, the initial voice feature III and the initial voice feature IV in the fused attention sensing layer, and then sequentially inputting the cascaded initial voice feature II, the cascaded initial voice feature III and the cascaded initial voice feature IV into a TCN network I and a Sigmoid function I to obtain the weight of the initial voice feature I The first initial voice feature, the third initial voice feature and the fourth initial voice feature are cascaded and then sequentially input into the second TCN network and the second Sigmoid function to obtain the weight of the second initial voice featureThe first initial voice feature, the second initial voice feature and the fourth initial voice feature are cascaded and then sequentially input into a third TCN network and a third Sigmoid function to obtain the weight of the third initial voice featureThe first initial voice feature, the second initial voice feature and the third initial voice feature are cascaded and then sequentially input into the TCN network IV and the Sigmoid function IV to obtain the weight of the initial voice feature IVMultiplying the obtained weight and the corresponding initial voice feature thereof, inputting the multiplied weight and the corresponding initial voice feature into a cascading operation for adding, and obtaining the high-dimensional fusion text feature of the voice segment; the 5 parallel convolutional neural networks comprise a first convolutional sub-layer, a second convolutional sub-layer, a third convolutional sub-layer, a fourth convolutional sub-layer and a fifth convolutional sub-layer;

A5, inputting the long voice context characteristics and the acquired corresponding voice segment text conversion data into a conversion former combination sorting layer, and combining and sorting the voice segment text conversion data belonging to the same original long voice data to acquire final long voice text conversion data; in the embodiment, a transformer combination sequencing layer adopts a Bert model which is simultaneously pre-trained by utilizing a Chinese and English data set to execute a specific text combination and sequencing process;

The text recognition unit is used for carrying out primary entity recognition and relation extraction on the converted text information by using a text extraction technology; the text extraction technology of the embodiment adopts a entity identification and extraction combined algorithm based on Chinese pre-training Bert to carry out primary entity identification and relation extraction;

The knowledge graph unit is used for constructing a user voice information knowledge graph base according to the analysis result of the text recognition unit; in the embodiment, the knowledge graph unit constructs a triplet by using the entity extracted by the text recognition unit and the corresponding relation, and inputs the constructed triplet into the graph database to acquire a corresponding knowledge graph; the graph database includes Neo4j, orientDB, janusGraph, arangoDB, etc.;

the emotion recognition unit is used for recognizing emotion types in the filtered and noise-reduced multi-source voice signal information by utilizing a multi-dimensional emotion recognition technology, and determining corresponding emotion labels for advertisement pushing; the emotion strength evaluation unit is used for performing deep learning quantitative evaluation on the identified emotion to obtain the strength of the corresponding emotion type; further, the emotion recognition unit and the emotion intensity evaluation unit in this embodiment are configured with a multidimensional speech emotion recognition and evaluation strategy, and the specific steps include:

b1, constructing a multi-dimensional voice emotion recognition model and an evaluation model, and performing preliminary pre-training on the constructed multi-dimensional voice emotion recognition model and the evaluation model by utilizing a public voice emotion data set to acquire a multi-dimensional voice emotion recognition model and an evaluation model which are completed through pre-training; further, referring to fig. 5, the multi-dimensional speech emotion recognition model and the evaluation model include a specific feature coordinate residual attention layer and an output layer; the specific feature coordinate residual error attention layer consists of BiLstm, a first convolution block, a second convolution block, a third convolution block, res2Net1, res2Net2, res2Net3, res2Net4 and a CA block; the first convolution block, the second convolution block and the third convolution block have the same structure; res2Net1, res2Net2, res2Net3 and Res2Net4 have the same structure, the CA block represents cooperative attention, and in the embodiment, a convolution attention network is used as cooperative attention to perform weighted fusion on the features input by Res2Net1, res2Net2, res2Net3 and Res2Net 4; the output layer comprises 2q+2 fully-connected sublayers with the same structure and is used for outputting q+1 emotion types and emotion intensity scores corresponding to the emotion types;

B2, marking emotion types and emotion intensities of the voice segment obtained in the A2 and corresponding initial features of the spectrogram, the initial features of the Mel filter bank and the initial features of the Mel frequency cepstrum coefficient to obtain a multi-dimensional emotion recognition fine adjustment data set; the emotion type label is specifically as follows: marking emotion types with counter with a question voice by using-1, and marking emotion types without counter with a question voice by using +1; emotion type correspondence is classified into positive emotion type and negative emotion type, wherein positive emotion type includes happiness, excitement, satisfaction, happiness; negative emotion types include anger, aversion, and depression; the emotion strength label is specifically as follows: marking the emotion intensity in each voice segment by using numerical scores through expert experience, wherein the numerical scores are sequentially 1-9, and the emotion intensity is stronger when the score is higher; marking the emotion type with counter with a question voices by adopting an expert experience marking strategy, and marking the voices with emotion colors of questioning, discontent, anger or emphasis by counter with a question;

The trained multidimensional voice emotion recognition model is deployed into an emotion recognition unit, real-time voice data crawled by a crawler are input into the emotion recognition unit, and emotion type and emotion intensity scores corresponding to the long voice data of the user are obtained; the method comprises the specific steps of calculating emotion intensity scores, namely firstly obtaining corresponding voice segments according to collected voices, calculating the emotion intensity score of each voice segment by using a multidimensional voice emotion recognition model and an evaluation model, and obtaining the emotion intensity score corresponding to long voice data of a user by using the obtained emotion intensity score of each voice segment and adopting weighted fusion calculation;

The data mining unit is used for calculating the comprehensive interest score between the user behavior and the interest according to the data in the user voice information knowledge graph base by utilizing the association rule mining technology; further, the specific process of calculating the comprehensive interest score between the user behavior and the interest by the data mining unit in this embodiment includes:

C2, calculating to obtain the comprehensive interest score of the user on the mentioned product at the current moment by using the obtained frequency number of the product keywords and the corresponding text information voice emotion intensity score of the product keywords and using a weighting strategy; the specific formula for calculating the comprehensive interest score in this embodiment is: wherein Represents the comprehensive interest score corresponding to the g-th product, Q represents the total number of positive emotion types obtained by the emotion recognition unit,Represents the total frequency of the keywords of the product with positive emotion type,Frequency of product keywords with the q-th positive emotion type in the positive emotion type is represented,Representing intensity scores corresponding to the q-th positive emotion types;

C3, constructing an interest threshold interval, classifying the association degree of the user and the mentioned products according to the calculated comprehensive interest score, and obtaining comprehensive interest grades of the user on different products; the comprehensive interest scores are classified through an interest threshold interval to obtain comprehensive interest grades; wherein the threshold interval of interest comprises a score of no interest interval [0, 20], a score of greater interest interval (20, 40), a score of interest interval (40, 60], a score of greater interest interval (60, 80);

the user portrait construction unit is used for constructing accurate user portraits according to the acquired voice text data, emotion type and strength and comprehensive interest scores between user behaviors and interests, and realizing target positioning of the advertisement pushing user;

The advertisement demand determining and pushing module comprises a demand determining unit, an advertisement generating unit and an advertisement pushing unit; the user feedback and behavior analysis module comprises a feedback collection unit and a behavior analysis unit; the privacy protection and data security module comprises a data encryption unit and an access control unit;

The demand determining unit is used for dynamically adjusting an advertisement demand determining strategy according to emotion type and intensity scores obtained by the constructed user portrait and voice emotion analysis module and determining the advertisement demand of the user by utilizing the adjusted determining strategy; further, the specific steps of the demand determining unit in this embodiment for dynamically adjusting the advertisement demand determining policy include:

D1, setting a push score threshold, comparing the comprehensive interest score and the relevance classification result obtained by the calculation of C2 with the push score threshold, if the comprehensive interest score is larger than the push score threshold and the relevance classification result meets the push score threshold setting, generating corresponding advertisements according to user information and interested contents stored in a user voice information knowledge graph library, and pushing the generated advertisements to corresponding users by using a matching algorithm; further, in this embodiment, when the association degree classification comprehensive interest level meets or is higher than the corresponding score between the interested score regions, the push score threshold is set, and the push score threshold is at least 40;

Setting a user satisfaction score threshold, monitoring and collecting voice data of the user after advertisement pushing in real time, calculating to obtain a satisfaction score of the user after advertisement viewing by utilizing the voice data collected in real time, and if the satisfaction score of the user after advertisement viewing is higher than or equal to the satisfaction score threshold, continuing to execute the advertisement pushing of the current content type; in this embodiment, the satisfaction score threshold is the same as the push score threshold, and the satisfaction score is the same as the comprehensive interest score calculation process in the data mining unit; the data adopted in the process of calculating the satisfaction score is voice feedback data collected after pushing advertisement content, the collected voice feedback data is analyzed through the voice recognition module, the natural language processing module and the voice emotion analysis module, the comprehensive interest score 1 of the user for the corresponding product after pushing the advertisement is obtained through calculation, and the calculated comprehensive interest score 1 is used as the satisfaction score; for example, the advertisement pushing is an exercise product, after the advertisement pushing of the exercise product is collected, the voice feedback data of the corresponding user on the product is calculated to obtain a comprehensive interest score 1, namely a satisfaction score, and whether the advertisement pushing of the current content type is continuously executed for the corresponding user is judged;

The advertisement generating unit is used for dynamically generating advertisement contents required by corresponding users by utilizing a large language model in combination with the user portrait and the voice emotion analysis result; in the embodiment, a large language model is adopted to generate a corresponding text, wherein the large language model comprises GPT, BERT, a text dialect and the like;

The advertisement pushing unit is used for matching the result obtained by the user portrait construction unit with the generated advertisement by utilizing a matching algorithm and pushing the advertisement of the corresponding user according to the matching result; the feedback collection unit is used for collecting feedback behavior and emotion change information of the user on the push advertisement; the behavior analysis unit is used for analyzing the behavior and emotion change of the user after receiving the advertisement by utilizing a data mining technology, evaluating the advertisement pushing effect and feeding back the evaluation result to the advertisement demand determination strategy and the constructed user voice information knowledge graph base; the data encryption unit is used for carrying out high-strength encryption on the user data in the storage and transmission process and preventing the data from being leaked or acquired by an unauthorized third party; and the access control unit is used for implementing strict authority management so that sensitive data is only opened to authorized personnel.

Example 2

Referring to fig. 6, an embodiment of the present invention is provided: the advertisement demand determining system based on voice analysis corresponds to the working specific flows of a voice recognition module, a natural language processing module, a voice emotion analysis module, a user portrait module, an advertisement demand determining and pushing module, a user feedback and behavior analysis module, a privacy protection and data security module and functional units thereof, and comprises the following components:

Firstly, collecting user voice signals from different platforms and areas by utilizing a voice collecting unit, preprocessing by applying a noise reduction technology, and converting the preprocessed voice into standard text information in real time by utilizing a multi-source voice conversion unit and a deep learning algorithm; secondly, performing primary entity recognition and relation extraction on the converted text by using a text recognition unit, inputting the recognized and extracted text information into a knowledge graph unit, constructing a user voice information knowledge graph base according to the text information, and storing user preference knowledge in a structuring mode; thirdly, inputting the collected voice signals and the obtained standard text information into an emotion recognition unit, recognizing the emotion type in the voice of the user by using a multidimensional emotion analysis technology, quantifying the emotion intensity of the obtained emotion type by using an emotion intensity evaluation unit, providing accurate emotion label basis for formulating advertisement push strategies, feeding back the obtained different user voice emotion classification results and intensities to a user voice information knowledge graph library, and updating the knowledge graph; fourth, the data mining unit is utilized to analyze and obtain the comprehensive interest score between the user behavior and the interests according to the acquired text information and voice emotion information, and the user portrait construction unit is utilized to combine the comprehensive interest score between the user behavior and the interests and the text and emotion data stored in the knowledge graph base to construct a precise user portrait; fifthly, a demand determining unit determines a specific advertisement pushing object according to the user image and voice emotion analysis result, and an advertisement generating unit and an advertisement pushing unit generate and push advertisements to a determined user according to the specific advertisement pushing object and user information stored in a knowledge base; sixthly, collecting feedback information of the user on the advertisement pushing by using a feedback collecting unit, deeply analyzing behavior change of the user after receiving the advertisement by using a behavior analyzing unit by using a data mining technology, evaluating advertisement pushing effect, feeding back an evaluation result to a demand determining unit to dynamically adjust an advertisement demand determining strategy, storing the adjustment strategy into an information layer of the user corresponding to a knowledge graph, and directly calling the stored advertisement demand determining strategy when pushing the advertisement to the same user again; seventh, the data encryption unit is used for carrying out high-intensity encryption on the user data stored and transmitted in the first to sixth processes, and the access control unit is used for implementing strict authority management, so that sensitive data is ensured to be only opened to authorized personnel.

The system ensures the comprehensive acquisition and accurate interpretation of the voice information of the user through cross-platform and cross-regional voice acquisition and real-time text conversion driven by deep learning; the text recognition unit performs primary entity recognition and relation extraction, the knowledge graph unit constructs a user voice information knowledge graph library according to the initial entity recognition and relation extraction, the user preference knowledge is stored in a structured mode, the personalized requirements of the user are effectively described, a voice emotion analysis module is introduced, emotion types and intensities in user voices are recognized by using a multidimensional emotion analysis technology, the emotion types and intensities are integrated into the user knowledge graph, and accurate quantification and full utilization of emotion factors in an advertisement pushing strategy are achieved; the advertisement pushing can capture the emotion change of the user sharply, and provide product or service recommendation highly conforming to the emotion state of the user, so that the relevance and acceptance of the advertisement are greatly improved; thirdly, the data mining unit reveals potential association between user behaviors and interests through comprehensive analysis of text information and original voice signals, the boosting user portrait construction unit combines multivariate data to construct accurate user portraits, the process not only enriches user feature dimensions, but also deepens understanding of user behavior patterns and consumption trends, a solid foundation is laid for accurate matching of advertisements, fourth, the demand determination unit combines user portraits and emotion analysis results to accurately lock advertisement pushing objects, the advertisement generation unit and the advertisement pushing unit generate and push personalized advertisements according to the results, the feedback collection unit and the behavior analysis unit collect user feedback in real time, the advertisement effects are deeply analyzed, and the advertisement demand determination strategy is dynamically adjusted; the strategy is ensured to be advanced over time, and the evolution of the user demands is adapted. The adjusted strategy is directly stored in the information layer of the user corresponding to the knowledge graph, and when the advertisement is pushed to the same user again, the optimized strategy can be directly called, so that the pushing efficiency and the pushing effect are improved.

Example 3

Referring to fig. 7, another embodiment of the present invention is provided: an advertisement demand determining method based on voice analysis comprises the following specific steps:

S4, a dynamic adjustment strategy determines a specific advertisement pushing object according to the constructed user portrait and voice emotion analysis result, generates a corresponding advertisement according to the determined pushing object, pushes the advertisement to the determined user object, monitors and analyzes the pushed advertisement in real time at the same time, acquires an advertisement pushing evaluation result, feeds the evaluation result back to a requirement determination unit, and dynamically adjusts and stores the advertisement requirement determination strategy;

S5, the data encryption unit is used for carrying out high-strength encryption on the user data stored and transmitted in the S1 to the S4, and meanwhile, the access control unit is used for carrying out strict authority management, so that the user sensitive data is ensured to be in a safe state in real time.

Further, the specific process of calculating the comprehensive interest score between the user behavior and the interest by the data mining model comprises the following steps:

C2, calculating to obtain the comprehensive interest score of the user on the mentioned product at the current moment by using the obtained frequency number of the product keywords and the corresponding text information voice emotion intensity score of the product keywords;

Example 4

Example 5

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and variations, modifications, substitutions and alterations of the above-described embodiments may be made by those having ordinary skill in the art without departing from the spirit of the present invention and the scope of the present invention as defined by the claims, which are all within the scope of the present invention.

Claims

1. An advertising demand determination system based on speech analysis, comprising: the system comprises a voice recognition module, a natural language processing module, a voice emotion analysis module, a user portrait module and an advertisement demand determining and pushing module;

The user portrait module is used for mining user data information by utilizing a data mining technology according to the acquired emotion information and voice information base to construct a user portrait; the user portrait module comprises a data mining unit and a user portrait construction unit; the data mining unit is used for calculating comprehensive interest scores between user behaviors and interests according to data in the user voice information knowledge graph base by using an association rule mining technology; the user portrait construction unit is used for constructing a user portrait according to the acquired user voice information knowledge graph base, emotion type and strength and comprehensive interest scores between user behaviors and interests;

2. The advertisement demand determination system based on voice analysis according to claim 1, wherein the multi-source voice conversion unit is configured with a multi-source voice recognition strategy, and the specific steps include:

A1, crawling dialect long voice data of different areas and public Chinese and English long voice data by utilizing a crawler technology, editing the crawled voice data of different areas by utilizing voice editing software, framing the edited dialect voice data of different areas and Chinese and English voice data sets, acquiring voice segments with the length of 25 milliseconds, and performing text labeling, scattering and denoising on all the acquired voice segments;

A2, preprocessing the marked and scattered voice segments by utilizing an acoustic feature processing technology, acquiring initial features of a spectrogram, initial features of a Mel filter bank and initial features of Mel frequency cepstrum coefficients corresponding to the voice segments, and utilizing original long voice data, the voice segments and the initial features of the corresponding spectrogram, the initial features of the Mel filter bank and the initial features of the Mel frequency cepstrum coefficients according to 8:2 to construct a speech conversion training set and a validation set.

3. The advertisement demand determination system based on voice analysis of claim 2, wherein the multi-source voice recognition strategy further comprises the steps of:

4. The advertisement demand determination system based on voice analysis according to claim 3, wherein the emotion recognition unit and emotion intensity evaluation unit are simultaneously configured with a multidimensional voice emotion recognition and evaluation strategy, and the specific steps include:

b2, marking emotion types and emotion intensities of the voice segment obtained in the A2 and corresponding initial features of the spectrogram, the initial features of the Mel filter bank and the initial features of the Mel frequency cepstrum coefficient to obtain a multi-dimensional emotion recognition fine adjustment data set;

And B3, inputting the acquired multi-dimensional emotion recognition fine adjustment data set into the pre-trained multi-dimensional voice emotion recognition model and the pre-trained evaluation model for fine adjustment, and acquiring the multi-dimensional voice emotion recognition model with fine adjustment.

5. The advertisement demand determination system based on voice analysis of claim 4, wherein the multidimensional voice emotion recognition and assessment strategy further comprises the steps of:

The trained multidimensional voice emotion recognition model is deployed into an emotion recognition unit, real-time voice data crawled by a crawler are input into the emotion recognition unit, and emotion type and emotion intensity scores corresponding to the long voice data of the user are obtained;

6. The advertisement demand determination system based on voice analysis according to claim 5, wherein the data mining unit calculates a comprehensive interest score between the user's behavior and interest, comprising:

7. The advertisement demand determination system based on voice analysis of claim 6, wherein the dynamically adjusting advertisement demand determination policy comprises:

8. A method for determining advertisement demand based on voice analysis, which is implemented based on the advertisement demand determining system based on voice analysis according to any one of claims 1 to 7, characterized in that the steps include:

9. A computer readable storage medium having stored thereon computer instructions which when executed perform a method of determining advertisement demand based on speech analysis as claimed in claim 8.

10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements a method for determining advertisement demand based on speech analysis as claimed in claim 8.