CN116320621B - NLP-based streaming media content analysis method and system - Google Patents
NLP-based streaming media content analysis method and system
- Publication number
- CN116320621B (application CN202310554226.5A)
- Authority
- CN
- China
- Prior art keywords
- information
- streaming media
- text
- nouns
- evaluation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440236—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/442—Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
- H04N21/44204—Monitoring of content usage, e.g. the number of times a movie has been viewed, copied or the amount which has been watched
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/454—Content or additional data filtering, e.g. blocking advertisements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/45—Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
- H04N21/466—Learning process for intelligent management, e.g. learning user preferences for recommending movies
- H04N21/4662—Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms
- H04N21/4666—Learning process for intelligent management, e.g. learning user preferences for recommending movies characterized by learning algorithms using neural networks, e.g. processing the feedback provided by the user
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4884—Data services, e.g. news ticker for displaying subtitles
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/84—Generation or processing of descriptive data, e.g. content descriptors
- H04N21/8405—Generation or processing of descriptive data, e.g. content descriptors represented by keywords
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the technical field of information processing and provides an NLP-based streaming media content analysis method and system. The method comprises the following steps: receiving a search keyword input by a user, and determining matching streaming media videos according to the search keyword; screening the streaming media videos according to a heat value, processing the screened streaming media videos, and determining the text information corresponding to each streaming media video; receiving a function keyword input by the user, and generalizing the function keyword and the search keyword into nouns; extracting the adjectives and nouns in each piece of text information based on NLP, binding a noun to each adjective, and determining content evaluation information of the text information; and analyzing and integrating all the content evaluation information to obtain streaming media evaluation information, and specially marking the evaluation content of the function keyword in the streaming media evaluation information. The invention obtains the streaming media evaluation information automatically, and the streaming media evaluation information can accurately reflect the overall direction of public opinion.
Description
Technical Field
The invention relates to the technical field of information processing, in particular to a streaming media content analysis method and system based on NLP.
Background
When a new product is released or brought to market, understanding the direction of opinion in streaming media content is important for adjusting the product's strategy. With the rise of short videos, the content of streaming media videos needs to be analyzed accurately so that manufacturers can learn the public opinion of a new product in time. At present, however, it is difficult to automatically perform accurate public opinion analysis on a large volume of streaming media video content. An NLP-based streaming media content analysis method and system are therefore needed to solve this problem.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide an NLP-based streaming media content analysis method and system that solve the problems described in the background.
The invention is realized as follows. An NLP-based streaming media content analysis method comprises the following steps:
receiving a search keyword input by a user, and determining matching streaming media videos according to the search keyword;
screening the streaming media videos according to a heat value, processing the screened streaming media videos, and determining the text information corresponding to each streaming media video;
receiving a function keyword input by the user, and generalizing the function keyword and the search keyword into nouns;
extracting the adjectives and nouns in each piece of text information based on NLP, binding a noun to each adjective, and determining content evaluation information of the text information;
and analyzing and integrating all the content evaluation information to obtain streaming media evaluation information, and specially marking the evaluation content of the function keyword in the streaming media evaluation information.
As a further scheme of the invention: the step of processing the screened streaming media videos and determining the text information corresponding to each streaming media video specifically comprises the following steps:
judging whether the screened streaming media video has subtitle information;
when subtitle information exists, performing text recognition on the subtitle information in the streaming media video to obtain the text information;
when subtitle information does not exist, acquiring the audio information of the streaming media video, and performing speech-to-text conversion on the audio information to obtain the text information.
As a further scheme of the invention: the step of extracting the adjectives and nouns in each piece of text information based on NLP specifically comprises the following steps:
determining the influence degree of the streaming media video author corresponding to the text information;
when the influence degree is less than or equal to a set influence value, extracting the adjectives and nouns in the text information by using a word segmentation tool, and position-marking the extracted adjectives and nouns;
when the influence degree is greater than the set influence value, receiving training corpus information, performing feature learning on the training corpus information based on a CNN-LSTM model to obtain an exclusive neural network model, processing the text information through the exclusive neural network model to obtain the adjectives and nouns, and position-marking the obtained adjectives and nouns.
As a further scheme of the invention: the step of binding a noun to each adjective and determining the content evaluation information of the text information specifically comprises the following steps:
binding a noun to each adjective according to the position marks, and determining the sentiment polarity of each adjective, wherein the polarity is one of positive word, negative word and neutral word;
classifying all adjectives according to their nouns to obtain a plurality of categories, wherein all adjectives in a category are bound to the same noun;
and determining a text evaluation value of the text information, wherein text evaluation value = a × number of positive words + b × number of negative words + c × number of neutral words, and the categories and the text evaluation value form the content evaluation information.
As a further scheme of the invention: the step of analyzing and integrating all the content evaluation information to obtain the streaming media evaluation information specifically comprises the following steps:
integrating the categories in all the content evaluation information, and merging the categories corresponding to the same noun;
retrieving the influence degree of the streaming media video author corresponding to each text evaluation value;
and determining an overall evaluation value, wherein overall evaluation value = Σ (text evaluation value × influence degree), and the integrated categories and the overall evaluation value form the streaming media evaluation information.
Another object of the present invention is to provide an NLP-based streaming content analysis system, the system comprising:
the streaming media video determining module is used for receiving a search keyword input by a user and determining matched streaming media videos according to the search keyword;
the text information acquisition module is used for screening the streaming media videos according to the heat value, processing the screened streaming media videos and determining text information corresponding to each streaming media video;
the function keyword input module is used for receiving a function keyword input by the user and generalizing the function keyword and the search keyword into nouns;
the adjective noun determining module is used for extracting adjectives and nouns in each piece of text information based on NLP, binding a noun for each adjective, and determining content evaluation information of the text information;
and the streaming media evaluation information module is used for analyzing and integrating all the content evaluation information to obtain streaming media evaluation information, and specially marking the evaluation content of the functional keywords in the streaming media evaluation information.
As a further scheme of the invention: the text information acquisition module comprises:
the subtitle information judging unit is used for judging whether the screened streaming media video has subtitle information or not;
the first text information unit is used for carrying out text recognition on the subtitle information in the streaming media video to obtain text information when the subtitle information exists;
and the second text information unit is used for acquiring the audio information of the streaming media video when subtitle information does not exist, and performing speech-to-text conversion on the audio information to obtain the text information.
As a further scheme of the invention: the adjective noun determination module includes:
the influence degree determining unit is used for determining the influence degree of the streaming media video author corresponding to the text information;
the first adjective noun unit is used for extracting the adjectives and nouns in the text information by using a word segmentation tool when the influence degree is less than or equal to a set influence value, and position-marking the extracted adjectives and nouns;
and the second adjective noun unit is used for receiving training corpus information when the influence degree is greater than the set influence value, performing feature learning on the training corpus information based on a CNN-LSTM model to obtain an exclusive neural network model, processing the text information through the exclusive neural network model to obtain the adjectives and nouns, and position-marking the obtained adjectives and nouns.
As a further scheme of the invention: the adjective noun determination module further includes:
the adjective noun binding unit is used for binding a noun to each adjective according to the position marks and determining the sentiment polarity of each adjective, wherein the polarity is one of positive word, negative word and neutral word;
the adjective classification unit is used for classifying all adjectives according to their nouns to obtain a plurality of categories, wherein all adjectives in a category are bound to the same noun;
and the text evaluation value unit is used for determining a text evaluation value of the text information, wherein text evaluation value = a × number of positive words + b × number of negative words + c × number of neutral words, and the categories and the text evaluation value form the content evaluation information.
As a further scheme of the invention: the streaming media evaluation information module comprises:
the category integrating unit is used for integrating the categories in all the content evaluation information and combining the categories corresponding to the same noun;
the influence degree retrieval unit is used for retrieving the influence degree of the streaming media video author corresponding to each text evaluation value;
and the overall evaluation value unit is used for determining an overall evaluation value, wherein overall evaluation value = Σ (text evaluation value × influence degree), and the integrated categories and the overall evaluation value form the streaming media evaluation information.
Compared with the prior art, the invention has the beneficial effects that:
The invention processes the screened streaming media videos to determine the text information corresponding to each streaming media video; generalizes the function keyword and the search keyword input by the user into nouns; extracts the adjectives and nouns in each piece of text information based on NLP, binds a noun to each adjective, and determines the content evaluation information of the text information; and analyzes and integrates all the content evaluation information to obtain the streaming media evaluation information. The streaming media evaluation information is thus obtained automatically and can accurately reflect the overall direction of public opinion.
Drawings
Fig. 1 is a flowchart of an NLP-based streaming media content analysis method.
Fig. 2 is a flowchart of determining text information of a streaming video in an NLP-based streaming content analysis method.
Fig. 3 is a flowchart of extracting the adjectives and nouns in each piece of text information in an NLP-based streaming media content analysis method.
Fig. 4 is a flowchart of binding a noun to each adjective in an NLP-based streaming media content analysis method.
Fig. 5 is a flowchart of obtaining streaming media evaluation information in an NLP-based streaming media content analysis method.
Fig. 6 is a schematic structural diagram of an NLP-based streaming media content analysis system.
Fig. 7 is a schematic structural diagram of a text information acquisition module in an NLP-based streaming media content analysis system.
Fig. 8 is a schematic structural diagram of an adjective noun determining module in an NLP-based streaming media content analysis system.
Fig. 9 is a schematic structural diagram of a streaming media evaluation information module in an NLP-based streaming media content analysis system.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clear, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Specific implementations of the invention are described in detail below in connection with specific embodiments.
As shown in fig. 1, an embodiment of the present invention provides a method for analyzing streaming media content based on NLP, which includes the following steps:
S100, receiving a search keyword input by a user, and determining matching streaming media videos according to the search keyword;
S200, screening the streaming media videos according to a heat value, processing the screened streaming media videos, and determining the text information corresponding to each streaming media video;
S300, receiving a function keyword input by the user, and generalizing the function keyword and the search keyword into nouns;
S400, extracting the adjectives and nouns in each piece of text information based on NLP, binding a noun to each adjective, and determining content evaluation information of the text information;
S500, analyzing and integrating all the content evaluation information to obtain streaming media evaluation information, and specially marking the evaluation content of the function keyword in the streaming media evaluation information.
In the embodiment of the invention, when a manufacturer needs to learn the public opinion of a new product, a search keyword is input; the search keyword may be the name of the new product. The streaming media video platform determines a number of matching streaming media videos according to the search keyword. The embodiment then screens the streaming media videos according to a heat value, which is related to the number of likes, comments and shares of each video; the videos with higher heat values are retained and processed to determine the text information corresponding to each of them. Next, the user inputs a function keyword. A function keyword names a newly introduced function of the new product, that is, a selling point the manufacturer wants to highlight. The embodiment generalizes the function keyword and the search keyword into nouns, extracts the adjectives and nouns in each piece of text information based on natural language processing (NLP), and binds a noun to each adjective to indicate that the adjective describes that noun, which yields the content evaluation information of the text information. Finally, all the content evaluation information is analyzed and integrated to obtain the streaming media evaluation information, which reflects the overall direction of public opinion. The evaluation content of the function keyword, that is, the adjectives bound to the function keyword, is specially marked in the streaming media evaluation information, for example in bold, so that the manufacturer's staff can see the market response to the new function at a glance.
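The screening in S200 can be sketched as follows. This is only an illustration: the source states that the heat value is related to the like, comment and share counts but gives no formula, so the weighted sum, the weights, the threshold and the names `Video`, `heat_value` and `screen_by_heat` are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Video:
    title: str
    likes: int
    comments: int
    shares: int


def heat_value(video, w_like=1, w_comment=2, w_share=3):
    # Assumed weighted sum; the source only says the heat value is
    # "related to" these three counts.
    return (w_like * video.likes
            + w_comment * video.comments
            + w_share * video.shares)


def screen_by_heat(videos, threshold):
    # Keep only the videos whose heat value reaches the (assumed) threshold.
    return [v for v in videos if heat_value(v) >= threshold]


videos = [
    Video("review A", likes=1000, comments=50, shares=20),
    Video("review B", likes=10, comments=1, shares=0),
]
kept = screen_by_heat(videos, threshold=500)
print([v.title for v in kept])
```

A top-k cut by heat value would serve equally well; the source does not say which form of screening is intended.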
As shown in fig. 2, as a preferred embodiment of the present invention, the step of processing the screened streaming media videos to determine the text information corresponding to each streaming media video specifically includes:
S201, judging whether subtitle information exists in the screened streaming media video;
S202, when subtitle information exists, performing text recognition on the subtitle information in the streaming media video to obtain the text information;
S203, when subtitle information does not exist, acquiring the audio information of the streaming media video, and performing speech-to-text conversion on the audio information to obtain the text information.
In the embodiment of the invention, to obtain the text information it is first judged whether the screened streaming media video contains subtitle information. If it does, the text information is obtained directly by performing text recognition on the subtitle information in the video. If it does not, the audio information of the streaming media video is retrieved, noise reduction is applied to the audio information, and speech-to-text conversion is then performed to obtain the text information.
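The branch in S201 to S203 can be sketched as below. The recognizers are passed in as callables because the source names no OCR, noise reduction or speech recognition engine; `StreamingVideo`, `video_to_text` and the stub functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StreamingVideo:
    subtitle_frames: Optional[List[str]]  # None or empty when no subtitles exist
    audio: bytes


def video_to_text(video: StreamingVideo,
                  recognize_subtitles: Callable[[List[str]], str],
                  denoise: Callable[[bytes], bytes],
                  speech_to_text: Callable[[bytes], str]) -> str:
    # S201: check whether subtitle information exists.
    if video.subtitle_frames:
        # S202: text recognition on the subtitles.
        return recognize_subtitles(video.subtitle_frames)
    # S203: fall back to the audio track, denoise it, then transcribe it.
    return speech_to_text(denoise(video.audio))


# Stand-in stubs so the sketch runs without any real OCR or ASR engine.
def stub_ocr(frames):
    return " ".join(frames)


def stub_denoise(audio):
    return audio.strip()


def stub_asr(audio):
    return audio.decode("utf-8")


with_subtitles = StreamingVideo(["great", "camera"], b"")
without_subtitles = StreamingVideo(None, b"  battery is weak  ")
print(video_to_text(with_subtitles, stub_ocr, stub_denoise, stub_asr))
print(video_to_text(without_subtitles, stub_ocr, stub_denoise, stub_asr))
```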
As shown in fig. 3, as a preferred embodiment of the present invention, the step of extracting the adjectives and nouns in each piece of text information based on NLP specifically includes:
S401, determining the influence degree of the streaming media video author corresponding to the text information;
S402, when the influence degree is less than or equal to a set influence value, extracting the adjectives and nouns in the text information by using a word segmentation tool, and position-marking the extracted adjectives and nouns;
S403, when the influence degree is greater than the set influence value, receiving training corpus information, performing feature learning on the training corpus information based on a CNN-LSTM model to obtain an exclusive neural network model, processing the text information through the exclusive neural network model to obtain the adjectives and nouns, and position-marking the obtained adjectives and nouns.
In the embodiment of the invention, the influence degree of the streaming media video author corresponding to each piece of text information needs to be determined. The influence degree is determined from the author's like count and follower count: influence degree = M × like count + N × follower count, where M and N are fixed values. When the influence degree is less than or equal to the set influence value, the adjectives and nouns in the text information are extracted directly with a word segmentation tool and position-marked; a position mark indicates where the word occurs in the text information. The word segmentation tool may be jieba, HanLP, ansj or standby. When the influence degree is greater than the set influence value, an exclusive neural network model is built for that streaming media video author so that the analysis is more accurate. Because the number of highly influential video authors in any field is limited, only a limited number of exclusive neural network models need to be built, and once a model has been built it can be reused for that author. To build the model, the user uploads training corpus information obtained from the author's previous videos; feature learning is then performed on the training corpus information based on a CNN-LSTM model to obtain the exclusive neural network model, which can perform better semantic analysis on that author's video content.
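The routing between the word segmentation tool and an author's exclusive model can be sketched as follows. The formula is the one given above; the values of M and N, the threshold, the cache dictionary and the names `influence_degree`, `select_extractor` and `train_stub` are assumptions, and the training step is a stub rather than an actual CNN-LSTM.

```python
def influence_degree(like_count, follower_count, m=1, n=2):
    # influence degree = M x like count + N x follower count
    # (the values of M and N here are placeholders).
    return m * like_count + n * follower_count


def select_extractor(author_id, influence, threshold,
                     segmenter, model_cache, train_model):
    # S402: low-influence authors go through the generic segmentation tool.
    if influence <= threshold:
        return segmenter
    # S403: high-influence authors get an exclusive model that is built
    # once and then reused on later calls.
    if author_id not in model_cache:
        model_cache[author_id] = train_model(author_id)
    return model_cache[author_id]


trained = []


def train_stub(author_id):
    # Stand-in for feature learning on the author's training corpus.
    trained.append(author_id)
    return f"exclusive-model-for-{author_id}"


cache = {}
small = select_extractor("author_a", influence_degree(10, 5), 1000,
                         "segmenter", cache, train_stub)
big_first = select_extractor("author_b", influence_degree(900, 400), 1000,
                             "segmenter", cache, train_stub)
big_again = select_extractor("author_b", influence_degree(900, 400), 1000,
                             "segmenter", cache, train_stub)
print(small, big_first, trained)
```

Caching by author reflects the remark that a model, once built, can be reused for that author.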
As shown in fig. 4, as a preferred embodiment of the present invention, the step of binding a noun to each adjective and determining the content evaluation information of the text information specifically includes:
S404, binding a noun to each adjective according to the position marks, and determining the sentiment polarity of each adjective, wherein the polarity is one of positive word, negative word and neutral word;
S405, classifying all adjectives according to their nouns to obtain a plurality of categories, wherein all adjectives in a category are bound to the same noun;
S406, determining a text evaluation value of the text information, wherein text evaluation value = a × number of positive words + b × number of negative words + c × number of neutral words, and the categories and the text evaluation value form the content evaluation information.
In the embodiment of the invention, a noun is bound to each adjective according to the position marks: the bound noun is the noun closest to the adjective within the same sentence. The sentiment polarity of each adjective is then determined, for example by looking the adjective up in an electronic dictionary. All adjectives are classified by noun so that every adjective in a category is bound to the same noun; this forms a table whose first column holds the noun and whose second column holds the adjectives bound to it. Finally, the text evaluation value of the text information is determined as text evaluation value = a × number of positive words + b × number of negative words + c × number of neutral words, where a, b and c are fixed values.
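Steps S404 to S406 can be sketched as below, assuming each sentence has already been tagged as a list of (word, tag) pairs in which "n" marks a noun and "a" an adjective, and the list index serves as the position mark. The polarity lexicon, the weights a = 1, b = -1, c = 0 and the function names are hypothetical; the source only states that a, b and c are fixed values.

```python
from collections import defaultdict

# Hypothetical polarity lexicon standing in for the electronic dictionary.
POLARITY = {"sharp": "positive", "nice": "positive", "weak": "negative"}


def bind_adjectives(sentence):
    # sentence: list of (word, tag); the index is the position mark.
    nouns = [(i, w) for i, (w, t) in enumerate(sentence) if t == "n"]
    categories = defaultdict(list)
    for i, (word, tag) in enumerate(sentence):
        if tag == "a" and nouns:
            # Nearest noun in the same sentence; ties go to the earlier noun.
            _, noun = min(nouns, key=lambda p: abs(p[0] - i))
            categories[noun].append(word)
    return categories


def content_evaluation(sentences, a=1, b=-1, c=0):
    categories = defaultdict(list)
    for sentence in sentences:
        for noun, adjectives in bind_adjectives(sentence).items():
            categories[noun].extend(adjectives)
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for adjectives in categories.values():
        for adjective in adjectives:
            # Adjectives missing from the lexicon are treated as neutral.
            counts[POLARITY.get(adjective, "neutral")] += 1
    value = (a * counts["positive"] + b * counts["negative"]
             + c * counts["neutral"])
    return dict(categories), value


sentences = [
    [("the", "x"), ("camera", "n"), ("is", "x"), ("sharp", "a"), ("but", "x"),
     ("the", "x"), ("battery", "n"), ("is", "x"), ("weak", "a")],
    [("nice", "a"), ("screen", "n")],
]
categories, text_value = content_evaluation(sentences)
print(categories, text_value)
```

The returned dictionary corresponds to the two-column table described above: each key is a noun and each value is the list of adjectives bound to it.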
As shown in fig. 5, as a preferred embodiment of the present invention, the step of analyzing and integrating all the content evaluation information to obtain the streaming media evaluation information specifically includes:
S501, integrating the categories in all the content evaluation information, and merging the categories corresponding to the same noun;
S502, retrieving the influence degree of the streaming media video author corresponding to each text evaluation value;
S503, determining an overall evaluation value, wherein overall evaluation value = Σ (text evaluation value × influence degree), and the integrated categories and the overall evaluation value form the streaming media evaluation information.
In the embodiment of the invention, the content evaluation information corresponding to the screened streaming media videos is integrated and the overall evaluation value is determined: each text evaluation value is multiplied by the corresponding influence degree, and the products are summed. The overall evaluation value reflects whether overall public opinion is favorable or unfavorable.
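Steps S501 to S503, together with the special marking of the function keywords from S500, can be sketched as follows. Each input item is assumed to be a (categories, text evaluation value, influence degree) triple; wrapping adjectives in double asterisks is only a plain-text stand-in for bold, and the name `integrate` and the sample numbers are hypothetical.

```python
from collections import defaultdict


def integrate(content_infos, function_keywords):
    merged = defaultdict(list)
    overall = 0
    for categories, text_value, influence in content_infos:
        # S501: merge categories that share the same noun.
        for noun, adjectives in categories.items():
            merged[noun].extend(adjectives)
        # S502 and S503: weight each text evaluation value by the author's
        # influence degree and accumulate the products.
        overall += text_value * influence
    report = {}
    for noun, adjectives in merged.items():
        if noun in function_keywords:
            # Special marking of the evaluation content of a function keyword.
            adjectives = [f"**{adj}**" for adj in adjectives]
        report[noun] = adjectives
    return report, overall


infos = [
    ({"camera": ["sharp"], "battery": ["weak"]}, 0, 200),
    ({"camera": ["blurry"]}, -1, 50),
    ({"battery": ["durable"]}, 1, 300),
]
report, overall_value = integrate(infos, function_keywords={"camera"})
print(report, overall_value)
```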
As shown in fig. 6, the embodiment of the present invention further provides a streaming media content analysis system based on NLP, where the system includes:
the streaming media video determining module 100 is configured to receive a search keyword input by a user, and determine a matched streaming media video according to the search keyword;
the text information acquisition module 200 is configured to screen the streaming media video according to the hotness value, process the screened streaming media video, and determine text information corresponding to each streaming media video;
a function keyword input module 300, configured to receive a function keyword input by a user and generalize the function keyword and the search keyword into nouns;
the adjective noun determination module 400 is configured to extract adjectives and nouns in each piece of text information based on NLP, bind a noun to each adjective, and determine content evaluation information of the text information;
the streaming media evaluation information module 500 is configured to analyze and integrate all the content evaluation information to obtain streaming media evaluation information, and specially mark the evaluation content of the function keyword in the streaming media evaluation information.
In the embodiment of the invention, when a manufacturer wants to understand public opinion about a new product, a search keyword is input; the search keyword may be the name of the new product. The streaming media video platform determines a plurality of matching streaming media videos according to the search keyword. The streaming media videos are then screened according to a heat value, which is related to the number of likes, comments, and forwards of each streaming media video; videos with higher heat values are retained, and the screened videos are processed to determine the text information corresponding to each of them. Next, the user inputs a function keyword, which names a newly introduced function of the new product, that is, a selling point highlighted by the manufacturer, and the function keyword and the search keyword are generalized into nouns. Adjectives and nouns are then extracted from each piece of text information based on natural language processing (NLP), and a noun is bound to each adjective to indicate that the adjective describes that noun, yielding the content evaluation information of the text information. Finally, all the content evaluation information is analyzed and integrated to obtain the streaming media evaluation information, which reflects the overall direction of public opinion. The evaluation content of the function keyword, namely the adjectives bound to the function keyword, is specially marked in the streaming media evaluation information so that the manufacturer can see the market response to the new function at a glance.
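As a non-limiting illustration, the heat-value screening and the special marking of function keywords can be sketched as follows. The embodiment states only that the heat value is related to likes, comments, and forwards; the linear form, the weights, the `top_k` cut-off, and all names below are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Video:
    title: str
    likes: int
    comments: int
    forwards: int


def heat_value(video, w_like=1.0, w_comment=2.0, w_forward=3.0):
    """Hypothetical heat value: a weighted sum of engagement counts."""
    return (w_like * video.likes
            + w_comment * video.comments
            + w_forward * video.forwards)


def screen_by_heat(videos, top_k):
    """Keep the top_k videos with the highest heat value."""
    return sorted(videos, key=heat_value, reverse=True)[:top_k]


def mark_function_keywords(table, function_keywords):
    """Flag the categories whose noun is one of the function keywords."""
    return {
        noun: {"adjectives": adjectives, "marked": noun in function_keywords}
        for noun, adjectives in table.items()
    }


videos = [
    Video("A", likes=100, comments=10, forwards=5),
    Video("B", likes=10, comments=1, forwards=0),
    Video("C", likes=50, comments=40, forwards=10),
]
kept = screen_by_heat(videos, top_k=2)
marked = mark_function_keywords(
    {"camera": ["sharp"], "price": ["high"]}, function_keywords={"camera"}
)
```

A fixed heat threshold could be used instead of `top_k`; the embodiment does not say which criterion applies.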
As shown in fig. 7, as a preferred embodiment of the present invention, the text information acquiring module 200 includes:
a caption information determining unit 201, configured to determine whether the filtered streaming video has caption information;
a first text information unit 202, configured to perform text recognition on the subtitle information in the streaming media video to obtain text information when the subtitle information exists;
and the second text information unit 203 is configured to obtain audio information of the streaming video when the subtitle information does not exist, and perform voice-to-text conversion on the audio information to obtain text information.
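As a non-limiting illustration, the branch between units 202 and 203 can be sketched as follows. The text recognition and speech-to-text engines are passed in as callables because the embodiment does not name any; the stubs `fake_ocr` and `fake_asr` are placeholders for this sketch, not real recognizers.

```python
def video_to_text(subtitle_frames, audio, recognize_text, speech_to_text):
    """Return text from subtitles when present, otherwise from the audio track."""
    if subtitle_frames:
        # Subtitle information exists: run text recognition on each frame.
        return " ".join(recognize_text(frame) for frame in subtitle_frames)
    # No subtitle information: fall back to speech-to-text on the audio.
    return speech_to_text(audio)


def fake_ocr(frame):
    """Placeholder for a text recognition engine."""
    return frame.upper()


def fake_asr(audio):
    """Placeholder for a speech-to-text engine."""
    return audio.decode("utf-8")


with_subs = video_to_text(["great screen", "weak battery"], b"", fake_ocr, fake_asr)
without_subs = video_to_text([], b"nice camera", fake_ocr, fake_asr)
```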
As shown in fig. 8, as a preferred embodiment of the present invention, the adjective noun determining module 400 includes:
an influence degree determining unit 401, configured to determine an influence degree of a streaming media video author corresponding to the text information;
a first adjective noun unit 402, configured to extract adjectives and nouns in the text information using a word segmentation tool and position-mark the extracted adjectives and nouns when the influence degree is less than or equal to a set influence value;
the second adjective noun unit 403 is configured to receive training corpus information when the influence degree is greater than the set influence value, perform feature learning on the training corpus information based on the CNN-LSTM model to obtain an exclusive neural network model, process text information through the exclusive neural network model to obtain adjectives and nouns, and perform position marking on the obtained adjectives and nouns.
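As a non-limiting illustration, the routing between units 402 and 403 can be sketched as follows. Both extractors are toy stand-ins: a real system would call a word segmentation tool and a trained CNN-LSTM model, neither of which is reproduced here, and the tag dictionaries are invented sample data.

```python
def extract_adjectives_and_nouns(text, influence, threshold,
                                 segmentation_tool, exclusive_model):
    """Route by author influence, then position-mark adjectives and nouns."""
    extractor = segmentation_tool if influence <= threshold else exclusive_model
    return [
        (word, tag, position)
        for position, (word, tag) in enumerate(extractor(text))
        if tag in ("adj", "noun")
    ]


# Toy stand-ins (assumptions): each returns (word, tag) pairs in text order.
TOOL_TAGS = {"screen": "noun", "bright": "adj"}
MODEL_TAGS = {"screen": "noun", "bright": "adj", "fire": "adj"}  # knows slang


def toy_segmentation_tool(text):
    return [(w, TOOL_TAGS.get(w, "other")) for w in text.split()]


def toy_exclusive_model(text):
    return [(w, MODEL_TAGS.get(w, "other")) for w in text.split()]


low = extract_adjectives_and_nouns(
    "screen is fire", influence=10, threshold=50,
    segmentation_tool=toy_segmentation_tool, exclusive_model=toy_exclusive_model)
high = extract_adjectives_and_nouns(
    "screen is fire", influence=90, threshold=50,
    segmentation_tool=toy_segmentation_tool, exclusive_model=toy_exclusive_model)
```

The third element of each tuple is the position mark used later when binding adjectives to nouns.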
As shown in fig. 8, as a preferred embodiment of the present invention, the adjective noun determining module 400 further includes:
an adjective noun binding unit 404, configured to bind a noun to each adjective according to the position marks, and determine a part of speech of each adjective, where the part of speech is one of positive word, negative word, and neutral word;
an adjective classification unit 405, configured to classify all adjectives according to nouns, so as to obtain a plurality of categories, where nouns corresponding to each category are the same;
a text evaluation value unit 406, configured to determine a text evaluation value of the text information, where the text evaluation value=a×the number of positive words+b×the number of negative words+c×the number of neutral words, and the category and the text evaluation value form content evaluation information.
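As a non-limiting illustration, the text evaluation value computed by unit 406 can be sketched as follows. The embodiment says only that a, b, and c are fixed values; the defaults below and the small `LEXICON` standing in for the electronic dictionary are assumptions made for this sketch.

```python
from collections import Counter


def text_evaluation_value(polarities, a=1.0, b=-1.0, c=0.0):
    """a x positive count + b x negative count + c x neutral count."""
    counts = Counter(polarities)
    return (a * counts["positive"]
            + b * counts["negative"]
            + c * counts["neutral"])


# Toy stand-in for the electronic dictionary lookup (sample data only).
LEXICON = {"sharp": "positive", "weak": "negative", "black": "neutral"}

adjectives = ["sharp", "weak", "black", "sharp"]
polarities = [LEXICON.get(word, "neutral") for word in adjectives]
score = text_evaluation_value(polarities)  # 2 positive, 1 negative, 1 neutral
```

Adjectives missing from the lexicon are treated as neutral here; the embodiment does not say how unknown words are handled.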
As shown in fig. 9, as a preferred embodiment of the present invention, the streaming media evaluation information module 500 includes:
a category integrating unit 501, configured to integrate categories in all content evaluation information, and combine categories corresponding to the same noun;
the influence degree retrieving unit 502 is configured to retrieve the influence degree of the streaming media video author corresponding to each text evaluation value;
the overall evaluation value unit 503 is configured to determine an overall evaluation value, where the overall evaluation value= Σtext evaluation value×influence degree, and the integrated category and the overall evaluation value form streaming media evaluation information.
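As a non-limiting illustration, the category merging performed by unit 501 can be sketched as follows, assuming each video's categories are held as a mapping from noun to its bound adjectives; that representation and the function name are assumptions made for this sketch.

```python
from collections import defaultdict


def merge_categories(per_video_tables):
    """Merge the categories that share the same noun across all videos."""
    merged = defaultdict(list)
    for table in per_video_tables:
        for noun, adjectives in table.items():
            merged[noun].extend(adjectives)
    return dict(merged)


per_video_tables = [
    {"camera": ["sharp"], "battery": ["weak"]},
    {"camera": ["blurry"]},
]
merged = merge_categories(per_video_tables)
```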
The foregoing description of the preferred embodiments of the present invention should not be taken as limiting the invention, but rather should be understood to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.
It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. Moreover, at least some of the steps in the various embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
Claims (8)
1. A method for analyzing streaming media content based on NLP, the method comprising the steps of:
receiving a search keyword input by a user, and determining a matched streaming media video according to the search keyword;
screening the streaming media videos according to the heat value, processing the screened streaming media videos, and determining text information corresponding to each streaming media video;
receiving a function keyword input by a user, and generalizing the function keyword and the search keyword into nouns;
extracting adjectives and nouns in each piece of text information based on NLP, binding a noun to each adjective, and determining content evaluation information of the text information;
analyzing and integrating all the content evaluation information to obtain streaming media evaluation information, and specially marking the evaluation content of the function keyword in the streaming media evaluation information;
the step of processing the screened streaming media videos and determining text information corresponding to each streaming media video specifically includes: judging whether the screened streaming media video has subtitle information; when subtitle information exists, performing text recognition on the subtitle information in the streaming media video to obtain text information; when subtitle information does not exist, acquiring the audio information of the streaming media video and performing speech-to-text conversion on the audio information to obtain text information.
2. The NLP-based streaming media content analysis method of claim 1, wherein the step of extracting adjectives and nouns in each piece of text information based on NLP comprises the following steps:
determining the influence degree of a streaming media video author corresponding to the text information;
when the influence degree is smaller than or equal to the set influence value, extracting adjectives and nouns in the text information by using a word segmentation tool, and carrying out position marking on the extracted adjectives and nouns;
when the influence degree is larger than a set influence value, training corpus information is received, feature learning is conducted on the training corpus information based on the CNN-LSTM model to obtain an exclusive neural network model, text information is processed through the exclusive neural network model to obtain adjectives and nouns, and position marking is conducted on the obtained adjectives and nouns.
3. The NLP-based streaming media content analysis method of claim 2, wherein the step of binding a noun for each adjective and determining the content rating information of the text information comprises the following steps:
binding a noun to each adjective according to the position marks, and determining the part of speech of each adjective, wherein the part of speech is one of positive word, negative word, and neutral word;
classifying all adjectives according to nouns to obtain a plurality of categories, wherein nouns corresponding to each category are identical;
and determining a text evaluation value of the text information, wherein the text evaluation value = a × the number of positive words + b × the number of negative words + c × the number of neutral words, the categories and the text evaluation value form the content evaluation information, and a, b, and c are all fixed values.
4. The NLP-based streaming media content analysis method of claim 3, wherein the step of analyzing and integrating all the content evaluation information to obtain the streaming media evaluation information comprises the following steps:
integrating the categories in all the content evaluation information, and merging the categories corresponding to the same noun;
retrieving the influence degree of the streaming media video author corresponding to each text evaluation value;
and determining an overall evaluation value, wherein the overall evaluation value = Σ (text evaluation value × influence degree), and the integrated categories and the overall evaluation value form the streaming media evaluation information.
5. An NLP-based streaming media content analysis system, the system comprising:
the streaming media video determining module is used for receiving a search keyword input by a user and determining matched streaming media videos according to the search keyword;
the text information acquisition module is used for screening the streaming media videos according to the heat value, processing the screened streaming media videos and determining text information corresponding to each streaming media video;
the function keyword input module is used for receiving a function keyword input by the user and generalizing the function keyword and the search keyword into nouns;
the adjective noun determining module is used for extracting adjectives and nouns in each piece of text information based on NLP, binding a noun to each adjective, and determining content evaluation information of the text information;
the streaming media evaluation information module is used for analyzing and integrating all the content evaluation information to obtain streaming media evaluation information, and specially marking the evaluation content of the function keyword in the streaming media evaluation information;
the text information acquisition module comprises: the subtitle information judging unit is used for judging whether the screened streaming media video has subtitle information; the first text information unit is used for performing text recognition on the subtitle information in the streaming media video to obtain text information when subtitle information exists; and the second text information unit is used for acquiring the audio information of the streaming media video when subtitle information does not exist, and performing speech-to-text conversion on the audio information to obtain text information.
6. The NLP-based streaming media content analysis system of claim 5, wherein the adjective noun determination module comprises:
the influence degree determining unit is used for determining the influence degree of the streaming media video author corresponding to the text information;
the first adjective noun unit is used for extracting adjectives and nouns in the text information by using a word segmentation tool when the influence degree is smaller than or equal to a set influence value, and carrying out position marking on the extracted adjectives and nouns;
and the second adjective noun unit is used for receiving training corpus information when the influence degree is larger than the set influence value, performing feature learning on the training corpus information based on the CNN-LSTM model to obtain an exclusive neural network model, processing text information through the exclusive neural network model to obtain adjectives and nouns, and performing position marking on the obtained adjectives and nouns.
7. The NLP-based streaming media content analysis system of claim 6, wherein the adjective noun determination module further comprises:
an adjective noun binding unit, configured to bind a noun to each adjective according to the position marks, and determine a part of speech of each adjective, where the part of speech is one of positive word, negative word, and neutral word;
the adjective classification unit is used for classifying all adjectives according to nouns to obtain a plurality of categories, and nouns corresponding to each category are the same;
and the text evaluation value unit is used for determining a text evaluation value of the text information, wherein the text evaluation value=a×the number of the positive words+b×the number of the negative words+c×the number of the neutral words, the category and the text evaluation value form content evaluation information, and a, b and c are all constant values.
8. The NLP-based streaming content analysis system of claim 7, wherein the streaming rating information module comprises:
the category integrating unit is used for integrating the categories in all the content evaluation information and combining the categories corresponding to the same noun;
the influence degree retrieving unit is used for retrieving the influence degree of the streaming media video author corresponding to each text evaluation value;
and the overall evaluation value unit is used for determining an overall evaluation value, wherein the overall evaluation value = Σ (text evaluation value × influence degree), and the integrated categories and the overall evaluation value form the streaming media evaluation information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310554226.5A CN116320621B (en) | 2023-05-17 | 2023-05-17 | NLP-based streaming media content analysis method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116320621A CN116320621A (en) | 2023-06-23 |
CN116320621B CN116320621B (en) | 2023-08-04 |
Family
ID=86794504
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310554226.5A Active CN116320621B (en) | 2023-05-17 | 2023-05-17 | NLP-based streaming media content analysis method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116320621B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103455562A (en) * | 2013-08-13 | 2013-12-18 | 西安建筑科技大学 | Text orientation analysis method and product review orientation discriminator on basis of same |
CN112991017A (en) * | 2021-03-26 | 2021-06-18 | 刘秀萍 | Accurate recommendation method for label system based on user comment analysis |
CN114970494A (en) * | 2021-02-25 | 2022-08-30 | 腾讯科技(北京)有限公司 | Comment generation method and device, electronic equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150186790A1 (en) * | 2013-12-31 | 2015-07-02 | Soshoma Inc. | Systems and Methods for Automatic Understanding of Consumer Evaluations of Product Attributes from Consumer-Generated Reviews |
Also Published As
Publication number | Publication date |
---|---|
CN116320621A (en) | 2023-06-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||