WO2020155496A1 - Public opinion tracking method and device for combined video-text data, and computer apparatus - Google Patents

Public opinion tracking method and device for combined video-text data, and computer apparatus

Info

Publication number
WO2020155496A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
video
public opinion
text
public
Prior art date
Application number
PCT/CN2019/089609
Other languages
French (fr)
Chinese (zh)
Inventor
吴壮伟
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020155496A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines

Definitions

  • This application relates to the field of data analysis technology, and in particular to a public opinion tracking method, device and computer equipment for video text combined data.
  • Current methods of monitoring public opinion on the Internet use keyword matching or text OCR recognition to capture relevant news, social media activity, and netizen comments on various Internet platforms.
  • The volume measured this way covers text only, so content that contains only a video without the relevant keyword cannot be found.
  • The main purpose of this application is to provide a public opinion tracking method, device and computer equipment for video text combined data, aiming to overcome the shortcoming that existing public opinion monitoring methods cannot track public opinion in video.
  • this application provides a public opinion tracking method for video text combined data, including:
  • obtaining designated public data from a first preset network platform at a first preset frequency, the designated public data including all public information of the first preset network platform and the propagation path corresponding to each piece of public information,
  • the public information including separate text information with only text, separate video information with only video, and video text combined information in which video and text are associated;
  • filtering, in the public database according to text similarity, the public opinion text data whose similarity with the text data of the public opinion video text combined data is higher than a first preset value; and filtering, in the public database according to the video source address and video similarity, the public opinion video data whose similarity with the video data of the public opinion video text combined data is higher than a second preset value;
  • obtaining the popularity trend of the preset public opinion topic.
  • This application also provides a public opinion tracking device for video text combined data, including:
  • the first obtaining module is configured to obtain designated public data from the first preset network platform according to the first preset frequency
  • the first screening module is configured to filter the public opinion video text combination data corresponding to the preset public opinion topic in the public database;
  • the second filtering module is configured to filter, in the public database according to text similarity, the public opinion text data whose similarity with the text data of the public opinion video text combined data is higher than the first preset value; and to filter, in the public database according to the video source address and video similarity, the public opinion video data whose similarity with the video data of the public opinion video text combination data is higher than a second preset value;
  • the first generating module is configured to obtain public opinion data according to the public opinion video text combination data, the public opinion text data, and the public opinion video data;
  • the analysis module is used to obtain the trend of the hotness change of the preset public opinion topic according to the public opinion data.
  • the present application also provides a computer device, including a memory and a processor, the memory stores computer-readable instructions, and the processor implements the steps of any one of the foregoing methods when the computer-readable instructions are executed by the processor.
  • the present application also provides a non-volatile computer-readable storage medium on which computer-readable instructions are stored, and when the computer-readable instructions are executed by a processor, the steps of any one of the above methods are implemented.
  • The public opinion tracking method, device and computer equipment for video text combined data provided in this application track public opinion information using both its text and its video, which gives comprehensive coverage of public opinion information and effectively improves the accuracy with which the popularity trend of public opinion information is analyzed.
  • Figure 1 is a schematic diagram of the steps of a public opinion tracking method for video text combined data in an embodiment of the present application
  • FIG. 2 is a block diagram of the overall structure of a public opinion tracking device for video text combined data in an embodiment of the present application
  • FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the present application.
  • an embodiment of the present application provides a public opinion tracking method for video text combined data, including:
  • S1 According to the first preset frequency, obtain designated public data from the first preset network platform.
  • the designated public data includes all public information of the first preset network platform and the propagation path corresponding to each public information.
  • the public information includes separate text information with only text, separate video information with only video, and video text combined information in which video and text are associated;
  • S5 Obtain public opinion data based on the combined public opinion video text data, public opinion text data, and public opinion video data;
  • S6 According to the public opinion data, obtain the trend of change in the popularity of the preset public opinion topic.
  • the public opinion system is pre-associated with the first preset network platform according to the developer's setting, so as to obtain information publicly released by the users of each platform in the first preset network platform.
  • The first preset network platform is a channel through which publicly released information can be queried and includes video network platforms, for example online social platforms such as Weibo and Douyin.
  • the first preset network platform can directly query the information publicly released by the users of each platform, as well as the relevant interactive actions such as comments, forwarding, or likes corresponding to the information, and query all the communication paths corresponding to the information according to these interactive actions.
  • the public opinion system can be associated with multiple first preset network platforms at the same time to obtain publicly released information and integrate and analyze it.
  • When the public opinion system is in use, it obtains, through a web crawler and at the first preset frequency set by the developer, the public data released on the first preset network platform during the time period, that is, the designated public data.
  • the designated public data includes the public information publicly released by all platform users during the time period and the respective propagation paths of these public information.
  • The public information in the designated public data includes separate text information with only text, separate video information with only video, and video text combined information in which video and text are published together. For example, when a platform user attaches a video to a text post, the post is video text combined information.
  • the public opinion system builds a public database to store all designated public data.
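  • The three kinds of public information described above can be sketched as a small record type plus a routing function. This is only an illustrative sketch: the `PublicInfo` fields, the bucket names, and the representation of a propagation path as a list are assumptions, not details given in the application.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PublicInfo:
    text: Optional[str]        # text content, or None if the post has no text
    video_url: Optional[str]   # hypothetical video reference, or None
    propagation_path: List[str] = field(default_factory=list)  # assumed shape


def classify(info: PublicInfo) -> str:
    """Return the kind of public information a record represents."""
    if info.text and info.video_url:
        return "combined"
    if info.video_url:
        return "video_only"
    if info.text:
        return "text_only"
    raise ValueError("record has neither text nor video")


def build_public_database(records):
    """Store every record in the bucket that matches its kind."""
    db = {"text_only": [], "video_only": [], "combined": []}
    for record in records:
        db[classify(record)].append(record)
    return db
```

  • Keeping the three kinds apart at ingestion time means later filtering steps can address each kind separately without re-inspecting every record.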
  • the public opinion system has preset public opinion topics.
  • the preset public opinion topic can be a single word or a sentence.
  • the preset public opinion topic can be manually input by the developer, or it can be obtained by the public opinion system itself according to a preset setting.
  • the public opinion system monitors the hot topic lists of social platforms such as Weibo and Douyin, and selects the top topic of the topic list as the preset public opinion topic.
  • The public opinion system inputs the preset public opinion topic into the public database. It first selects first public opinion keywords from the preset public opinion topic according to part of speech, and then filters out, from the video text combination sub-database, the video text combined data whose text information contains a first public opinion keyword, as the video text combined sub-data.
  • the public opinion system determines that the video sub-data is public opinion video sub-data.
  • The public opinion system obtains public opinion video text combination data related to the public opinion topic according to the public opinion video sub-data and the text data corresponding to the public opinion video sub-data. Based on the public opinion video text combination data, the public opinion system obtains the original source address (that is, the video source address) of each individual video data and of the public opinion video sub-data, and then compares the original source address of each individual video data with the original source addresses of the public opinion video sub-data. Individual video data with the same original source address is used as the first video data, and individual video data whose original source address does not match is used as the second video data.
  • Using the public opinion video sub-data in the public opinion video text combination data as a benchmark, the public opinion system filters from the second video data, according to video similarity, the video data whose video similarity value is greater than the second preset value. This video data serves as the third video data.
  • The public opinion system integrates the first video data and the third video data to obtain public opinion video data.
  • The public opinion system compares the public opinion text sub-data in the combined public opinion video text data with the individual text data and, through keyword selection and part-of-speech analysis, filters from the separate text database the individual text data whose similarity with the public opinion text sub-data exceeds a preset value (for example, a similarity of more than 90%) as individual public opinion text data.
  • the public opinion system aggregates individual public opinion text data, individual public opinion video data, and public opinion video text combined data to generate public opinion data.
  • The public opinion system can obtain the popularity trend of the preset public opinion topic by analyzing the public opinion data. From the number of users, the number of comments, and the forwarding volume, it can determine the popularity of the public opinion topic, and from the user location information it can determine how far the topic has spread geographically. By considering both popularity and geographical spread, the public opinion system can intuitively analyze the trend of the preset public opinion topic.
  • the public opinion data includes the number of users of all public opinion content about the preset public opinion topics released within the time period, the number of comments related to the public opinion content, the forwarding volume related to the public opinion content, and related user location information.
  • the numerical value of the number of users, the number of comments, and the amount of forwarding can determine the popularity of the public opinion topic, and at the same time, the extent of the geographical spread of the public opinion topic can be obtained according to the user location information.
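  • As a rough illustration of how the four quantities above could feed a trend, the sketch below summarizes one time period and then reports the change between consecutive periods. The application does not give a formula, so the score (users + comments + forwards) is purely an assumption, as are the dictionary keys `user`, `comments`, `forwards`, and `region`.

```python
from collections import Counter


def summarize_period(posts):
    """Summarize the public opinion data collected in one time period.

    posts: list of dicts with the assumed keys
    'user', 'comments', 'forwards', 'region'.
    """
    return {
        "users": len({p["user"] for p in posts}),
        "comments": sum(p["comments"] for p in posts),
        "forwards": sum(p["forwards"] for p in posts),
        # Number of posts per region, as a proxy for geographical spread.
        "regions": Counter(p["region"] for p in posts),
    }


def trend(periods):
    """Return the change in an assumed popularity score between periods."""
    scores = []
    for posts in periods:
        s = summarize_period(posts)
        scores.append(s["users"] + s["comments"] + s["forwards"])
    return [later - earlier for earlier, later in zip(scores, scores[1:])]
```

  • A positive difference indicates rising popularity between two acquisitions; the number of distinct keys in `regions` gives a simple measure of geographical spread.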
  • the public database includes a separate text sub-database, a separate video sub-database, and a video text combined sub-database;
  • the separate text sub-database is a database composed of multiple sets of separate text data;
  • the separate video sub-database is a database composed of multiple sets of separate video data;
  • the video text combined sub-database is a database composed of multiple sets of data in which video and text are associated in one-to-one correspondence;
  • the step of obtaining, by screening in the public database, the public opinion video text combination data corresponding to the preset public opinion topic includes:
  • each piece of public opinion video text combination data consists of one piece of public opinion video sub-data and one piece of public opinion text sub-data associated with each other;
  • the step of filtering, in the public database, the public opinion text data whose similarity with the text data of the public opinion video text combined data is higher than a first preset value, and filtering, in the public database according to the video source address and video similarity, the public opinion video data whose similarity with the video data of the public opinion video text combination data is higher than a second preset value, includes:
  • The video information and text information in the video text combination information acquired by the public opinion system were published together by the platform user: when the user published the text information, the video information was attached to the same post.
  • the public database constructed by the public opinion system includes a separate text sub-database, a separate video sub-database and a video text combined sub-database.
  • the separate text sub-database is a database composed of multiple sets of separate text data
  • the separate video sub-database is a database composed of multiple sets of separate video data
  • the video text combined sub-database is composed of multiple sets of videos and texts in one-to-one correspondence.
  • Each set of data includes the public information and its corresponding propagation path.
  • The public opinion system inputs the preset public opinion topic into the video text combination sub-database. It first parses the preset public opinion topic to obtain at least one first public opinion keyword, that is, a keyword of the preset public opinion topic. Each first public opinion keyword is then input into the video text combination sub-database, and the video text combination data whose text information contains the first public opinion keyword is filtered out. Identical videos in this filtered data are grouped together and the number of appearances of each video is counted. If the number of appearances of a video is greater than the preset number of times, the public opinion system determines that the video is public opinion video sub-data.
  • the public opinion system integrates public opinion video sub-data and public opinion text data corresponding to the public opinion video sub-data to generate public opinion video text combined data.
  • the public opinion video text data includes public opinion video sub-data and public opinion text sub-data, and there is a one-to-one correspondence between the two, and an index relationship is constructed so that the two are related to each other according to the index relationship.
  • For example, text A corresponds to video a, and both have index 1; text B corresponds to video b, and both have index 2.
  • Using the public opinion video sub-data in the public opinion video text combined data as a benchmark, the public opinion system compares each individual video data with the public opinion video sub-data one by one, and obtains at least one individual video data that has the same original source address, or whose video similarity value is greater than the second preset value, as public opinion video data. Since this individual video data is the same as or similar to the public opinion video sub-data, it corresponds to the preset public opinion topic.
  • Based on the public opinion text sub-data in the public opinion video text combined data, the public opinion system uses word embedding and part-of-speech analysis to extract the second public opinion keywords of each text in the public opinion text sub-data, and likewise extracts the individual keywords of each text in the separate text data. The public opinion system then counts the occurrences of each second public opinion keyword and each individual keyword. The more often the same keyword occurs in two texts, the higher the similarity of the two texts. On this basis the public opinion system judges the similarity between two texts and selects, from the individual text data, the part whose similarity with the text information in the combined public opinion video text data is greater than the first preset value as the public opinion text data.
  • the step of obtaining the public opinion video text combination data by filtering according to the keywords of the preset public opinion topic includes:
  • S3011 Parse the preset public opinion topic to obtain at least one first public opinion keyword
  • S3014 retrieve a preset number of times, compare each of the number of appearances with the preset number of times, and select the video sub-data corresponding to the number of appearances greater than the preset number of times as the public opinion video sub-data;
  • S3015 From the video text combination sub-database, respectively filter the text sub-data corresponding to each of the public opinion video sub-data as the public opinion text sub-data;
  • S3016 Correspond each of the public opinion text sub-data and each of the public opinion video sub-data respectively to obtain the public opinion video text combined data.
  • the public opinion system performs analysis such as word segmentation and removal of stop words on a preset public opinion topic to obtain at least one first public opinion keyword.
  • the video text combination sub-database includes multiple sets of data in one-to-one correspondence between video data and text data.
  • The first public opinion keyword is input into the video text combined sub-database, and at least one set of text data containing the first public opinion keyword is selected from the multiple sets of text data as the text sub-data.
  • The number of occurrences of the first public opinion keyword in each text of the text sub-data can also be counted, and the text sub-data whose occurrence count exceeds a preset number can be selected, to improve the relevance of the text sub-data to the public opinion topic.
  • Since the text data corresponds to the video data, the public opinion system can directly filter out all the video sub-data corresponding to the text sub-data from the video data.
  • The public opinion system separately counts the number of appearances of each video sub-data, then retrieves the preset number of times set in advance, compares the number of appearances of each video sub-data with the preset number of times one by one, and selects the video sub-data whose number of appearances is greater than the preset number of times as the public opinion video sub-data.
  • the public opinion system again filters out the public opinion text sub-data corresponding to the public opinion video sub-data from each group of text sub-data according to the correspondence between the video data and the text data.
  • The public opinion system aggregates all public opinion video sub-data and public opinion text sub-data, and associates them one by one according to their correspondence to generate the public opinion video text combination data.
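  • The keyword filtering and appearance counting described above can be sketched as follows. This is a minimal illustration, assuming the combined sub-database is a list of `(text, video_id)` pairs and that a keyword match is a plain substring test; the function and parameter names are hypothetical.

```python
from collections import Counter


def select_opinion_pairs(combined, keywords, preset_times):
    """Return the (text, video_id) pairs that form the opinion data.

    combined: list of (text, video_id) pairs from the combined sub-database.
    keywords: first public opinion keywords parsed from the topic.
    preset_times: a video must appear more often than this to qualify.
    """
    # Keep pairs whose text contains at least one topic keyword.
    matched = [(t, v) for t, v in combined if any(k in t for k in keywords)]
    # Count how often each video appears among the matched pairs.
    appearances = Counter(v for _, v in matched)
    # Videos that appear more than the preset number of times qualify.
    opinion_videos = {v for v, n in appearances.items() if n > preset_times}
    # Keep each qualifying video together with its associated text.
    return [(t, v) for t, v in matched if v in opinion_videos]
```

  • Counting appearances only among keyword-matched pairs keeps a widely shared but off-topic video from qualifying.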
  • the step of respectively matching each of the public opinion text sub-data and each of the public opinion video sub-data to obtain the public opinion video text combined data includes:
  • S30162 Associate and group the public opinion text sub-data and public opinion video sub-data with the same public address, so as to achieve a one-to-one correspondence between each public opinion text sub-data and each public opinion video sub-data;
  • S30163 Obtain the public opinion video text combined data according to each of the public opinion text sub-data and each of the public opinion video sub-data after being associated and grouped.
  • the public opinion system separately obtains the public address of each public opinion text sub-data and the public address of each public opinion video sub-data, where the public address is the original network address that discloses the data.
  • The public opinion system determines the correspondence between each public opinion text sub-data and each public opinion video sub-data according to the public address of the data: two pieces of data with the same public address were released together by the same user.
  • the public opinion system associates public opinion text sub-data and public opinion video sub-data with the same public address into one group.
  • The public opinion system also obtains the publication time of each piece of data, and associates public opinion text sub-data and public opinion video sub-data that have the same public address and the same publication time. After the public opinion system has associated each public opinion text sub-data with each public opinion video sub-data in groups, the combined public opinion video text data is obtained.
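  • The grouping by public address and publication time can be sketched with a dictionary keyed on both values. The record keys `address`, `time`, and `data` are assumptions made for this illustration.

```python
from collections import defaultdict


def pair_by_address(texts, videos):
    """Pair text and video sub-data that share an address and a time.

    texts, videos: lists of dicts with the assumed keys
    'address', 'time', 'data'.
    """
    groups = defaultdict(dict)
    for t in texts:
        groups[(t["address"], t["time"])]["text"] = t["data"]
    for v in videos:
        groups[(v["address"], v["time"])]["video"] = v["data"]
    # Only groups that contain both a text and a video form a combined record.
    return [(g["text"], g["video"]) for g in groups.values()
            if "text" in g and "video" in g]
```

  • Using the pair (address, time) as the key reflects the idea that two pieces of data published at the same address and at the same moment belong to one post.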
  • the separate video sub-database includes multiple sets of separate video data, and the step of filtering, in the separate video sub-database according to the video source address and video similarity, the public opinion video data whose similarity with the public opinion video sub-data is higher than the second preset value includes:
  • S4012 Compare the original source addresses of the individual video data with the original source addresses of the public opinion video sub-data one by one, select the individual video data with the same original source address as the first video data, and select the original source address Inconsistent individual video data is used as the second video data;
  • S4013 Calculate video similarity values between each of the second video data and the public opinion video sub-data
  • S4014 Compare each of the video similarity values one by one with the second preset value, and select the second video data corresponding to the video similarity value greater than the second preset value as the third video data;
  • S4015 Use the first video data and the third video data as public opinion video data.
  • the designated public data also includes the original source address of each public information.
  • the individual video sub-database is composed of multiple sets of individual video data.
  • The public opinion system first obtains the original source address of each public opinion video sub-data from the public database and builds an original source address database from these addresses. It then obtains the original source address of each individual video data and checks whether that address is contained in the original source address database.
  • If an individual video has the same original source address as a public opinion video, the two are the same video, so the individual video data with the same original source address is selected as the first video data. Individual video data that does not share an original source address with any public opinion video sub-data is treated as the second video data.
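  • The source-address comparison amounts to a set-membership test. A minimal sketch, assuming each individual video is represented as a `(video_id, source_address)` pair (a hypothetical representation):

```python
def split_by_source(individual_videos, opinion_sources):
    """Split individual videos into first and second video data.

    individual_videos: list of (video_id, source_address) pairs.
    opinion_sources: source addresses of the public opinion video sub-data.
    """
    known = set(opinion_sources)  # plays the role of the source address database
    first = [v for v in individual_videos if v[1] in known]
    second = [v for v in individual_videos if v[1] not in known]
    return first, second
```

  • Only the second group needs the more expensive similarity comparison, because a matching source address already identifies the same video.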
  • The public opinion system selects a preset number of play frame pictures from each public opinion video sub-data according to the play time of the video, and groups the selected play frame pictures by their corresponding public opinion video sub-data.
  • For each group of play frame pictures, the public opinion system applies a clustering model, such as the DBSCAN clustering model, to form clusters, and then randomly selects one play frame picture from each cluster as a key frame picture of the corresponding public opinion video sub-data.
  • the public opinion system groups all the key frame pictures according to the corresponding public opinion video sub-data to build a key frame picture library.
  • the key frame picture library contains: public opinion video A, key frame pictures: a1, b2, c3; public opinion video B, key frame pictures: a2, b3, c4.
  • the public opinion system selects the key frame pictures of the individual video from the individual video data according to the above steps, and compares the key frame pictures of the individual video with the key frame picture library one by one.
  • The video similarity value between the two videos can then be calculated. For example, if an individual video and a piece of public opinion video sub-data each have 5 key frame pictures, of which 3 are identical or similar, the video similarity value of the two videos is 60%.
  • Whether key frame pictures are identical or similar can be judged by clustering them with the DBSCAN clustering model.
  • the public opinion system compares each video similarity value with the preset second preset value one by one, and uses the second video corresponding to the video similarity value greater than the second preset value as the third video. That is, if the video similarity value between the two videos is greater than the second preset value, the public opinion system determines that the two videos are the same or similar, and selects the second video data as the third video data.
  • the second preset value is preferably 80%.
  • the public opinion system aggregates the first video data and the third video data and sets them as public opinion video data.
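  • The key-frame comparison can be sketched as the fraction of key frames that find a match, followed by the threshold test. In this illustration key frames are stand-in hashable values and the `same` predicate defaults to equality; in the described system the identical-or-similar judgment is made by clustering, which is not reproduced here.

```python
def video_similarity(frames_a, frames_b, same=lambda x, y: x == y):
    """Fraction of key frames in frames_a that have a match in frames_b."""
    if not frames_a:
        return 0.0
    matched = sum(1 for fa in frames_a if any(same(fa, fb) for fb in frames_b))
    return matched / len(frames_a)


def select_third_videos(second_videos, opinion_key_frames, second_preset=0.8):
    """Pick second videos that resemble any public opinion video.

    second_videos: dict mapping video_id to its list of key frames.
    opinion_key_frames: list of key-frame lists, one per opinion video.
    second_preset: threshold; 0.8 follows the preferred 80% value.
    """
    return [
        vid for vid, frames in second_videos.items()
        if any(video_similarity(frames, ref) > second_preset
               for ref in opinion_key_frames)
    ]
```

  • With 5 key frames on each side and 3 matches, `video_similarity` returns 0.6, which agrees with the 60% example and falls below the 80% threshold.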
  • the step of filtering according to text similarity to obtain public opinion text data whose similarity with the public opinion text sub-data is higher than the first preset value includes:
  • S4016 According to the part of speech, analyze each of the public opinion text sub-data and each of the individual text data to obtain a preset number of second public opinion keywords corresponding to each of the public opinion text sub-data, and a preset number of individual keywords corresponding to each of the individual text data;
  • the public opinion text sub-data includes multiple public opinion texts
  • the separate text database includes multiple individual texts.
  • the public opinion system first performs word segmentation and removal of stop words on each public opinion text and individual text based on part of speech, and obtains a public opinion vocabulary corresponding to a single public opinion text and a separate vocabulary corresponding to a single individual text.
  • Word segmentation refers to decomposing a text into individual words such as subject, predicate, and object. After word segmentation, a correspondence is established based on the original relationship of the subject, predicate, and object in the text. For example, in the sentence "I went to Beijing", the subject is "I", the predicate is "go", and the object is "Beijing".
  • The three words are associated according to their original order in the text, so the predicate "go" or the object "Beijing" in the same sentence stays associated with the subject "I" and forms a combination with it.
  • Removing stop words means removing meaningless words, such as "ah" and "oh".
  • The public opinion system separately counts the term frequency of each word of the public opinion vocabulary in the corresponding text. It then calculates the inverse document frequency of each word: the total number of texts is divided by the number of texts containing the word, and the logarithm of the quotient is taken.
  • The weight of each word in a single text is obtained by multiplying the term frequency of the word by its inverse document frequency.
  • the importance of a single word increases in proportion to the number of times it appears in the document, but at the same time it decreases in inverse proportion to the frequency of its appearance in the corpus, that is, the greater the weight, the higher the importance.
  • For each single text, the public opinion system selects a preset number of words in descending order of the calculated weight as the second public opinion keywords.
  • The public opinion system calculates the weight of each word in the separate vocabulary in the same way, and selects from the separate vocabulary the same preset number of individual keywords in descending order of weight.
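  • The weighting described above is the usual TF-IDF scheme. The sketch below assumes the texts are already segmented with stop words removed, and it normalizes term frequency by text length, which is an assumption (the text only says "word frequency"). Ties are broken alphabetically so the result is deterministic.

```python
import math
from collections import Counter


def top_keywords(docs, preset_number):
    """Return the highest-weighted words of each text.

    docs: list of token lists (segmented, stop words already removed).
    preset_number: how many keywords to keep per text.
    """
    total = len(docs)
    # Number of texts that contain each word.
    doc_freq = Counter(w for doc in docs for w in set(doc))
    keywords = []
    for doc in docs:
        counts = Counter(doc)
        weights = {
            # term frequency times log(total texts / texts containing word)
            w: (c / len(doc)) * math.log(total / doc_freq[w])
            for w, c in counts.items()
        }
        ranked = sorted(weights, key=lambda w: (-weights[w], w))
        keywords.append(ranked[:preset_number])
    return keywords
```

  • A word that occurs in every text gets weight zero because log(1) = 0, which is how the scheme discounts words that are common across the corpus.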
  • Based on the second public opinion keywords, the public opinion system selects from the individual keywords those that are identical to a second public opinion keyword, and counts the number of times each such shared keyword appears in the corresponding individual text. It then filters the shared keywords whose number of appearances is greater than the first threshold, and uses the individual text data corresponding to those keywords as public opinion text data.
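  • The shared-keyword test can be sketched as a set intersection followed by an occurrence count. The function name and argument shapes are hypothetical, and the individual text is assumed to be available as a token list.

```python
def is_opinion_text(opinion_keywords, individual_keywords,
                    individual_tokens, first_threshold):
    """Decide whether an individual text counts as public opinion text.

    Returns True if any keyword shared by both keyword lists appears in
    the individual text more than first_threshold times.
    """
    shared = set(opinion_keywords) & set(individual_keywords)
    return any(individual_tokens.count(k) > first_threshold for k in shared)
```

  • For instance, with shared keyword "flood" appearing three times in the individual text, the text qualifies for a threshold of 2 but not for a threshold of 3.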
  • the method includes:
  • S7 Acquire multiple sets of search data from a second preset network platform according to a second preset frequency, where the search data includes search information and the number of searches corresponding to the search information;
  • S8 Compare each number of searches with a second threshold respectively, and use search information corresponding to the number of searches greater than the second threshold as the preset public opinion topic.
  • the public opinion topic may be manually input by the developer, or may be automatically screened by the public opinion system.
  • the public opinion system is associated with a second preset network platform in advance, and the second preset network platform is a search platform, such as Baidu and Sogou search.
  • the public opinion system obtains all search data in the time period from the last acquisition time to the current time from the second preset network platform.
  • the search data includes the search information input by the user and the total number of searches corresponding to the search information.
  • The public opinion system retrieves the second threshold, compares the number of searches for each item of search information with it, and filters out from the search data the designated search data whose number of searches is greater than the second threshold.
  • the public opinion system automatically sets the search information in the specified search data as public opinion topics.
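Steps S7 and S8 amount to a threshold filter over the collected search data. A minimal Python sketch follows, assuming the search data arrives as pairs of search information and search count; the function name `select_topics`, the sample queries, and the threshold value are hypothetical.

```python
def select_topics(search_data, second_threshold):
    """search_data: iterable of (search_information, number_of_searches) pairs.

    Returns the search information whose count is strictly greater than
    the second threshold, in the original order.
    """
    return [info for info, count in search_data if count > second_threshold]

# Made-up sample data for illustration only.
search_data = [
    ("typhoon landfall", 1_200_000),
    ("new phone", 80_000),
    ("school closure", 650_000),
]
topics = select_topics(search_data, second_threshold=500_000)
```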
  • the designated public data includes the public time of the information
  • The public opinion system is associated in advance with at least one preset terminal, and after the step of obtaining the popularity trend of the preset public opinion topic based on the public opinion data, the method includes:
  • S11 Obtain the link of the pushed public opinion data, and generate push information including the link of the pushed public opinion data;
  • S12 Send the push information to each of the preset terminals respectively.
  • By analyzing the public opinion data, the public opinion system obtains the popularity trend of the public opinion topic.
  • Combining the level of attention with the geographical spread, the public opinion system can directly obtain the popularity trend of the preset public opinion topic.
  • The public opinion system judges, from the level of attention and the geographical spread in the popularity trend, whether the trend of the public opinion topic meets the condition for triggering automatic push. If the level of attention exceeds a preset number and the geographical spread exceeds a preset range, for example the number of reposts and comments is greater than 500,000 and the spread covers more than 100,000 square kilometers, the system judges that the popularity trend of the public opinion topic meets the condition for triggering automatic push.
  • The public opinion system screens out from the public opinion data the public opinion video text combination data, which contains both text and video, and uses the item whose publication time is closest to the current time as the pushed public opinion data.
  • The public opinion system obtains the link of the pushed public opinion data, generates push information containing that link, and automatically sends the push information to the preset terminals so that the public can learn of the current public opinion topic in time.
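The push logic described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the application's implementation: the `OpinionItem` record layout, the function names, and the example links are hypothetical, and the default thresholds simply reuse the example figures from the text (500,000 interactions and 100,000 square kilometers).

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class OpinionItem:
    link: str
    published: datetime
    has_text: bool
    has_video: bool

def should_push(attention_count, spread_km2,
                min_attention=500_000, min_spread_km2=100_000):
    """Both the attention level and the spread must exceed their limits."""
    return attention_count > min_attention and spread_km2 > min_spread_km2

def pick_push_item(items):
    """Most recently published item containing both text and video, or None."""
    combined = [item for item in items if item.has_text and item.has_video]
    return max(combined, key=lambda item: item.published) if combined else None

def build_push_info(item):
    """Push information carrying the link of the pushed public opinion data."""
    return {"link": item.link}

# Made-up sample data for illustration only.
items = [
    OpinionItem("https://example.com/a", datetime(2019, 1, 30, 9, 0), True, True),
    OpinionItem("https://example.com/b", datetime(2019, 1, 31, 8, 0), True, False),
    OpinionItem("https://example.com/c", datetime(2019, 1, 30, 18, 0), True, True),
]
push_info = None
if should_push(attention_count=620_000, spread_km2=150_000):
    push_info = build_push_info(pick_push_item(items))
```

Item b is the newest but contains no video, so the sketch selects item c; sending the push information to each preset terminal is left out because the text does not describe the transport.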
  • The public opinion tracking method for video text combined data provided in this embodiment tracks public opinion by combining the text and video in public opinion information, achieving comprehensive coverage of public opinion information and effectively improving the accuracy with which the popularity trend of public opinion information is analyzed.
  • an embodiment of the present application also provides a public opinion tracking device for video text combined data, including:
  • the first obtaining module 1 is configured to obtain designated public data from a first preset network platform according to a first preset frequency
  • the construction module 2 is used to construct a public database according to the specified public data
  • the first screening module 3 is configured to filter the public opinion video text combination data corresponding to the preset public opinion topic in the public database;
  • the second screening module 4 is configured to filter, in the public database and according to text similarity, the public opinion text data whose similarity with the text data of the public opinion video text combination data is higher than the first preset value; and to filter, in the public database and according to the video source address and video similarity, the public opinion video data whose similarity with the video data of the public opinion video text combination data is higher than a second preset value;
  • the first generating module 5 is configured to obtain public opinion data according to the public opinion video text combination data, the public opinion text data, and the public opinion video data;
  • the analysis module 6 is configured to obtain the trend of the popularity change of the preset public opinion topic according to the public opinion data.
  • The public opinion tracking device for video text combined data provided in this embodiment tracks public opinion by combining the text and video in public opinion information, achieving comprehensive coverage of public opinion information and effectively improving the accuracy with which the popularity trend of public opinion information is analyzed.
  • an embodiment of the present application also provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 3.
  • The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer equipment is used to store data such as public databases.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • When the computer-readable instructions are executed by the processor, the processes of the foregoing method embodiments are carried out.
  • FIG. 3 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • An embodiment of the present application further provides a computer non-volatile readable storage medium, on which computer readable instructions are stored, and when the computer readable instructions are executed by a processor, the processes of the above-mentioned method embodiments are executed.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present application provides a public opinion tracking method and device for combined video-text data, and a computer apparatus. The method comprises: obtaining specified public data, and building a public database; filtering the public database to obtain public opinion combined video-text data, public opinion text data, and public opinion video data, all corresponding to a public opinion topic; and combining all of the data obtained by filtering so as to form public opinion data. The present application performs tracking by combining text and video in public opinion information, thereby achieving comprehensive tracking of public opinion information.

Description

Public opinion tracking method, device and computer equipment for video text combined data
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on January 31, 2019, with application number 201910100413.X and the invention title "Public Opinion Tracking Method, Apparatus and Computer Equipment for Video Text Combined Data", the entire content of which is incorporated in this application by reference.
Technical field
This application relates to the field of data analysis technology, and in particular to a public opinion tracking method, device and computer equipment for video text combined data.
Background
Current methods of monitoring online public opinion use keyword matching or OCR text recognition to capture online public opinion information such as relevant news, social media posts, and netizen comments on various platforms. In this case, only data containing the relevant keyword text is processed, the measured volume is only the volume of text, and content that consists of video without the relevant keywords cannot be found. More and more users now prefer to express their feelings through video; they may not mention certain keywords, yet their videos convey the same information. For example, real users of Weibo post millions of short videos every day. Because effective means of identification are lacking, video data has long been a blind spot for monitoring.
Technical problem
The main purpose of this application is to provide a public opinion tracking method, device and computer equipment for video text combined data, aiming to overcome the drawback that existing public opinion monitoring methods cannot track public opinion in video.
Technical solutions
To achieve the above objective, this application provides a public opinion tracking method for video text combined data, including:
obtaining, according to a first preset frequency, designated public data from a first preset network platform, the designated public data including all public information of the first preset network platform and the propagation path corresponding to each piece of public information, and the public information including individual text information with only text, individual video information with only video, and video text combination information in which video and text are associated;
constructing a public database according to the designated public data;
screening the public database to obtain the public opinion video text combination data corresponding to the preset public opinion topic;
screening the public database, according to text similarity, to obtain the public opinion text data whose similarity with the text data of the public opinion video text combination data is higher than a first preset value; and screening the public database, according to the video source address and video similarity, to obtain the public opinion video data whose similarity with the video data of the public opinion video text combination data is higher than a second preset value;
obtaining public opinion data according to the public opinion video text combination data, the public opinion text data, and the public opinion video data;
obtaining, according to the public opinion data, the popularity trend of the preset public opinion topic.
This application also provides a public opinion tracking device for video text combined data, including:
a first obtaining module, configured to obtain designated public data from the first preset network platform according to the first preset frequency;
a construction module, configured to construct a public database according to the designated public data;
a first screening module, configured to screen the public database to obtain the public opinion video text combination data corresponding to the preset public opinion topic;
a second screening module, configured to screen the public database, according to text similarity, to obtain the public opinion text data whose similarity with the text data of the public opinion video text combination data is higher than the first preset value; and to screen the public database, according to the video source address and video similarity, to obtain the public opinion video data whose similarity with the video data of the public opinion video text combination data is higher than the second preset value;
a first generating module, configured to obtain public opinion data according to the public opinion video text combination data, the public opinion text data, and the public opinion video data;
an analysis module, configured to obtain the popularity trend of the preset public opinion topic according to the public opinion data.
This application also provides a computer device, including a memory and a processor, the memory storing computer-readable instructions, and the processor implementing the steps of any one of the foregoing methods when executing the computer-readable instructions.
This application also provides a computer non-volatile readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of any one of the foregoing methods.
Beneficial effect
The public opinion tracking method, device and computer equipment for video text combined data provided in this application track public opinion by combining the text and video in public opinion information, achieving comprehensive coverage of public opinion information and effectively improving the accuracy with which the popularity trend of public opinion information is analyzed.
Description of the drawings
FIG. 1 is a schematic diagram of the steps of a public opinion tracking method for video text combined data in an embodiment of the present application;
FIG. 2 is a block diagram of the overall structure of a public opinion tracking device for video text combined data in an embodiment of the present application;
FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the present application.
The realization of the purpose, the functional characteristics, and the advantages of this application are further described in conjunction with the embodiments and with reference to the drawings.
Best mode of the invention
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it.
Referring to FIG. 1, an embodiment of the present application provides a public opinion tracking method for video text combined data, including:
S1: obtaining, according to a first preset frequency, designated public data from a first preset network platform, the designated public data including all public information of the first preset network platform and the propagation path corresponding to each piece of public information, and the public information including individual text information with only text, individual video information with only video, and video text combination information in which video and text are associated;
S2: constructing a public database according to the designated public data;
S3: screening the public database to obtain the public opinion video text combination data corresponding to the preset public opinion topic;
S4: screening the public database, according to text similarity, to obtain the public opinion text data whose similarity with the text data of the public opinion video text combination data is higher than a first preset value; and screening the public database, according to the video source address and video similarity, to obtain the public opinion video data whose similarity with the video data of the public opinion video text combination data is higher than a second preset value;
S5: obtaining public opinion data according to the public opinion video text combination data, the public opinion text data, and the public opinion video data;
S6: obtaining, according to the public opinion data, the popularity trend of the preset public opinion topic.
In this embodiment, the public opinion system is associated in advance with the first preset network platform, according to the developer's settings, in order to obtain the information publicly released by the users of each first preset network platform. A first preset network platform is a network platform on which the propagation path of publicly released information can be queried and whose public information includes video, such as the social platforms Weibo and Douyin. On such a platform the system can directly query the information publicly released by users, the interactions associated with that information such as comments, reposts, and likes, and, from these interactions, all propagation paths of the information. The public opinion system can be associated with multiple first preset network platforms at the same time, obtaining publicly released information from each and analyzing it together.
In use, the public opinion system, according to the developer's settings and at the first preset frequency, uses a web crawler to obtain directly from the first preset network platform all public data of the corresponding time period, that is, the designated public data. The designated public data includes the public information released by all platform users during the time period and the propagation path of each piece of public information. The public information includes individual text information with only text, individual video information with only video, and video text combination information in which video and text are published together; for example, a text that a platform user publishes with a video attached is video text combination information. The public opinion system builds a public database to store all designated public data.
The public opinion system is configured with a preset public opinion topic, which can be a single word or a sentence. The topic can be entered manually by the developer or obtained by the system itself according to preset settings; for example, the system monitors the hot topic lists of social platforms such as Weibo and Douyin and selects the top-ranked topic as the preset public opinion topic. The public opinion system inputs the preset public opinion topic into the public database. It first selects first public opinion keywords from the topic according to part of speech, then filters from the video text combination sub-database the video text combination data whose text information contains a first public opinion keyword, as video text combination sub-data. It then counts, within the video text combination sub-data, the number of appearances of each identical video, that is, of each video sub-data. If the number of appearances of a video sub-data is greater than a preset number, the system determines that the video sub-data is public opinion video sub-data. From the public opinion video sub-data and the text data corresponding to it, the system obtains the public opinion video text combination data related to the public opinion topic.
Taking the public opinion video text combination data as the reference, the public opinion system obtains the original source address, that is, the video source address, of each individual video data and of the public opinion video sub-data, and compares them. Individual video data whose original source address matches is taken as first video data, and individual video data whose original source address does not match is taken as second video data. Then, taking the public opinion video sub-data as the reference, the system selects from the second video data, according to video similarity, the items whose video similarity to the public opinion video sub-data is greater than the second preset value, as third video data. The public opinion system combines the first video data and the second video data to obtain the public opinion video data. At the same time, the system compares the public opinion text sub-data in the public opinion video text combination data with the individual text data and, combining keyword selection with part-of-speech analysis, selects from the individual text sub-database the individual text data whose similarity to the public opinion text sub-data is greater than the first preset value, for example a similarity of 90% or more, as individual public opinion text data. The system aggregates the individual public opinion text data, the individual public opinion video data, and the public opinion video text combination data to generate the public opinion data.
By analyzing the public opinion data, the public opinion system obtains the popularity trend of the preset public opinion topic. Specifically, the public opinion data includes the number of users who released public opinion content on the preset topic within the time period, the number of comments and the number of reposts related to that content, and the related user location information. From the number of users, comments, and reposts the system determines the level of attention the topic is receiving, and from the user location information it learns how far the topic has spread geographically. Combining the level of attention with the geographical spread, the system can directly derive the popularity trend of the preset public opinion topic.
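The video screening of step S4 described above can be illustrated with a short Python sketch. It keeps individual videos that either share the original source address of the public opinion video sub-data or, failing that, exceed the second preset value in similarity. The record layout, the function names, and the toy `frame_overlap` measure are assumptions made for illustration; the application does not specify how video similarity is computed.

```python
def filter_opinion_videos(reference, candidates, similarity, second_preset_value):
    """Select individual videos related to the reference opinion video.

    reference / candidates: dicts with at least a 'source' key (hypothetical layout).
    similarity: callable(reference, candidate) -> float, supplied by the caller.
    """
    # Videos whose original source address matches the reference.
    same_source = [v for v in candidates if v["source"] == reference["source"]]
    # Videos with a different source address are checked for similarity instead.
    other_source = [v for v in candidates if v["source"] != reference["source"]]
    similar = [v for v in other_source
               if similarity(reference, v) > second_preset_value]
    return same_source + similar

def frame_overlap(a, b):
    """Toy similarity: Jaccard overlap of per-frame fingerprints (an assumption)."""
    frames_a, frames_b = set(a["frames"]), set(b["frames"])
    union = frames_a | frames_b
    return len(frames_a & frames_b) / len(union) if union else 0.0

# Made-up sample data for illustration only.
reference = {"id": "ref", "source": "https://example.com/v/1",
             "frames": ["f1", "f2", "f3", "f4"]}
candidates = [
    {"id": "a", "source": "https://example.com/v/1", "frames": ["f1"]},
    {"id": "b", "source": "https://example.com/v/9",
     "frames": ["f1", "f2", "f3", "f4", "f5"]},
    {"id": "c", "source": "https://example.com/v/7", "frames": ["x1", "x2"]},
]
selected = filter_opinion_videos(reference, candidates, frame_overlap, 0.75)
selected_ids = [v["id"] for v in selected]
```

Checking the source address first avoids running the comparatively expensive similarity measure on videos that are already known to be copies of the same original.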
Further, the public database includes a separate text sub-database, a separate video sub-database, and a video text combination sub-database. The separate text sub-database is a database composed of multiple sets of individual text data, the separate video sub-database is a database composed of multiple sets of individual video data, and the video text combination sub-database is a database composed of multiple sets of data in which video and text are associated in one-to-one correspondence. The step of screening the public database to obtain the public opinion video text combination data corresponding to the preset public opinion topic includes:
S301: in the video text combination sub-database, filtering according to the keywords of the preset public opinion topic to obtain the public opinion video text combination data, where each item of public opinion video text combination data is composed of one piece of public opinion video sub-data and one piece of public opinion text sub-data associated with each other;
The step of screening the public database, according to text similarity, to obtain the public opinion text data whose similarity with the text data of the public opinion video text combination data is higher than the first preset value, and screening the public database, according to the video source address and video similarity, to obtain the public opinion video data whose similarity with the video data of the public opinion video text combination data is higher than the second preset value, includes:
S401: in the separate text sub-database, filtering according to text similarity to obtain the public opinion text data whose similarity to the public opinion text sub-data is higher than the first preset value, and in the separate video sub-database, filtering according to the video source address and video similarity to obtain the public opinion video data whose similarity to the public opinion video sub-data is higher than the second preset value.
In this embodiment, the video information and text information in the video text combination information obtained by the public opinion system correspond to each other at the time of publication; that is, the video is the one the user attached to, and published together with, the text in the same post. The public database built by the public opinion system includes a separate text sub-database, a separate video sub-database, and a video text combination sub-database. The separate text sub-database is a database composed of multiple sets of individual text data, the separate video sub-database is a database composed of multiple sets of individual video data, and the video text combination sub-database is a database composed of multiple sets of data in which video and text are associated in one-to-one correspondence. Each set of data includes the public information and its corresponding propagation path.
The public opinion system inputs the preset public opinion topic into the video text combination sub-database. It first parses the topic to obtain at least one first public opinion keyword, that is, a keyword of the preset public opinion topic. It then inputs each first public opinion keyword into the video text combination sub-database, filters out the video text combination data whose text information contains a first public opinion keyword, groups the identical videos in that data, and counts the number of appearances of each. If the number of appearances of a video is greater than the preset number, the system determines that the video is public opinion video sub-data. The system combines the public opinion video sub-data with the public opinion text data corresponding to it to generate the public opinion video text combination data. The public opinion video text combination data includes public opinion video sub-data and public opinion text sub-data in one-to-one correspondence, linked by an index so that each can be found from the other. For example, text A corresponds to video a, and both have index 1; text B corresponds to video b, and both have index 2.
After obtaining the public opinion video text combination data, the system takes the public opinion video sub-data as the reference, compares each individual video data with it one by one, and selects at least one individual video data that has the same original source address or whose video similarity value is greater than the second preset value, as public opinion video data. Because the individual public opinion video data is similar to the public opinion video data, it corresponds to the preset public opinion topic. At the same time, based on the public opinion text sub-data in the public opinion video text combination data, the system uses word embedding and part-of-speech analysis to extract the second public opinion keywords of each text in the public opinion text sub-data and the separate keywords of each text in the individual text data. It then counts the occurrences of the public opinion keywords and of the separate keywords. The more often the same keywords appear in two texts, the higher the similarity of the two texts. On this basis the system judges the similarity between two texts and selects from the individual text data the items whose similarity to the text information in the public opinion video text combination data is greater than the first preset value, as public opinion text data.
Further, the step of filtering the video text combination sub-database according to the keywords of the preset public opinion topic to obtain the public opinion video text combination data includes:
S3011: Parse the preset public opinion topic to obtain at least one first public opinion keyword;
S3012: In the video text combination sub-database, filter out at least one set of video text combination data containing a first public opinion keyword as video text combination sub-data, where the video text combination sub-data consists of corresponding text sub-data and video sub-data;
S3013: Count the number of occurrences of each video sub-data;
S3014: Retrieve the preset count, compare each number of occurrences with the preset count, and select the video sub-data whose number of occurrences is greater than the preset count as public opinion video sub-data;
S3015: From the video text combination sub-database, select the text sub-data corresponding to each public opinion video sub-data as public opinion text sub-data;
S3016: Match each public opinion text sub-data with its public opinion video sub-data one to one to obtain the public opinion video text combination data.
In this embodiment, the public opinion system parses the preset public opinion topic based on part of speech, performing operations such as word segmentation and stop-word removal, to obtain at least one first public opinion keyword. The video text combination sub-database includes multiple sets of data in which video data and text data correspond one to one. The public opinion system first inputs the first public opinion keyword into the video text combination sub-database and, from the sets of text data, selects at least one set whose text content contains the first public opinion keyword as text sub-data. Further, in this embodiment the number of occurrences of the first public opinion keyword in each text of the text sub-data can also be counted, and only the text sub-data in which that number exceeds a preset count retained, to increase the relevance of the text sub-data to the public opinion topic. Because the text data corresponds to the video data, the public opinion system can directly select from the video data the video sub-data corresponding to all of the text sub-data.

The public opinion system counts the number of occurrences of each video sub-data, retrieves the preset count set in advance, compares each number of occurrences with the preset count, and selects the video sub-data whose number of occurrences is greater than the preset count as public opinion video sub-data. Finally, again using the correspondence between video data and text data, the public opinion system selects from the text sub-data the public opinion text sub-data corresponding to the public opinion video sub-data. The public opinion system gathers all public opinion video sub-data and public opinion text sub-data and associates them one to one according to this correspondence to generate the public opinion video text combination data.
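Steps S3011 to S3016 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the application: the function name `select_opinion_pairs`, the `(text, video_id)` tuple layout, and plain substring matching for keywords are all hypothetical choices made for the example.

```python
from collections import Counter


def select_opinion_pairs(pairs, keywords, preset_count):
    """Return the (text, video_id) pairs that form the combined opinion data.

    pairs: hypothetical list of (text, video_id) tuples standing in for the
    video text combination sub-database.
    """
    # S3012: keep pairs whose text contains at least one first keyword.
    matched = [(t, v) for t, v in pairs if any(k in t for k in keywords)]
    # S3013: count how often each video occurs among the matched pairs.
    counts = Counter(v for _, v in matched)
    # S3014: videos whose count is greater than the preset count.
    hot_videos = {v for v, n in counts.items() if n > preset_count}
    # S3015/S3016: keep each text together with its opinion video.
    return [(t, v) for t, v in matched if v in hot_videos]


pairs = [
    ("flood in city", "v1"),
    ("flood warning", "v1"),
    ("flood relief", "v2"),
    ("cat video", "v1"),
]
print(select_opinion_pairs(pairs, ["flood"], preset_count=1))
```

Keeping each text next to its video in one tuple plays the role of the index relationship between text sub-data and video sub-data.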
Further, the step of matching each public opinion text sub-data with each public opinion video sub-data one to one to obtain the public opinion video text combination data includes:
S30161: Obtain the public address of each public opinion text sub-data and the public address of each public opinion video sub-data;
S30162: Associate and group the public opinion text sub-data and public opinion video sub-data that have the same public address, so that each public opinion text sub-data is matched one to one with a public opinion video sub-data;
S30163: Obtain the public opinion video text combination data from the associated and grouped public opinion text sub-data and public opinion video sub-data.
In this embodiment, the public opinion system obtains the public address of each public opinion text sub-data and of each public opinion video sub-data, where the public address is the original network address at which the data was published. The public opinion system uses the public address to determine which public opinion text sub-data corresponds to which public opinion video sub-data: two pieces of data with the same public address were published together by the same user. The public opinion system associates the public opinion text sub-data and public opinion video sub-data that share a public address and places them in one group. Further, if several public opinion text sub-data and several public opinion video sub-data have the same public address, the public opinion system obtains the publication time of each piece of data and associates text sub-data with video sub-data that share both the public address and the publication time. Once each public opinion text sub-data and public opinion video sub-data has been associated and grouped, the public opinion video text combination data is obtained.
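The grouping by public address, with publication time as the tie-breaker, can be sketched as below. The record layout (dictionaries with `id`, `address`, and `time` keys) and the function name `pair_by_address` are assumptions made for illustration only.

```python
from collections import defaultdict


def pair_by_address(texts, videos):
    """Pair text and video records that were published together.

    texts, videos: hypothetical lists of dicts with 'id', 'address', 'time'.
    """
    videos_by_address = defaultdict(list)
    for video in videos:
        videos_by_address[video["address"]].append(video)

    pairs = []
    for text in texts:
        candidates = videos_by_address.get(text["address"], [])
        if len(candidates) > 1:
            # Several videos share the address: also require the same time.
            candidates = [v for v in candidates if v["time"] == text["time"]]
        if len(candidates) == 1:
            pairs.append((text["id"], candidates[0]["id"]))
    return pairs


texts = [
    {"id": "t1", "address": "u/1", "time": "10:00"},
    {"id": "t2", "address": "u/2", "time": "10:00"},
    {"id": "t3", "address": "u/2", "time": "11:00"},
]
videos = [
    {"id": "v1", "address": "u/1", "time": "10:00"},
    {"id": "v2", "address": "u/2", "time": "10:00"},
    {"id": "v3", "address": "u/2", "time": "11:00"},
]
print(pair_by_address(texts, videos))
```

In this sketch a text is left unpaired when the address and time still do not identify exactly one video; the application does not say how such cases are handled.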
Further, the separate video sub-database includes multiple sets of individual video data, and the step of filtering the separate video sub-database according to video source address and video similarity to obtain public opinion video data whose similarity to the public opinion video sub-data is higher than the second preset value includes:
S4011: Obtain the original source address of each individual video data and of the public opinion video sub-data;
S4012: Compare the original source address of each individual video data with the original source address of the public opinion video sub-data one by one, select the individual video data whose original source address matches as first video data, and select the individual video data whose original source address does not match as second video data;
S4013: Calculate the video similarity value between each second video data and the public opinion video sub-data;
S4014: Compare each video similarity value with the second preset value, and select the second video data whose video similarity value is greater than the second preset value as third video data;
S4015: Use the first video data and the third video data as public opinion video data.
In this embodiment, the designated public data also includes the original source address of each piece of public information. The separate video sub-database consists of multiple sets of individual video data. The public opinion system first obtains the original source address of each public opinion video sub-data from the public database and builds an original source address database from them. It then obtains the original source address of each individual video data and compares it one by one against the addresses in the original source address database to determine whether it is contained there. If it is, the individual video has the same original source address as a public opinion video and is the same video, so that individual video data is selected as first video data. If it does not share an original source address with any public opinion video sub-data, it is treated as second video data.

The public opinion system selects a preset number of frame pictures from each public opinion video sub-data according to playback time and groups them with the corresponding public opinion video sub-data. It clusters each group of frame pictures with a clustering model, for example DBSCAN, to form clusters, and then randomly selects one frame picture from each cluster as a key frame picture of the corresponding public opinion video sub-data. The public opinion system groups all key frame pictures by their public opinion video sub-data to build a key frame picture library. For example, the library may contain: public opinion video A with key frame pictures a1, b2, c3; public opinion video B with key frame pictures a2, b3, c4. After building the library, the public opinion system selects key frame pictures from each individual video in the same way and compares them one by one with the key frame picture library. The video similarity value between two videos is the number of key frame pictures of the individual video that are identical or similar to key frame pictures of a single public opinion video sub-data, divided by the total number of key frame pictures. For example, if the individual video and the public opinion video sub-data each have 5 key frame pictures, of which 3 are identical or similar, the video similarity value of the two videos is 3/5 = 60%. Whether two key frame pictures are identical or similar can be judged by clustering with the DBSCAN model: if two key frame pictures fall in the same cluster after clustering, they are identical or similar.

The public opinion system compares each video similarity value with the second preset value set in advance and takes the second video data whose video similarity value is greater than the second preset value as third video data. In other words, if the video similarity value between two videos is greater than the second preset value, the public opinion system determines that the two videos are identical or similar and selects that second video data as third video data. The second preset value is preferably 80%. The public opinion system combines the first video data and the third video data and sets them as the public opinion video data.
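Steps S4011 to S4015 can be sketched as below. To stay self-contained, the sketch assumes clustering has already been run and that each key frame is represented by the label of the cluster it fell into, so two key frames count as identical or similar when their labels match. The record layout and the names `video_similarity` and `select_opinion_videos` are hypothetical.

```python
def video_similarity(candidate_labels, reference_labels):
    """Share of candidate key frames whose cluster also holds a reference key frame."""
    if not candidate_labels:
        return 0.0
    reference = set(reference_labels)
    matches = sum(1 for label in candidate_labels if label in reference)
    return matches / len(candidate_labels)


def select_opinion_videos(individual_videos, opinion_videos, second_preset=0.8):
    """Return ids of individual videos kept as public opinion video data."""
    opinion_sources = {v["source"] for v in opinion_videos}
    selected = []
    for video in individual_videos:
        if video["source"] in opinion_sources:
            # S4012: same original source address -> first video data.
            selected.append(video["id"])
            continue
        # S4013: second video data, scored against every opinion video.
        best = max(
            (video_similarity(video["keyframes"], o["keyframes"])
             for o in opinion_videos),
            default=0.0,
        )
        # S4014: strictly greater than the second preset value -> third video data.
        if best > second_preset:
            selected.append(video["id"])
    return selected


opinion = [{"id": "A", "source": "s1", "keyframes": [1, 2, 3, 4, 5]}]
individual = [
    {"id": "x", "source": "s1", "keyframes": []},
    {"id": "y", "source": "s2", "keyframes": [1, 2, 3, 4, 5]},
    {"id": "z", "source": "s3", "keyframes": [1, 2, 3, 8, 9]},
]
print(select_opinion_videos(individual, opinion))
```

Video `z` reproduces the worked example in the text: 3 of its 5 key frames match, giving 60%, which is below the preferred 80% threshold, so it is not selected.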
Further, the step of filtering the separate text sub-database according to text similarity to obtain public opinion text data whose similarity to the public opinion text sub-data is higher than the first preset value includes:
S4016: Parse each public opinion text sub-data and each individual text data according to part of speech, to obtain a preset number of second public opinion keywords for each public opinion text sub-data and the same preset number of individual keywords for each individual text data;
S4017: Compare each second public opinion keyword with each individual keyword, and select the identical keywords;
S4018: Count the number of occurrences of each identical keyword;
S4019: Compare each number of occurrences with a first threshold, and select the identical keywords whose number of occurrences is greater than the first threshold as designated keywords;
S40110: Select the individual text data containing the designated keywords as public opinion text data.
In this embodiment, the public opinion text sub-data includes multiple public opinion texts, and the separate text sub-database includes multiple individual texts. The public opinion system first performs word segmentation and stop-word removal on each public opinion text and each individual text based on part of speech, obtaining a public opinion lexicon for each public opinion text and an individual lexicon for each individual text. Word segmentation means breaking a text into single words such as subject, predicate, and object; after segmentation, associations are established between the subject, predicate, and object according to their original relationship in the text. For example, in the sentence "I went to Beijing", the subject is "I", the predicate is "went", and the object is "Beijing". After segmentation, the three words are associated according to their original order in the text, so that when the subject "I" is set as a keyword, the predicate "went" or the object "Beijing" from the same sentence is associated and combined with it. Stop-word removal deletes words that carry no meaning, such as "ah" and "oh".

The public opinion system counts the term frequency of each word of the public opinion lexicon in its text. It then calculates the inverse document frequency of each word: the total number of texts is divided by the number of texts containing the word, and the logarithm of the quotient is taken. The weight of a word in a single text is its term frequency multiplied by its inverse document frequency. The importance of a word increases in proportion to the number of times it appears in the text and decreases with the frequency at which it appears across the corpus; the larger the weight, the more important the word. The public opinion system sorts the words of each text by weight in descending order and selects the preset number of them as the second public opinion keywords. In the same way, it calculates the weight of each word in the individual lexicon and selects the same preset number of individual keywords in descending order of weight. Taking the second public opinion keywords as the basis, the public opinion system picks out the individual keywords that are identical to a second public opinion keyword, counts the number of occurrences of each identical keyword in the corresponding text, selects the identical keywords whose number of occurrences is greater than the first threshold, and takes the individual text data corresponding to those keywords as public opinion text data.
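The weighting described above is the standard TF-IDF scheme, which can be sketched as follows. The sketch assumes the texts are already segmented with stop words removed, and it normalizes term frequency by text length, which the application does not specify. The function name `top_keywords` is hypothetical.

```python
import math
from collections import Counter


def top_keywords(docs, k):
    """Return the k highest-weighted words of each text.

    docs: list of token lists (already segmented, stop words removed).
    """
    n_docs = len(docs)
    # Number of texts that contain each word.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    result = []
    for tokens in docs:
        term_counts = Counter(tokens)
        total = len(tokens)
        # weight = term frequency * log(total texts / texts containing the word)
        weights = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in term_counts.items()
        }
        # Descending weight; ties broken alphabetically for a stable result.
        ranked = sorted(weights, key=lambda w: (-weights[w], w))
        result.append(ranked[:k])
    return result


docs = [
    ["flood", "flood", "city", "rescue"],
    ["city", "traffic", "traffic"],
    ["city", "concert"],
]
print(top_keywords(docs, 1))
```

A word such as "city" that occurs in every text gets weight zero, because the logarithm of 1 is 0, which matches the statement that importance falls as a word becomes common across the corpus.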
Further, before the step of obtaining designated public data from the first preset network platform according to the first preset frequency, the method includes:
S7: Obtain multiple sets of search data from a second preset network platform according to a second preset frequency, where the search data includes search information and the number of searches corresponding to the search information;
S8: Compare each number of searches with a second threshold, and use the search information whose number of searches is greater than the second threshold as the preset public opinion topic.
In this embodiment, the public opinion topic may be entered manually by a developer or selected automatically by the public opinion system. The public opinion system is associated in advance with the second preset network platform, which is a search platform such as Baidu or Sogou Search. According to the second preset frequency, the public opinion system obtains from the second preset network platform all search data for the period from the previous acquisition time to the current time. The search data includes the search information entered by users and the total number of searches for that search information. The public opinion system retrieves the second preset count (the second threshold of step S8), compares the number of searches for each piece of search information with it, and selects from the search data the designated search data whose number of searches is greater than the second preset count. The public opinion system automatically sets the search information in the designated search data as a public opinion topic.
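Step S8 reduces to a threshold filter over the search data, as in the sketch below. The `(search_information, search_count)` tuple layout and the function name `select_topics` are assumptions for illustration; how the search data is actually fetched from the search platform is outside the sketch.

```python
def select_topics(search_data, second_threshold):
    """Return search information whose search count exceeds the threshold.

    search_data: hypothetical list of (search_information, search_count) tuples
    collected since the previous acquisition.
    """
    return [info for info, count in search_data if count > second_threshold]


search_data = [("city flood", 120_000), ("new cafe", 300), ("school closure", 80_000)]
print(select_topics(search_data, second_threshold=50_000))
```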
Further, the designated public data includes the publication time of the information, the public opinion system is associated in advance with at least one preset terminal, and after the step of obtaining the popularity trend of the preset public opinion topic according to the public opinion data, the method includes:
S9: Determine whether the popularity trend meets the condition for triggering automatic push;
S10: If the condition for triggering automatic push is met, select designated push public opinion data from the public opinion data, where the push public opinion data is the data whose publication time is closest to the current time;
S11: Obtain the link of the push public opinion data, and generate push information containing the link of the push public opinion data;
S12: Send the push information to each preset terminal.
In this embodiment, the public opinion system parses the public opinion data to obtain the popularity trend of the public opinion topic. By combining the level of attention with the degree of geographical spread, the public opinion system obtains the popularity trend of the preset public opinion topic directly. Based on the level of attention and the degree of geographical spread in the popularity trend, the public opinion system determines whether the trend meets the condition for triggering automatic push. If the level of attention exceeds a preset amount and the geographical spread exceeds a preset range, for example the number of reposts and comments is greater than 500,000 and the geographical spread exceeds 100,000 square kilometers, the popularity trend of the public opinion topic is determined to meet the condition for triggering automatic push. From the public opinion data, the public opinion system selects the public opinion video text combination data that contains both text and video and whose publication time is closest to the current time as the push public opinion data. The public opinion system obtains the link of the push public opinion data, generates push information containing that link, and automatically sends the push information to the preset terminals so that the public can learn about the current public opinion topic in time.
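Steps S9 to S11 can be sketched as below, using the example thresholds from the text (500,000 reposts and comments, 100,000 square kilometers). The record layout, the field names, and the functions `should_push` and `build_push` are hypothetical, and delivery to the preset terminals (S12) is left out.

```python
from datetime import datetime


def should_push(engagement, spread_km2,
                min_engagement=500_000, min_spread_km2=100_000):
    """S9: both the attention level and the spread must exceed their limits."""
    return engagement > min_engagement and spread_km2 > min_spread_km2


def build_push(items, now):
    """S10/S11: pick the most recent text+video item and wrap its link."""
    combined = [i for i in items if i["has_text"] and i["has_video"]]
    if not combined:
        return None
    # Shortest interval between publication time and the current time.
    latest = min(combined, key=lambda i: now - i["published"])
    return {"link": latest["link"]}


now = datetime(2020, 1, 2, 12, 0)
items = [
    {"link": "post/1", "published": datetime(2020, 1, 1, 9, 0),
     "has_text": True, "has_video": True},
    {"link": "post/2", "published": datetime(2020, 1, 2, 11, 0),
     "has_text": True, "has_video": True},
    {"link": "post/3", "published": datetime(2020, 1, 2, 11, 30),
     "has_text": True, "has_video": False},
]
if should_push(engagement=600_000, spread_km2=150_000):
    print(build_push(items, now))
```

The sketch assumes every publication time is at or before `now`; `post/3` is newer but is skipped because it has no video.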
The public opinion tracking method for video text combined data provided in this embodiment tracks public opinion information by combining its text and video, which achieves comprehensive coverage of the public opinion information and effectively improves the accuracy with which its popularity trend is analyzed.
Referring to FIG. 2, an embodiment of the present application also provides a public opinion tracking device for video text combined data, including:
a first acquisition module 1, configured to obtain designated public data from a first preset network platform according to a first preset frequency;
a construction module 2, configured to construct a public database according to the designated public data;
a first screening module 3, configured to filter the public database to obtain the public opinion video text combination data corresponding to the preset public opinion topic;
a second screening module 4, configured to filter the public database according to text similarity to obtain public opinion text data whose similarity to the text data of the public opinion video text combination data is higher than a first preset value, and to filter the public database according to video source address and video similarity to obtain public opinion video data whose similarity to the video data of the public opinion video text combination data is higher than a second preset value;
a first generation module 5, configured to obtain public opinion data according to the public opinion video text combination data, the public opinion text data, and the public opinion video data;
an analysis module 6, configured to obtain the popularity trend of the preset public opinion topic according to the public opinion data.
In this embodiment, the implementation of each module of the public opinion tracking device is consistent with the corresponding method steps described above and is not detailed again here.
The public opinion tracking device for video text combined data provided in this embodiment tracks public opinion information by combining its text and video, which achieves comprehensive coverage of the public opinion information and effectively improves the accuracy with which its popularity trend is analyzed.
Referring to FIG. 3, an embodiment of the present application also provides a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The database of the computer device stores data such as the public database. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer-readable instructions carry out the processes of the method embodiments described above. Those skilled in the art will understand that the structure shown in FIG. 3 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied.
An embodiment of the present application also provides a non-volatile computer-readable storage medium storing computer-readable instructions which, when executed by a processor, carry out the processes of the method embodiments described above. The above are only preferred embodiments of the present application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present application.

Claims (20)

  1. A public opinion tracking method for video text combined data, characterized in that it comprises:
    obtaining designated public data from a first preset network platform according to a first preset frequency, the designated public data including all public information of the first preset network platform and the propagation path corresponding to each piece of public information, the public information including individual text information containing only text, individual video information containing only video, and video text combination information in which a video and a text are associated;
    constructing a public database according to the designated public data;
    filtering the public database to obtain public opinion video text combination data corresponding to a preset public opinion topic;
    filtering the public database according to text similarity to obtain public opinion text data whose similarity to the text data of the public opinion video text combination data is higher than a first preset value; and filtering the public database according to video source address and video similarity to obtain public opinion video data whose similarity to the video data of the public opinion video text combination data is higher than a second preset value;
    obtaining public opinion data according to the public opinion video text combination data, the public opinion text data, and the public opinion video data;
    obtaining the popularity trend of the preset public opinion topic according to the public opinion data.
  2. The public opinion tracking method for video-text combined data according to claim 1, wherein the public database comprises a separate text sub-database, a separate video sub-database, and a video-text combination sub-database, the separate text sub-database being a database composed of multiple sets of separate text data, the separate video sub-database being a database composed of multiple sets of separate video data, and the video-text combination sub-database being a database composed of multiple sets of data in which a video and a text are associated in one-to-one correspondence; the step of screening the public database to obtain the public opinion video-text combined data corresponding to the preset public opinion topic comprises:
    in the video-text combination sub-database, screening according to keywords of the preset public opinion topic to obtain the public opinion video-text combined data, wherein each piece of public opinion video-text combined data consists of one piece of public opinion video sub-data and one piece of public opinion text sub-data associated with each other;
    and the step of screening the public database, according to text similarity, to obtain public opinion text data whose similarity to the text data of the public opinion video-text combined data is higher than the first preset value, and screening the public database, according to video source address and video similarity, to obtain public opinion video data whose similarity to the video data of the public opinion video-text combined data is higher than the second preset value, comprises:
    in the separate text sub-database, screening according to text similarity to obtain public opinion text data whose similarity to the public opinion text sub-data is higher than the first preset value, and, in the separate video sub-database, screening according to video source address and video similarity to obtain public opinion video data whose similarity to the public opinion video sub-data is higher than the second preset value.
  3. The public opinion tracking method for video-text combined data according to claim 2, wherein the step of screening, in the video-text combination sub-database, according to keywords of the preset public opinion topic to obtain the public opinion video-text combined data comprises:
    parsing the preset public opinion topic to obtain at least one first public opinion keyword;
    screening the video-text combination sub-database to obtain at least one set of video-text combination sub-data containing the first public opinion keyword, each set of video-text combination sub-data consisting of corresponding text sub-data and video sub-data;
    calculating the number of occurrences of each piece of video sub-data;
    retrieving a preset number of times, comparing each number of occurrences with the preset number of times, and selecting the video sub-data whose number of occurrences is greater than the preset number of times as public opinion video sub-data;
    screening the video-text combination sub-database for the text sub-data corresponding to each piece of public opinion video sub-data, as public opinion text sub-data; and
    placing the public opinion text sub-data and the public opinion video sub-data in one-to-one correspondence to obtain the public opinion video-text combined data.
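As a non-limiting illustration, the steps recited in claim 3 can be sketched in Python. The data layout (a list of `(text, video_id)` pairs), the function name `select_opinion_pairs`, and plain substring matching for keywords are assumptions made for this sketch and are not specified by the application.

```python
from collections import Counter


def select_opinion_pairs(combined_db, keywords, preset_count):
    """Sketch of claim 3. combined_db is an assumed list of (text, video_id) pairs."""
    # Keep the pairs whose text contains at least one first public opinion keyword.
    matched = [(t, v) for t, v in combined_db if any(k in t for k in keywords)]
    # Count how often each video appears among the matched pairs.
    occurrences = Counter(v for _, v in matched)
    # Retain the videos that occur more than the preset number of times.
    hot_videos = {v for v, n in occurrences.items() if n > preset_count}
    # Pull every text associated with a retained video from the whole sub-database.
    return [(t, v) for t, v in combined_db if v in hot_videos]
```

Reading the claim literally, the last step draws the associated text from the whole video-text combination sub-database, so a text that lacks the keyword but accompanies a retained video is also returned. Restricting that step to `matched` would be an equally plausible reading.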
  4. The public opinion tracking method for video-text combined data according to claim 3, wherein the step of placing the public opinion text sub-data and the public opinion video sub-data in one-to-one correspondence to obtain the public opinion video-text combined data comprises:
    obtaining the publication address of each piece of public opinion text sub-data and the publication address of each piece of public opinion video sub-data;
    associating and grouping the public opinion text sub-data and the public opinion video sub-data that share the same publication address, so that the public opinion text sub-data and the public opinion video sub-data are placed in one-to-one correspondence; and
    obtaining the public opinion video-text combined data from the associated and grouped public opinion text sub-data and public opinion video sub-data.
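The address-based pairing recited in claim 4 can be illustrated with a short Python sketch. The dictionary keys `address` and `content`, the function name, and the simplification that each publication address carries at most one video are assumptions for illustration only.

```python
def pair_by_publication_address(text_items, video_items):
    """Sketch of claim 4. Items are assumed dicts with 'address' and 'content' keys."""
    # Index videos by publication address (assumes at most one video per address).
    videos_by_address = {v["address"]: v for v in video_items}
    pairs = []
    for text in text_items:
        video = videos_by_address.get(text["address"])
        # Only texts and videos published at the same address are paired.
        if video is not None:
            pairs.append((text["content"], video["content"]))
    return pairs
```

Items whose address has no counterpart are simply left unpaired in this sketch. The application does not state how such items should be handled.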
  5. The public opinion tracking method for video-text combined data according to claim 2, wherein the separate video sub-database comprises multiple sets of separate video data, and the step of screening, in the separate video sub-database, according to video source address and video similarity to obtain public opinion video data whose similarity to the public opinion video sub-data is higher than the second preset value comprises:
    obtaining the original source address of each piece of separate video data and of the public opinion video sub-data;
    comparing the original source address of each piece of separate video data with the original source address of the public opinion video sub-data, selecting the separate video data whose original source address matches as first video data, and selecting the separate video data whose original source address does not match as second video data;
    calculating a video similarity value between each piece of second video data and the public opinion video sub-data;
    comparing each video similarity value with the second preset value, and selecting the second video data whose video similarity value is greater than the second preset value as third video data; and
    taking the first video data and the third video data as the public opinion video data.
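A minimal Python sketch of the two-stage video screening in claim 5 follows. The claim does not name a similarity measure, so the measure is passed in as a parameter. The helper `frame_overlap` (Jaccard overlap of assumed per-frame hashes) and the dictionary keys `source` and `frames` are hypothetical and serve only to make the example runnable.

```python
def frame_overlap(a, b):
    """Hypothetical similarity: Jaccard overlap of assumed per-frame hash values."""
    fa, fb = set(a["frames"]), set(b["frames"])
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


def select_opinion_videos(candidates, reference, similarity, second_preset_value):
    """Sketch of claim 5: match on source address first, then on similarity."""
    # First video data: same original source address as the reference video.
    first = [c for c in candidates if c["source"] == reference["source"]]
    # Second video data: different original source address.
    second = [c for c in candidates if c["source"] != reference["source"]]
    # Third video data: second video data similar enough to the reference.
    third = [c for c in second if similarity(c, reference) > second_preset_value]
    return first + third
```

Checking the source address first means the similarity computation, which is typically the more expensive step, runs only on videos that are not already known to share the reference's origin.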
  6. The public opinion tracking method for video-text combined data according to claim 2, wherein the step of screening, in the separate text sub-database, according to text similarity to obtain public opinion text data whose similarity to the public opinion text sub-data is higher than the first preset value comprises:
    parsing each piece of public opinion text sub-data and each piece of separate text data according to part of speech, to obtain a preset number of second public opinion keywords for each piece of public opinion text sub-data and the same preset number of separate keywords for each piece of separate text data;
    comparing the second public opinion keywords with the separate keywords to identify identical keywords;
    counting the number of occurrences of each identical keyword;
    comparing each number of occurrences with a preset first threshold, and selecting the identical keywords whose number of occurrences is greater than the first threshold as designated keywords; and
    selecting the separate text data containing the designated keywords as the public opinion text data.
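The keyword-overlap screening in claim 6 might look like the following Python sketch. It assumes the part-of-speech keyword extraction has already been done, so each text arrives with its keyword list. It also reads "number of occurrences" as the number of separate texts whose keywords include the shared keyword. Both points, and all names, are assumptions rather than details given in the application.

```python
from collections import Counter


def select_opinion_texts(opinion_keywords, separate_texts, first_threshold):
    """Sketch of claim 6.

    opinion_keywords: set of second public opinion keywords (assumed pre-extracted).
    separate_texts: list of (text, keywords) pairs (assumed pre-extracted).
    """
    shared_counts = Counter()
    for _, kws in separate_texts:
        # Identical keywords: present in both the opinion text and this separate text.
        for kw in set(kws) & opinion_keywords:
            shared_counts[kw] += 1
    # Designated keywords: identical keywords seen more often than the first threshold.
    designated = {kw for kw, n in shared_counts.items() if n > first_threshold}
    # Keep the separate texts whose keywords include a designated keyword.
    return [text for text, kws in separate_texts if designated & set(kws)]
```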
  7. The public opinion tracking method for video-text combined data according to claim 1, wherein the designated public data comprises the publication time of the information, the public opinion system is associated in advance with at least one preset terminal, and, after the step of obtaining, from the public opinion data, the trend of change in the popularity of the preset public opinion topic, the method comprises:
    determining whether the popularity trend satisfies a condition for triggering automatic pushing;
    if the condition for triggering automatic pushing is satisfied, screening the public opinion data to obtain designated push public opinion data, the push public opinion data being the data whose publication time is closest to the current time;
    obtaining a link to the push public opinion data, and generating push information containing the link to the push public opinion data; and
    sending the push information to each preset terminal.
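The push logic of claim 7 can be sketched as below. The claim does not define the trigger condition, so this sketch assumes the trend is a list of per-period item counts and that pushing is triggered when the latest count exceeds the previous one by more than `rise_threshold`. The field names, the function name, and the shape of the returned message are likewise hypothetical.

```python
from datetime import datetime


def build_push(opinion_items, trend, now, rise_threshold):
    """Sketch of claim 7. opinion_items: assumed dicts with 'published' and 'link'."""
    # Assumed trigger: the latest period's count rose by more than rise_threshold.
    if len(trend) < 2 or trend[-1] - trend[-2] <= rise_threshold:
        return None
    if not opinion_items:
        return None
    # Push data: the item whose publication time is closest to the current time.
    latest = min(opinion_items, key=lambda item: abs(now - item["published"]))
    # Push information carries the link to the selected item.
    return {"message": "Topic popularity is rising", "link": latest["link"]}
```

The caller would then send the returned push information to each preset terminal. Delivery is left out here because the application does not specify a transport.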
  8. A public opinion tracking device for video-text combined data, comprising:
    a first acquisition module, configured to obtain designated public data from a first preset network platform at a first preset frequency;
    a construction module, configured to construct a public database from the designated public data;
    a first screening module, configured to screen the public database to obtain public opinion video-text combined data corresponding to a preset public opinion topic;
    a second screening module, configured to screen the public database, according to text similarity, to obtain public opinion text data whose similarity to the text data of the public opinion video-text combined data is higher than a first preset value, and to screen the public database, according to video source address and video similarity, to obtain public opinion video data whose similarity to the video data of the public opinion video-text combined data is higher than a second preset value;
    a first generation module, configured to obtain public opinion data from the public opinion video-text combined data, the public opinion text data, and the public opinion video data; and
    an analysis module, configured to obtain, from the public opinion data, a trend of change in the popularity of the preset public opinion topic.
  9. The public opinion tracking device for video-text combined data according to claim 8, wherein the first screening module comprises:
    a first screening sub-module, configured to screen, in the video-text combination sub-database, according to keywords of the preset public opinion topic to obtain the public opinion video-text combined data;
    and the second screening module comprises:
    a second screening sub-module, configured to screen, in the separate text sub-database, according to text similarity to obtain public opinion text data whose similarity to the public opinion text sub-data is higher than the first preset value; and
    a third screening sub-module, configured to screen, in the separate video sub-database, according to video source address and video similarity to obtain public opinion video data whose similarity to the public opinion video sub-data is higher than the second preset value.
  10. The public opinion tracking device for video-text combined data according to claim 9, wherein the first screening sub-module comprises:
    a first parsing unit, configured to parse the preset public opinion topic to obtain at least one first public opinion keyword;
    a first screening unit, configured to screen the video-text combination sub-database to obtain at least one set of video-text combined data containing the first public opinion keyword, as video-text combination sub-data;
    a first calculation unit, configured to calculate the number of occurrences of each piece of video sub-data;
    a first selection unit, configured to retrieve a preset number of times, compare each number of occurrences with the preset number of times, and select the video sub-data whose number of occurrences is greater than the preset number of times as public opinion video sub-data;
    a second screening unit, configured to screen the video-text combination sub-database for the text sub-data corresponding to each piece of public opinion video sub-data, as public opinion text sub-data; and
    a correspondence unit, configured to place the public opinion text sub-data and the public opinion video sub-data in one-to-one correspondence to obtain the public opinion video-text combined data.
  11. The public opinion tracking device for video-text combined data according to claim 10, wherein the correspondence unit comprises:
    an acquisition sub-unit, configured to obtain the publication address of each piece of public opinion text sub-data and the publication address of each piece of public opinion video sub-data;
    an association sub-unit, configured to associate and group the public opinion text sub-data and the public opinion video sub-data that share the same publication address, so that the public opinion text sub-data and the public opinion video sub-data are placed in one-to-one correspondence; and
    a generation sub-unit, configured to obtain the public opinion video-text combined data from the associated and grouped public opinion text sub-data and public opinion video sub-data.
  12. The public opinion tracking device for video-text combined data according to claim 9, wherein the third screening sub-module comprises:
    an acquisition unit, configured to obtain the original source address of each piece of separate video data and of the public opinion video sub-data;
    a second comparison unit, configured to compare the original source address of each piece of separate video data with the original source address of the public opinion video sub-data, select the separate video data whose original source address matches as first video data, and select the separate video data whose original source address does not match as second video data;
    a second calculation unit, configured to calculate a video similarity value between each piece of second video data and the public opinion video sub-data;
    a second selection unit, configured to retrieve the second preset value, compare each video similarity value with the second preset value, and select the second video data whose video similarity value is greater than the second preset value as third video data; and
    a setting unit, configured to take the first video data and the third video data as the public opinion video data.
  13. The public opinion tracking device for video-text combined data according to claim 9, wherein the second screening sub-module further comprises:
    a second parsing unit, configured to parse each piece of public opinion text sub-data and each piece of separate text data according to part of speech, to obtain a preset number of second public opinion keywords for each piece of public opinion text sub-data and the same preset number of separate keywords for each piece of separate text data;
    a comparison unit, configured to compare the second public opinion keywords with the separate keywords to identify identical keywords;
    a statistics unit, configured to count the number of occurrences of each identical keyword;
    a third selection unit, configured to compare each number of occurrences with a preset first threshold and select the identical keywords whose number of occurrences is greater than the first threshold as designated keywords; and
    a fourth selection unit, configured to select the separate text data containing the designated keywords as the public opinion text data.
  14. The public opinion tracking device for video-text combined data according to claim 8, wherein the public opinion tracking device further comprises:
    a judgment module, configured to determine whether the popularity trend satisfies a condition for triggering automatic pushing;
    a third screening module, configured to screen the public opinion data for the public opinion data whose publication time is closest to the current time, as push public opinion data;
    a second generation module, configured to obtain a link to the push public opinion data and generate push information containing the link to the push public opinion data; and
    a sending module, configured to send the push information to each preset terminal.
  15. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements a public opinion tracking method for video-text combined data, the public opinion tracking method comprising:
    obtaining designated public data from a first preset network platform at a first preset frequency, wherein the designated public data comprises all public information on the first preset network platform and the propagation path corresponding to each piece of public information, and the public information comprises separate text information containing only text, separate video information containing only video, and video-text combined information in which a video and a text are associated;
    constructing a public database from the designated public data;
    screening the public database to obtain public opinion video-text combined data corresponding to a preset public opinion topic;
    screening the public database, according to text similarity, to obtain public opinion text data whose similarity to the text data of the public opinion video-text combined data is higher than a first preset value; and screening the public database, according to video source address and video similarity, to obtain public opinion video data whose similarity to the video data of the public opinion video-text combined data is higher than a second preset value;
    obtaining public opinion data from the public opinion video-text combined data, the public opinion text data, and the public opinion video data; and
    obtaining, from the public opinion data, a trend of change in the popularity of the preset public opinion topic.
  16. The computer device according to claim 15, wherein the public database comprises a separate text sub-database, a separate video sub-database, and a video-text combination sub-database, the separate text sub-database being a database composed of multiple sets of separate text data, the separate video sub-database being a database composed of multiple sets of separate video data, and the video-text combination sub-database being a database composed of multiple sets of data in which a video and a text are associated in one-to-one correspondence; the step, performed by the processor, of screening the public database to obtain the public opinion video-text combined data corresponding to the preset public opinion topic comprises:
    in the video-text combination sub-database, screening according to keywords of the preset public opinion topic to obtain the public opinion video-text combined data, wherein each piece of public opinion video-text combined data consists of one piece of public opinion video sub-data and one piece of public opinion text sub-data associated with each other;
    and the step of screening the public database, according to text similarity, to obtain public opinion text data whose similarity to the text data of the public opinion video-text combined data is higher than the first preset value, and screening the public database, according to video source address and video similarity, to obtain public opinion video data whose similarity to the video data of the public opinion video-text combined data is higher than the second preset value, comprises:
    in the separate text sub-database, screening according to text similarity to obtain public opinion text data whose similarity to the public opinion text sub-data is higher than the first preset value, and, in the separate video sub-database, screening according to video source address and video similarity to obtain public opinion video data whose similarity to the public opinion video sub-data is higher than the second preset value.
  17. The computer device according to claim 16, wherein the step, performed by the processor, of screening, in the video-text combination sub-database, according to keywords of the preset public opinion topic to obtain the public opinion video-text combined data comprises:
    parsing the preset public opinion topic to obtain at least one first public opinion keyword;
    screening the video-text combination sub-database to obtain at least one set of video-text combination sub-data containing the first public opinion keyword, each set of video-text combination sub-data consisting of corresponding text sub-data and video sub-data;
    calculating the number of occurrences of each piece of video sub-data;
    retrieving a preset number of times, comparing each number of occurrences with the preset number of times, and selecting the video sub-data whose number of occurrences is greater than the preset number of times as public opinion video sub-data;
    screening the video-text combination sub-database for the text sub-data corresponding to each piece of public opinion video sub-data, as public opinion text sub-data; and
    placing the public opinion text sub-data and the public opinion video sub-data in one-to-one correspondence to obtain the public opinion video-text combined data.
  18. A non-volatile computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement a public opinion tracking method for video-text combined data, the public opinion tracking method comprising:
    obtaining designated public data from a first preset network platform at a first preset frequency, wherein the designated public data comprises all public information on the first preset network platform and the propagation path corresponding to each piece of public information, and the public information comprises separate text information containing only text, separate video information containing only video, and video-text combined information in which a video and a text are associated;
    constructing a public database from the designated public data;
    screening the public database to obtain public opinion video-text combined data corresponding to a preset public opinion topic;
    screening the public database, according to text similarity, to obtain public opinion text data whose similarity to the text data of the public opinion video-text combined data is higher than a first preset value; and screening the public database, according to video source address and video similarity, to obtain public opinion video data whose similarity to the video data of the public opinion video-text combined data is higher than a second preset value;
    obtaining public opinion data from the public opinion video-text combined data, the public opinion text data, and the public opinion video data; and
    obtaining, from the public opinion data, a trend of change in the popularity of the preset public opinion topic.
  19. The non-volatile computer-readable storage medium according to claim 18, wherein the public database comprises a separate text sub-database, a separate video sub-database, and a video-text combination sub-database, the separate text sub-database being a database composed of multiple sets of separate text data, the separate video sub-database being a database composed of multiple sets of separate video data, and the video-text combination sub-database being a database composed of multiple sets of data in which a video and a text are associated in one-to-one correspondence; the step, performed by the processor, of screening the public database to obtain the public opinion video-text combined data corresponding to the preset public opinion topic comprises:
    in the video-text combination sub-database, screening according to keywords of the preset public opinion topic to obtain the public opinion video-text combined data, wherein each piece of public opinion video-text combined data consists of one piece of public opinion video sub-data and one piece of public opinion text sub-data associated with each other;
    and the step of screening the public database, according to text similarity, to obtain public opinion text data whose similarity to the text data of the public opinion video-text combined data is higher than the first preset value, and screening the public database, according to video source address and video similarity, to obtain public opinion video data whose similarity to the video data of the public opinion video-text combined data is higher than the second preset value, comprises:
    in the separate text sub-database, screening according to text similarity to obtain public opinion text data whose similarity to the public opinion text sub-data is higher than the first preset value, and, in the separate video sub-database, screening according to video source address and video similarity to obtain public opinion video data whose similarity to the public opinion video sub-data is higher than the second preset value.
  20. The computer non-volatile readable storage medium according to claim 19, wherein the step, executed by the processor, of filtering the video-text combination sub-database according to keywords of the preset public opinion topic to obtain the public opinion video-text combined data comprises:
    parsing the preset public opinion topic to obtain at least one first public opinion keyword;
    filtering the video-text combination sub-database to obtain at least one group of video-text combination sub-data containing the first public opinion keyword, each group of video-text combination sub-data consisting of text sub-data and its corresponding video sub-data;
    counting the number of occurrences of each video sub-data;
    retrieving a preset count, comparing each number of occurrences with the preset count, and selecting the video sub-data whose number of occurrences is greater than the preset count as public opinion video sub-data;
    selecting, from the video-text combination sub-database, the text sub-data corresponding to each public opinion video sub-data as public opinion text sub-data;
    pairing each public opinion text sub-data one-to-one with its public opinion video sub-data to obtain the public opinion video-text combined data.
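The steps of claim 20 can be sketched in Python as follows. This is a hedged illustration under several assumptions, not the applicant's implementation: the `Pair` record, the `video_id` field, `parse_topic` (a plain whitespace split standing in for real topic parsing), and `select_opinion_pairs` are hypothetical names, and occurrences are counted among the keyword-matched groups, which is one reading of the claim.

```python
from collections import Counter
from typing import NamedTuple


class Pair(NamedTuple):
    text: str      # text sub-data
    video_id: str  # hypothetical identifier of the video sub-data


def parse_topic(topic: str) -> list:
    # Assumption: keywords are the whitespace-separated words of the topic.
    return [word.lower() for word in topic.split()]


def select_opinion_pairs(combo_db, topic, preset_count):
    keywords = parse_topic(topic)
    # Keep the groups whose text contains at least one keyword.
    matched = [p for p in combo_db
               if any(k in p.text.lower() for k in keywords)]
    # Count how often each video appears among the matched groups.
    counts = Counter(p.video_id for p in matched)
    # Videos appearing more often than the preset count are opinion videos.
    opinion_videos = {v for v, n in counts.items() if n > preset_count}
    # Following the claim wording, the corresponding text sub-data is taken
    # from the whole combination sub-database, not only the matched groups.
    return [p for p in combo_db if p.video_id in opinion_videos]
```

For example, with a preset count of 1, a video that appears in two keyword-matched groups is selected, and every text paired with that video in the sub-database is returned alongside it.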
PCT/CN2019/089609 2019-01-31 2019-05-31 Public opinion tracking method and device for combined video-text data, and computer apparatus WO2020155496A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910100413.XA CN109933709B (en) 2019-01-31 2019-01-31 Public opinion tracking method and device for video text combined data and computer equipment
CN201910100413.X 2019-01-31

Publications (1)

Publication Number Publication Date
WO2020155496A1 (en) 2020-08-06

Family

ID=66985384

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089609 WO2020155496A1 (en) 2019-01-31 2019-05-31 Public opinion tracking method and device for combined video-text data, and computer apparatus

Country Status (2)

Country Link
CN (1) CN109933709B (en)
WO (1) WO2020155496A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837581B (en) * 2019-11-04 2023-05-23 云目未来科技(北京)有限公司 Method, device and storage medium for analyzing video public opinion
CN116737992B (en) * 2023-08-15 2023-10-13 明麦(南京)科技有限公司 Public opinion monitoring data processing method and processing system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186663A (en) * 2012-12-28 2013-07-03 中联竞成(北京)科技有限公司 Video-based online public opinion monitoring method and system
CN103455705A (en) * 2013-05-24 2013-12-18 中国科学院自动化研究所 Analysis and prediction system for cooperative correlative tracking and global situation of network social events
CN104915447A (en) * 2015-06-30 2015-09-16 北京奇艺世纪科技有限公司 Method and device for tracing hot topics and confirming keywords
US20160078458A1 (en) * 2013-05-24 2016-03-17 Zara A. Gold System of poll initiation and data collection through a global computer/communication network and methods thereof
CN107491499A (en) * 2017-07-27 2017-12-19 杭州中奥科技有限公司 A kind of public sentiment method for early warning based on unstructured data
CN107633084A (en) * 2017-09-28 2018-01-26 武汉虹旭信息技术有限责任公司 Based on the public sentiment managing and control system and its method from media

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101923544B (en) * 2009-06-15 2012-08-08 北京百分通联传媒技术有限公司 Method for monitoring and displaying Internet hot spots
WO2015091893A1 (en) * 2013-12-19 2015-06-25 Koninklijke Philips N.V. System and method for topic-related detection of the emotional state of a person
CN105787049B (en) * 2016-02-26 2019-07-16 浙江大学 A kind of network video focus incident discovery method based on Multi-source Information Fusion analysis
CN107038178B (en) * 2016-08-03 2020-07-21 平安科技(深圳)有限公司 Public opinion analysis method and device
CN106529492A (en) * 2016-11-17 2017-03-22 天津大学 Video topic classification and description method based on multi-image fusion in view of network query
CN108959383A (en) * 2018-05-31 2018-12-07 平安科技(深圳)有限公司 Analysis method, device and the computer readable storage medium of network public-opinion

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113590914A (en) * 2021-06-23 2021-11-02 北京百度网讯科技有限公司 Information processing method, device, electronic equipment and storage medium
CN113590914B (en) * 2021-06-23 2024-02-20 北京百度网讯科技有限公司 Information processing method, apparatus, electronic device and storage medium

Also Published As

Publication number Publication date
CN109933709A (en) 2019-06-25
CN109933709B (en) 2023-09-26

Similar Documents

Publication Publication Date Title
US11709901B2 (en) Personalized search filter and notification system
US10339161B2 (en) Expanding network relationships
US10521484B1 (en) Typeahead using messages of a messaging platform
US9171037B2 (en) Searching for associated events in log data
JP6538277B2 (en) Identify query patterns and related aggregate statistics among search queries
Guzman et al. On-line relevant anomaly detection in the Twitter stream: an efficient bursty keyword detection model
US10248715B2 (en) Media content recommendation method and apparatus
CN106557558B (en) Data analysis method and device
WO2020155496A1 (en) Public opinion tracking method and device for combined video-text data, and computer apparatus
CN110852095B (en) Statement hot spot extraction method and system
JP5952711B2 (en) Prediction server, program and method for predicting future number of comments in prediction target content
EP3622444A1 (en) Improved onboarding of entity data
CN111279333A (en) Language-based search of digital content in a network
US8838616B2 (en) Server device for creating list of general words to be excluded from search result
US10078686B2 (en) Combination filter for search query suggestions
Guo et al. Measuring media bias via masked language modeling
Santos et al. Voting for related entities
US20220292127A1 (en) Information management system
JP2011048524A (en) Problem or complaint data processing apparatus and method
CN116414968A (en) Information searching method, device, equipment, medium and product
CN109902099B (en) Public opinion tracking method and device based on graphic and text big data and computer equipment
WO2010060117A1 (en) Method and system for improving utilization of human searchers
JP7485029B2 (en) Information recommendation system, information search device, information recommendation method, and program
CN117891839B (en) Intelligent retrieval method and system
US11960522B2 (en) Information management system for database construction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19913252

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19913252

Country of ref document: EP

Kind code of ref document: A1