CN116737992A - Public opinion monitoring data processing method and processing system - Google Patents

Public opinion monitoring data processing method and processing system

Info

Publication number
CN116737992A
Authority
CN
China
Prior art keywords
comparison
frame
video
public opinion
screening
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311025431.9A
Other languages
Chinese (zh)
Other versions
CN116737992B (en
Inventor
赵龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mingmai Nanjing Technology Co ltd
Original Assignee
Mingmai Nanjing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mingmai Nanjing Technology Co ltd filed Critical Mingmai Nanjing Technology Co ltd
Priority to CN202311025431.9A priority Critical patent/CN116737992B/en
Publication of CN116737992A publication Critical patent/CN116737992A/en
Application granted granted Critical
Publication of CN116737992B publication Critical patent/CN116737992B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 - Querying
    • G06F16/735 - Filtering based on additional data, e.g. user or group profiles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 - Querying
    • G06F16/738 - Presentation of query results
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a public opinion monitoring data processing method and processing system. Screening frame sets are determined based on a push public opinion video; the screening frame sets are screened according to a screening strategy to obtain a comparison frame set; and a plurality of selected frame sets in the push public opinion video are obtained according to a frame interval strategy, the comparison frame set and the push public opinion video. Text comparison is performed between the comparison frames in the comparison frame set and the selected frames in each selected frame set according to a text comparison strategy to obtain a class I comparison value, and a text comparison result of the push public opinion video is obtained from the class I comparison value and a preset text comparison value. Image comparison is performed between the comparison frames in the comparison frame set and the selected frames in each selected frame set according to an image comparison strategy to obtain a class II comparison value, an image comparison result of the push public opinion video is obtained from the class II comparison value and a preset image comparison value, and deletion judgment processing is performed on the push public opinion video based on the text comparison result and/or the image comparison result.

Description

Public opinion monitoring data processing method and processing system
Technical Field
The present invention relates to data processing technologies, and in particular, to a public opinion monitoring data processing method and processing system.
Background
With the continuous development of information technology, the internet has become an important channel and carrier for transmitting information. Video, as a convenient and effective way of transmitting information, is developing rapidly.
At present, people browse a large amount of public opinion video data when browsing information. In the prior art, public opinion video data is pushed according to the preferences and browsing habits of the user, so the user is likely to be pushed multiple repeated public opinion videos. Because public opinion videos involve a large amount of data, repeatedly browsing the same public opinion video wastes a large amount of the user's time and consumes resources such as data traffic.
Therefore, how to perform de-duplication screening on pushed public opinion videos in combination with the browsing records of the user, reduce the data processing amount during the screening of public opinion videos, and improve screening efficiency is a problem that needs to be solved.
Disclosure of Invention
The embodiment of the invention provides a public opinion monitoring data processing method and a processing system, which can be used for carrying out duplicate removal screening on a pushed public opinion video by combining a browsing record of a user, reduce the data processing amount during the screening of the public opinion video and improve the screening efficiency.
In a first aspect of the embodiment of the present invention, a method for processing public opinion monitoring data is provided, including:
obtaining a plurality of historical public opinion videos browsed by a first user side in a historical time period, obtaining a plurality of deduplication labels according to video labels of the historical public opinion videos, obtaining a deduplication frame set corresponding to the historical public opinion videos based on a frame number screening strategy, classifying the deduplication frame set consistent with the deduplication labels under the same deduplication label, and obtaining a deduplication database, wherein the deduplication frame set comprises a first frame, a tail frame and an intermediate frame;
extracting a push public opinion video corresponding to the first user side from a public opinion database, traversing a deduplication database corresponding to the first user side based on a push label of the push public opinion video, and acquiring a deduplication frame set, in which the deduplication label is consistent with the push label, as a screening frame set;
screening a plurality of screening frame sets according to the historical video duration of the historical public opinion videos corresponding to each screening frame set and the push video duration of the push public opinion video to obtain a comparison frame set, and obtaining a plurality of selected frame sets in the push public opinion videos according to a frame interval strategy, the comparison frame set and the push public opinion videos, wherein the selected frame sets comprise a selected first frame set, a selected tail frame set and a selected middle frame set;
responding to the text comparison information, performing text comparison between a plurality of comparison frames in the comparison frame set and the selected frames in each selected frame set according to a text comparison strategy to obtain a class I comparison value, and obtaining a text comparison result of the push public opinion video according to the class I comparison value and a preset text comparison value;
responding to the image comparison information, performing image comparison between the plurality of comparison frames in the comparison frame set and the selected frames in each selected frame set according to an image comparison strategy to obtain a class II comparison value, obtaining an image comparison result of the push public opinion video according to the class II comparison value and a preset image comparison value, and performing deletion judgment processing on the push public opinion video based on the text comparison result and/or the image comparison result.
Optionally, in one possible implementation manner of the first aspect, a plurality of historical public opinion videos browsed by the first user side in a historical time period are obtained, a plurality of duplication removal labels are obtained according to video labels of the historical public opinion videos, duplication removal frame sets corresponding to the historical public opinion videos are obtained based on a frame number screening policy, and duplication removal frame sets with identical duplication removal labels are classified under the same duplication removal label, so as to obtain a duplication removal database, including:
collecting public opinion browsing information of the first user side in the historical time period, acquiring the plurality of historical public opinion videos in the public opinion browsing information, and obtaining a plurality of de-duplication labels according to the video labels of each historical public opinion video;
acquiring the historical video duration of each historical public opinion video, obtaining a frame number adjustment coefficient according to the ratio of the historical video duration to a preset video duration, and selecting a frame number on one side according to an upward rounding value of the product of a reference frame number and the frame number adjustment coefficient;
acquiring a video intermediate time corresponding to a corresponding historical public opinion video based on the historical video duration, and a first intermediate frame corresponding to the video intermediate time, and respectively selecting a video frame with a single side selected frame number forward and backward as a second intermediate frame by taking the first intermediate frame as a starting point;
and selecting a first frame, a tail frame, a first intermediate frame and a second intermediate frame corresponding to each historical public opinion video to generate a de-duplication frame set corresponding to each historical public opinion video, and classifying the de-duplication frame sets with consistent de-duplication labels under the same de-duplication label to obtain a de-duplication database.
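The frame selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the use of frame indices, and the default values for the preset video duration and reference frame number are assumptions, and the middle frame is assumed to be the frame at half the total frame count.

```python
import math

def dedup_frame_indices(duration_s, fps, preset_duration_s=60.0, reference_frames=2):
    """Return the frame indices that would make up a de-duplication frame set.

    preset_duration_s and reference_frames are illustrative defaults (assumptions).
    """
    total = int(duration_s * fps)
    if total <= 0:
        raise ValueError("video must contain at least one frame")
    # Frame number adjustment coefficient: historical duration / preset duration.
    coeff = duration_s / preset_duration_s
    # Single-side selected frame number: round the product up.
    one_side = math.ceil(reference_frames * coeff)
    mid = total // 2  # first intermediate frame (assumed: frame at the middle time)
    lo = max(0, mid - one_side)
    hi = min(total - 1, mid + one_side)
    return {
        "first": 0,
        "tail": total - 1,
        "first_intermediate": mid,
        # Second intermediate frames: one_side frames before and after the middle.
        "second_intermediate": [i for i in range(lo, hi + 1) if i != mid],
        "one_side": one_side,
    }
```

Under these assumptions, a longer historical video yields a larger coefficient and therefore more intermediate frames, which matches the proportional scaling the text describes.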
Optionally, in one possible implementation manner of the first aspect, filtering the plurality of filtering frame sets according to a historical video duration of the historical public opinion video corresponding to each filtering frame set and a push video duration of the push public opinion video to obtain a comparison frame set, where the filtering includes:
obtaining a video duration difference value between the push public opinion video and each screening frame set according to the push video duration of the push public opinion video and the historical video duration of the historical public opinion video corresponding to each screening frame set;
and acquiring the screening frame sets whose video duration difference value falls within a preset duration difference interval as the comparison frame set.
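A minimal sketch of this duration pre-filter is given below. The dictionary layout, the function name, the use of the absolute difference, and the default interval of 0 to 2 seconds are all assumptions chosen for illustration; the text only states that the difference must fall within a preset interval.

```python
def filter_by_duration(push_duration_s, screening_sets, diff_interval=(0.0, 2.0)):
    """Keep screening frame sets whose source video has a similar duration.

    screening_sets: list of dicts with a "duration_s" key (hypothetical layout).
    diff_interval: inclusive (low, high) bounds on the duration difference (assumed).
    """
    low, high = diff_interval
    kept = []
    for frame_set in screening_sets:
        diff = abs(push_duration_s - frame_set["duration_s"])
        if low <= diff <= high:
            kept.append(frame_set)
    return kept
```

Frame sets that fail this cheap check never reach the text or image comparison, which is where the reduction in data processing comes from.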
Optionally, in one possible implementation manner of the first aspect, the obtaining, according to a frame interval policy, the comparison frame set and the push public opinion video, a plurality of selected frame sets in the push public opinion video, where the selected frame sets include a selected first frame set, a selected last frame set and a selected middle frame set, includes:
obtaining comparison video time length corresponding to the comparison frame set, obtaining offset video time length according to the absolute value of the difference value between the comparison video time length and the push video time length, and obtaining a selected first frame number, a selected tail frame number and a selected single-side frame number according to the offset video time length;
acquiring a start frame corresponding to the start time of the push public opinion video, sequentially acquiring video frames of a selected first frame number backwards by taking the start frame as a starting point to generate a selected first frame set, acquiring an end frame corresponding to the end time of the push public opinion video, and sequentially acquiring video frames of a selected tail frame number forwards by taking the end frame as a starting point to generate a selected tail frame set;
and acquiring a push intermediate time of the push public opinion video based on the push video duration, acquiring a push intermediate frame corresponding to the push intermediate time, and, taking the push intermediate frame as a starting point, acquiring the selected single-side frame number of video frames forward and backward respectively to generate the selected middle frame set.
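The three selected frame sets can be sketched with frame indices as shown below. The function name is hypothetical, and two details are assumptions because the text does not settle them: the start and end frames are counted as part of their sets, and the push intermediate frame is taken at half the total frame count.

```python
def selected_frame_sets(total_frames, first_n, tail_n, one_side_n):
    """Return (first_set, tail_set, middle_set) as lists of frame indices."""
    if total_frames <= 0:
        raise ValueError("video must contain at least one frame")
    last = total_frames - 1
    # Selected first frame set: first_n frames starting at the start frame.
    first_set = list(range(0, min(first_n, total_frames)))
    # Selected tail frame set: tail_n frames ending at the end frame.
    tail_set = list(range(max(0, total_frames - tail_n), total_frames))
    # Selected middle frame set: one_side_n frames on each side of the middle frame.
    mid = total_frames // 2
    middle_set = list(range(max(0, mid - one_side_n), min(last, mid + one_side_n) + 1))
    return first_set, tail_set, middle_set
```

Index ranges are clamped to the video bounds so that short videos do not produce invalid indices.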
Optionally, in one possible implementation manner of the first aspect, the obtaining the selected first frame number, the selected tail frame number and the selected single-side frame number according to the offset video duration includes:
if the offset video duration is equal to 0, taking a preset frame number as the selected first frame number and the selected tail frame number, and summing one half of the preset frame number and the single-side selected frame number to obtain the selected single-side frame number;
if the offset video duration is greater than 0, obtaining an offset frame number according to the upward rounding value of the product of the number of frames per unit duration and the offset video duration, taking the sum of the offset frame number and the preset frame number as the selected first frame number and the selected tail frame number, obtaining a single-side offset frame number as one half of the offset frame number, and summing the single-side selected frame number, one half of the preset frame number and the single-side offset frame number to obtain the selected single-side frame number.
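The two cases above amount to a small piece of arithmetic, sketched here. The function and parameter names are hypothetical, and rounding the halved values up is an assumption: the text does not say how a non-integer half is handled.

```python
import math

def selected_frame_counts(compare_duration_s, push_duration_s, frames_per_second,
                          preset_frames, single_side_selected):
    """Return (selected_first_n, selected_tail_n, selected_one_side_n)."""
    # Offset video duration: absolute difference of the two durations.
    offset_s = abs(compare_duration_s - push_duration_s)
    half_preset = math.ceil(preset_frames / 2)  # rounding up is an assumption
    if offset_s == 0:
        return preset_frames, preset_frames, half_preset + single_side_selected
    # Offset frame number: frames per unit duration times the offset, rounded up.
    offset_frames = math.ceil(frames_per_second * offset_s)
    edge = offset_frames + preset_frames
    one_side_offset = math.ceil(offset_frames / 2)  # rounding up is an assumption
    return edge, edge, single_side_selected + half_preset + one_side_offset
```

For example, with 25 frames per second, a preset frame number of 4 and a single-side selected frame number of 2, a one-second duration mismatch widens each edge window from 4 to 29 frames, so that a shifted copy of the same content can still be matched.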
Optionally, in a possible implementation manner of the first aspect, before responding to the text alignment information, the method further includes:
taking the first comparison frame and the selected first frame set, the middle comparison frame and the selected middle frame set, and the tail comparison frame and the selected tail frame set as three comparison groups respectively, and extracting a first extracted text from each comparison frame in each comparison group and a second extracted text from each selected frame in each selected frame set;
calling a non-comparison character table, and removing from the first extracted text and the second extracted text the characters that exist in the non-comparison character table, so as to obtain first comparison characters corresponding to each comparison frame and second comparison characters corresponding to each selected frame in each selected frame set;
if the number of characters of the first comparison characters is greater than 0 and the number of characters of the second comparison characters corresponding to a selected frame in the selected frame set is greater than 0, generating text comparison information;
and if the number of characters of the first comparison characters is equal to 0 and/or the number of characters of the second comparison characters corresponding to the selected frame in the selected frame set is equal to 0, generating image comparison information.
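The choice between text comparison and image comparison can be sketched as below. The names are hypothetical and the text is assumed to have already been extracted from the frames (for example by OCR, which is not shown). Two further assumptions: the non-comparison character table is modeled as a set of single characters, and text comparison is chosen when at least one selected frame still has text after filtering, since the wording of the condition leaves this open.

```python
def strip_non_comparison(text, non_comparison_chars):
    """Remove every character listed in the non-comparison character table."""
    return "".join(ch for ch in text if ch not in non_comparison_chars)

def comparison_mode(compare_text, selected_texts, non_comparison_chars):
    """Return "text" or "image" for one comparison group."""
    first = strip_non_comparison(compare_text, non_comparison_chars)
    seconds = [strip_non_comparison(t, non_comparison_chars) for t in selected_texts]
    # Assumed reading: text comparison needs text on both sides of the group.
    if len(first) > 0 and any(len(s) > 0 for s in seconds):
        return "text"
    return "image"
```

Falling back to image comparison only when no usable text remains keeps the more expensive image path off the common case.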
Optionally, in one possible implementation manner of the first aspect, responding to the text comparison information, performing text comparison between a plurality of comparison frames in the comparison frame set and the selected frames in each selected frame set according to a text comparison strategy to obtain a class I comparison value, and obtaining a text comparison result of the push public opinion video according to the class I comparison value and a preset text comparison value, includes:
responding to the text comparison information, acquiring the comparison frames in the comparison group for text comparison as first comparison frames, taking the selected frame set in that comparison group as a first selected frame set, and arranging the first comparison frames in time order to obtain a first comparison frame sequence;
sequentially acquiring the first comparison frames in the first comparison frame sequence as first target frames, and counting the first comparison character count of the first target frame and the second comparison character count of each first selected frame in the first selected frame set;
obtaining the first selected frames whose second comparison character count equals the first comparison character count as first screening frames, splitting the first comparison characters of the first target frame to obtain a first comparison character sequence, and splitting the second comparison characters of each first screening frame to obtain a second comparison character sequence;
comparing the characters in the first comparison character sequence and the second comparison character sequence one by one in order to obtain the number of identical characters, obtaining a class I sub-comparison value between the first target frame and each first screening frame according to the ratio of the number of identical characters to the first comparison character count, and taking the maximum class I sub-comparison value as the class I comparison value corresponding to the first target frame;
counting the class I comparison values corresponding to all the first target frames, and taking the corresponding comparison group as a class I similarity comparison group when all the class I comparison values are greater than the preset text comparison value.
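The text comparison steps above can be sketched as follows, with the text comparison value of a target frame referred to here as the class I value. The function names and the threshold of 0.9 are assumptions. Returning 0.0 when no selected frame has a matching character count is also an assumption, since the text does not cover that case.

```python
def class_one_value(target_text, selected_texts):
    """Best position-wise character match ratio against equal-length candidates."""
    n = len(target_text)
    if n == 0:
        return 0.0
    best = 0.0
    for candidate in selected_texts:
        # Cheap pre-screen: only frames with the same character count are compared.
        if len(candidate) != n:
            continue
        same = sum(a == b for a, b in zip(target_text, candidate))
        best = max(best, same / n)
    return best

def is_text_similar_group(target_texts, selected_texts, preset_value=0.9):
    """A group is text-similar when every target frame exceeds the preset value."""
    values = [class_one_value(t, selected_texts) for t in target_texts]
    return bool(values) and all(v > preset_value for v in values)
```

The length pre-screen is what keeps this cheap: most candidate frames are discarded by a single integer comparison before any character-by-character work is done.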
Optionally, in one possible implementation manner of the first aspect, responding to the image comparison information, performing image comparison between a plurality of comparison frames in the comparison frame set and the selected frames in each selected frame set according to an image comparison strategy to obtain a class II comparison value, obtaining an image comparison result of the push public opinion video according to the class II comparison value and a preset image comparison value, and performing deletion judgment processing on the push public opinion video based on the text comparison result and/or the image comparison result, includes:
responding to the image comparison information, acquiring comparison frames in a comparison group for image comparison as second comparison frames, and taking a selected frame set in the comparison group for image comparison as a second selected frame set, and arranging the second comparison frames according to a time sequence to obtain a second comparison frame sequence;
sequentially obtaining a second comparison frame in the second comparison frame sequence as a second target frame, obtaining a first brightness value of the second target frame and a second brightness value of each second selected frame in the second selected frame set, and obtaining a brightness difference value according to an absolute value of a difference value of the first brightness value and the second brightness value;
obtaining the second selected frames in the second selected frame set whose brightness difference value is smaller than a preset brightness difference value as second screening frames, performing region comparison between the second target frame and each second screening frame according to a region comparison strategy to obtain a class II sub-comparison value corresponding to the second target frame and each second screening frame, and taking the maximum class II sub-comparison value as the class II comparison value corresponding to the second target frame;
and counting the class II comparison values corresponding to all the second target frames, taking the corresponding comparison group as a class II similarity comparison group when all the class II comparison values are greater than the preset image comparison value, and deleting the push public opinion video when all the comparison groups in the push public opinion video are class I similarity comparison groups and/or class II similarity comparison groups.
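The brightness pre-screen and the final deletion judgment can be sketched as below. Frames are modeled as nested lists of grayscale values, brightness is taken as the mean pixel value, and the preset brightness difference of 10 is arbitrary; all of these, along with the function names, are assumptions, since the text does not define how the brightness value is computed.

```python
def mean_brightness(frame):
    """Mean grayscale value of a frame given as a list of pixel rows (assumed metric)."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)

def brightness_screen(target_frame, selected_frames, max_diff=10.0):
    """Keep selected frames whose brightness is close to the target frame's."""
    target = mean_brightness(target_frame)
    return [f for f in selected_frames
            if abs(target - mean_brightness(f)) < max_diff]

def should_delete(group_is_similar):
    """Delete the push video only if every comparison group was judged similar.

    group_is_similar: mapping from group name to True when the group is a
    text-similar (class I) or image-similar (class II) comparison group.
    """
    return bool(group_is_similar) and all(group_is_similar.values())
```

A single dissimilar group (first, middle or tail) is enough to keep the video, so partial overlaps such as a shared intro are not treated as duplicates.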
Optionally, in one possible implementation manner of the first aspect, performing region comparison between the second target frame and each second screening frame according to a region comparison strategy to obtain a class II sub-comparison value corresponding to the second target frame and each second screening frame, and taking the maximum class II sub-comparison value as the class II comparison value corresponding to the second target frame, includes:
performing primary region division on the second target frame and each second screening frame according to a first direction to obtain an upper comparison region and a lower comparison region corresponding to the second target frame and each second screening frame;
counting the number of all the second screening frames, obtaining a quantity adjustment coefficient according to the ratio of the number of screening frames to a preset screening frame number, and obtaining the number of comparison regions according to the upward rounding value of the product of a reference region number and the quantity adjustment coefficient;
performing secondary region division on the second target frame and the upper comparison region of each second screening frame according to the first direction based on the number of the comparison regions to obtain sub-comparison regions corresponding to the second target frame and each second screening frame;
sequentially selecting sub-comparison areas corresponding to the second target frames and the second screening frames according to a first direction, and comparing pixel values to obtain pixel similarity values of the corresponding sub-comparison areas in the second target frames and the second screening frames;
if the pixel similarity value is smaller than a preset pixel similarity value, deleting the corresponding second screening frame, repeating the deleting step until all sub-comparison regions have been compared, and taking the remaining second screening frames as third screening frames;
if the number of the third screening frames is 0, calling a preset class II comparison value as the class II comparison value;
if the number of the third screening frames is greater than 0, summing the pixel similarity values corresponding to all sub-comparison regions in each third screening frame to obtain a total pixel similarity value, obtaining the class II sub-comparison value corresponding to the third screening frame according to the average of the total pixel similarity value, and taking the maximum class II sub-comparison value as the class II comparison value corresponding to the second target frame.
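The region comparison strategy can be sketched as follows. Several points are assumptions made for illustration: frames are equally sized nested lists of grayscale values, the upper comparison region is the top half of the frame, sub-regions are horizontal bands, and pixel similarity is one minus the normalized mean absolute difference. The function names and default thresholds are hypothetical.

```python
import math

def pixel_similarity(band_a, band_b):
    """Similarity in [0, 1] from the mean absolute grayscale difference (assumed metric)."""
    diffs = [abs(x - y) for ra, rb in zip(band_a, band_b) for x, y in zip(ra, rb)]
    return 1.0 - sum(diffs) / (255.0 * len(diffs))

def split_rows(rows, parts):
    """Split a list of pixel rows into `parts` contiguous, non-empty bands."""
    parts = max(1, min(parts, len(rows)))
    bounds = [i * len(rows) // parts for i in range(parts + 1)]
    return [rows[bounds[i]:bounds[i + 1]] for i in range(parts)]

def class_two_value(target, screening_frames, preset_frame_count=4,
                    reference_regions=2, min_similarity=0.8, default_value=0.0):
    """Class II comparison value of one target frame against its screening frames."""
    if not screening_frames:
        return default_value
    # More candidate frames -> more sub-regions, so weak candidates drop out early.
    coeff = len(screening_frames) / preset_frame_count
    regions = math.ceil(reference_regions * coeff)

    def upper(frame):
        # Only the upper region is compared, to avoid subtitles in the lower region.
        return frame[: max(1, len(frame) // 2)]

    target_bands = split_rows(upper(target), regions)
    bands = [split_rows(upper(f), regions) for f in screening_frames]
    survivors = {i: [] for i in range(len(screening_frames))}
    for k, target_band in enumerate(target_bands):
        for i in list(survivors):
            sim = pixel_similarity(target_band, bands[i][k])
            if sim < min_similarity:
                del survivors[i]  # frame is dropped from all later sub-regions
            else:
                survivors[i].append(sim)
    if not survivors:
        return default_value  # preset class II value when no frame survives
    return max(sum(s) / len(s) for s in survivors.values())
```

Dropping a frame as soon as one sub-region fails means later sub-regions are compared against progressively fewer candidates, which is the source of the efficiency gain the text claims.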
In a second aspect of the embodiment of the present invention, there is provided a public opinion monitoring data processing system, including:
the database module is used for acquiring a plurality of historical public opinion videos browsed by a first user side in a historical time period, acquiring a plurality of deduplication labels according to video labels of the historical public opinion videos, acquiring a deduplication frame set corresponding to the historical public opinion videos based on a frame number screening strategy, classifying the deduplication frame set consistent with the deduplication labels under the same deduplication label, and acquiring a deduplication database, wherein the deduplication frame set comprises a first frame, a tail frame and an intermediate frame;
the aggregation module is used for extracting a push public opinion video corresponding to the first user side from a public opinion database, traversing a deduplication database corresponding to the first user side based on a push label of the push public opinion video, and acquiring a deduplication frame set, of which the deduplication label is consistent with the push label, as a screening frame set;
the comparison module is used for screening a plurality of screening frame sets according to the historical video duration of the historical public opinion videos corresponding to each screening frame set and the push video duration of the push public opinion videos to obtain a comparison frame set, and obtaining a plurality of selected frame sets in the push public opinion videos according to a frame interval strategy, the comparison frame sets and the push public opinion videos, wherein the selected frame sets comprise a selected first frame set, a selected tail frame set and a selected middle frame set;
the text module is used for responding to the text comparison information, performing text comparison between a plurality of comparison frames in the comparison frame set and the selected frames in each selected frame set according to a text comparison strategy to obtain a class I comparison value, and obtaining a text comparison result of the push public opinion video according to the class I comparison value and a preset text comparison value;
the image module is used for responding to the image comparison information, performing image comparison between the plurality of comparison frames in the comparison frame set and the selected frames in each selected frame set according to an image comparison strategy to obtain a class II comparison value, obtaining an image comparison result of the push public opinion video according to the class II comparison value and a preset image comparison value, and performing deletion judgment processing on the push public opinion video based on the text comparison result and/or the image comparison result.
The beneficial effects of the invention are as follows:
1. The invention can perform de-duplication screening on pushed public opinion videos in combination with the browsing records of the user, reducing the data processing amount during screening and improving screening efficiency. A de-duplication database is generated from the browsing records of the user; the de-duplication labels in the database are then traversed using the push label of the push public opinion video, and the de-duplication frame sets whose de-duplication label matches the push label are taken as screening frame sets for subsequent screening. Because only the de-duplication frame set consisting of the first frame, the tail frame and the intermediate frames of each historical public opinion video is stored, rather than the whole video, storage space is saved. Before comparison, the screening frame sets are preliminarily screened by duration, and those whose duration does not meet the condition are screened out to obtain the comparison frame set, which reduces the data processing amount in the subsequent comparison and improves comparison efficiency. The selected first frame set, the selected tail frame set and the selected middle frame set in the push public opinion video are then compared with the first frame, the tail frame and the intermediate frames in the comparison frame set respectively, which improves comparison accuracy.
During comparison, the invention first determines whether a comparison group can be compared using the text comparison strategy; if not, the comparison group is compared using the image comparison strategy. Because processing images generally requires more computation than processing text, this improves comparison efficiency while maintaining comparison accuracy. When all comparison groups in the push public opinion video are comparison groups with similar text content and/or comparison groups with similar image content, the corresponding push public opinion video is deleted, thereby reducing the pushing of repeated videos.
2. When a comparison group is compared by the text comparison method, the first selected frames whose character count differs from that of the first target frame are screened out first, and the next screening continues with the remaining first screening frames. Screening out some first selected frames by character count reduces the number of frames in the text comparison, which reduces the data processing amount and improves the efficiency of the text comparison. The characters in the first comparison character sequence of the first target frame are then compared one by one with the characters in the second comparison character sequence of each first screening frame to obtain the class I comparison value corresponding to the first target frame. When all class I comparison values are greater than the preset text comparison value, the corresponding comparison group is taken as a class I similarity comparison group, that is, a comparison group with similar text content. Whether the video frames in a comparison group are consistent can thus be judged by text comparison, which reduces the data processing amount during comparison.
3. When a comparison group is compared by the image comparison method, the second selected frames whose brightness value is inconsistent with that of the second target frame are screened out first, and the next screening continues with the remaining second screening frames. Screening out some second selected frames by brightness value reduces the number of frames in the image comparison, which reduces the data processing amount and improves the efficiency of the image comparison. The invention divides the second target frame and each second screening frame into an upper comparison region and a lower comparison region, and obtains the class II comparison value corresponding to the second target frame by image comparison of the upper comparison regions; a comparison group whose class II comparison values are all greater than the preset image comparison value is taken as a class II similarity comparison group, that is, a comparison group with similar image content. Comparing only the upper comparison regions reduces the interference caused by subtitles in the lower comparison region and improves the accuracy of the image comparison. In addition, the upper comparison regions of the second target frame and the second screening frames are further divided into a plurality of sub-regions that are compared in sequence; after each comparison, the second screening frames that do not meet the similarity condition are screened out and the next sub-region comparison continues with the remaining second screening frames, which reduces the data processing amount and improves comparison efficiency.
Drawings
Fig. 1 is a schematic flow chart of a public opinion monitoring data processing method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a public opinion monitoring data processing system according to an embodiment of the present application;
Fig. 3 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to Fig. 1, which is a schematic flow chart of a public opinion monitoring data processing method according to an embodiment of the present application, the execution subject of the method shown in Fig. 1 may be a software and/or hardware device. The execution subject of the present application may include, but is not limited to, at least one of: user equipment, network equipment, and the like. The user equipment may include, but is not limited to, computers, smart phones, personal digital assistants (PDAs), and similar electronic devices. The network equipment may include, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing in which a group of loosely coupled computers forms a super virtual computer. This embodiment is not limited thereto. The method includes steps S1 to S5, as follows:
S1, obtaining a plurality of historical public opinion videos browsed by a first user side in a historical time period, obtaining a plurality of deduplication labels according to the video labels of the historical public opinion videos, obtaining a deduplication frame set corresponding to each historical public opinion video based on a frame number screening strategy, and grouping deduplication frame sets that share a deduplication label under that label to obtain a deduplication database, wherein each deduplication frame set comprises a first frame, a tail frame and intermediate frames.
In practical application, the information a user browses may include a large amount of video data. To reduce the pushing of repeated videos the next time videos are pushed to the user, the scheme first generates a deduplication database from the historical public opinion videos the user browsed in a historical time period, and then performs deduplication screening on the pushed public opinion videos through the deduplication database.
Specifically, the scheme first obtains the deduplication label corresponding to each historical public opinion video, and then stores together the deduplication frame sets of historical public opinion videos that have the same deduplication label, thereby generating the deduplication database.
It is worth mentioning that when the deduplication frame set of each historical public opinion video is stored, the scheme also stores the historical video duration of that video and binds the duration to the corresponding historical public opinion video.
The specific implementation manner of step S1 based on the above embodiment may be:
S11, collecting the public opinion browsing information of a user in a historical time period, acquiring a plurality of historical public opinion videos from the public opinion browsing information, and acquiring a plurality of deduplication labels according to the video labels of the historical public opinion videos.
It should be noted that the server in this embodiment matches a video tag for each historical public opinion video.
S12, acquiring the historical video duration of each historical public opinion video, obtaining a frame number adjustment coefficient from the ratio of the historical video duration to the preset video duration, and obtaining a single-side selected frame number by rounding up the product of the reference frame number and the frame number adjustment coefficient.
It can be appreciated that a video generally includes many video frames. If every video frame were compared during deduplication, the data processing load of deduplication screening would increase. To reduce that load, the scheme selects the first frame, the tail frame and the intermediate frames to generate a deduplication frame set, and then performs deduplication screening on the pushed public opinion video through the deduplication frame set.
To make the screening data more accurate, the scheme obtains the single-side selected frame number of the intermediate frames according to the historical video duration of the historical public opinion video. The single-side selected frame number is the number of video frames selected on each side of the video frame corresponding to the intermediate time.
It can be understood that the longer the historical video duration, the more video frames the historical public opinion video contains. More intermediate frames can therefore be selected for longer videos, which makes the comparison data more detailed and improves the accuracy of deduplication screening.
S13, acquiring, based on the historical video duration, the video intermediate time of the corresponding historical public opinion video and the first intermediate frame corresponding to the video intermediate time, and, taking the first intermediate frame as a starting point, selecting the single-side selected frame number of video frames forward and backward respectively as second intermediate frames.
For example, if the single-side selected frame number is 3, the video frame corresponding to the video intermediate time is used as the first intermediate frame, and the 3 frames before it and the 3 frames after it are used as the second intermediate frames. The video frames of the historical public opinion video are arranged in time order.
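Steps S12 and S13 can be illustrated with a minimal Python sketch. The function names, the use of frame indices, the choice of the middle index as total_frames // 2, and the clamping at the video boundaries are assumptions made for illustration; the source does not specify them.

```python
import math


def single_side_selected_frames(history_duration, preset_duration, reference_frames):
    """S12: round up (reference frame number * frame number adjustment coefficient)."""
    coefficient = history_duration / preset_duration  # frame number adjustment coefficient
    return math.ceil(reference_frames * coefficient)


def intermediate_frame_indices(total_frames, single_side):
    """S13: return (first intermediate index, second intermediate indices).

    Assumption: the frame at the video intermediate time is index total_frames // 2,
    and indices outside the video are simply dropped.
    """
    mid = total_frames // 2
    low = max(0, mid - single_side)
    high = min(total_frames - 1, mid + single_side)
    second = [i for i in range(low, high + 1) if i != mid]
    return mid, second


# A 90 s video with a 60 s preset duration and a reference frame number of 2
count = single_side_selected_frames(90, 60, 2)
mid, second = intermediate_frame_indices(101, count)
```

With these inputs the single-side selected frame number is 3, matching the example in the text: three frames before and three frames after the first intermediate frame.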
S14, selecting a first frame, a tail frame, a first intermediate frame and a second intermediate frame corresponding to each historical public opinion video to generate a de-duplication frame set corresponding to each historical public opinion video, and classifying the de-duplication frame sets with consistent de-duplication labels under the same de-duplication label to obtain a de-duplication database.
Compared with the storage of the whole video, the deduplication database obtained by the method can reduce the data storage amount and save the storage space.
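The structure of the deduplication database described in S14 might be sketched as follows. The DedupFrameSet class, its field names, and the dictionary keyed by label are hypothetical; the source only states that frame sets with the same deduplication label are stored together and that the historical video duration is bound to each video.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DedupFrameSet:
    """Hypothetical container for one historical video's deduplication frame set."""
    video_id: str
    duration: float                 # historical video duration bound to the video
    first_frame: object
    tail_frame: object
    intermediate_frames: list = field(default_factory=list)


def build_dedup_database(labelled_sets):
    """Group (deduplication label, frame set) pairs under their label."""
    database = defaultdict(list)
    for label, frame_set in labelled_sets:
        database[label].append(frame_set)
    return dict(database)


a = DedupFrameSet("v1", 30.0, "f0", "f9", ["f4", "f5", "f6"])
b = DedupFrameSet("v2", 45.0, "g0", "g9", ["g4", "g5", "g6"])
c = DedupFrameSet("v3", 31.0, "h0", "h9", ["h4", "h5", "h6"])
db = build_dedup_database([("sports", a), ("finance", b), ("sports", c)])
```

Only a handful of frames and one duration are kept per video, which reflects the storage saving the text describes.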
S2, extracting a push public opinion video corresponding to the first user side from a public opinion database, traversing a deduplication database corresponding to the first user side based on a push label of the push public opinion video, and acquiring a deduplication frame set, of which the deduplication label is consistent with the push label, as a screening frame set.
It can be understood that if the push label of the push public opinion video is consistent with a deduplication label in the deduplication database, the push public opinion video is likely to duplicate a historical public opinion video whose deduplication frame set is stored under that label. The corresponding deduplication frame sets can therefore be used as screening frame sets, through which the push public opinion video is subsequently screened.
It should be noted that if the deduplication database does not have a deduplication tag consistent with the push tag, the corresponding push tag may be added to the deduplication database, and a deduplication frame set corresponding to the push public opinion video may be generated and stored in the deduplication database in the same manner as the deduplication frame set is generated in S1.
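The label lookup in S2, including the case where the push label is not yet present, could look like the sketch below. The dictionary layout and the function name are assumptions; generating and storing the push video's own deduplication frame set is left to the caller.

```python
def screening_sets_for(dedup_db, push_label):
    """Return the deduplication frame sets stored under push_label.

    If the label is absent it is registered with an empty list, so that the
    push video's own deduplication frame set can later be stored under it.
    """
    return dedup_db.setdefault(push_label, [])


db = {"sports": ["set_a", "set_b"]}
matches = screening_sets_for(db, "sports")      # existing label
no_matches = screening_sets_for(db, "finance")  # new label is added to db
```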
S3, screening a plurality of screening frame sets according to historical video duration of the historical public opinion videos corresponding to each screening frame set and push video duration of the push public opinion videos to obtain a comparison frame set, and obtaining a plurality of selected frame sets in the push public opinion videos according to a frame interval strategy, the comparison frame set and the push public opinion videos, wherein the selected frame sets comprise a selected first frame set, a selected tail frame set and a selected middle frame set.
It can be understood that if the difference between the historical video duration of a historical public opinion video and the push video duration of the push public opinion video is too large, the two videos are very likely not the same. The screening frame sets can therefore be filtered by the historical video duration and the push video duration to obtain comparison frame sets whose duration differs little from that of the push public opinion video. The video frames in each comparison frame set are then compared with video frames in the push public opinion video to judge whether the videos are duplicates.
Because a comparison frame set comprises a first frame, a tail frame and intermediate frames, a plurality of selected frame sets are obtained from the push public opinion video when it is compared with the comparison frame set. The selected first frame set is compared with the first frame in the comparison frame set, the selected tail frame set with the tail frame, and the selected intermediate frame set with the intermediate frames, and the push public opinion video is subjected to deduplication screening according to the comparison results.
Based on the above embodiment, the specific implementation manner of "filtering the plurality of screening frame sets according to the historical video duration of the historical public opinion video corresponding to each screening frame set and the push video duration of the push public opinion video to obtain the comparison frame set" in step S3 may be:
S31, obtaining a video duration difference value corresponding to the push public opinion video and each screening frame set according to the push video duration of the push public opinion video and the historical video duration of the historical public opinion video corresponding to each screening frame set.
S32, acquiring a screening frame set of the video duration difference value in a preset duration difference value interval as a comparison frame set.
In practical application, the preset duration difference interval may be set in advance by staff. It can be understood that if the video duration difference falls within the preset duration difference interval, the historical public opinion video corresponding to the screening frame set and the push public opinion video have roughly the same duration and may be duplicates, so the corresponding screening frame set can be used as a comparison frame set for the subsequent deduplication screening.
By the method, a part of screening frame sets with duration not meeting the conditions can be screened out, the data processing amount in the subsequent processing is reduced, and the processing efficiency is improved.
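A minimal sketch of S31 and S32 follows. It assumes each screening frame set is a dictionary with a "duration" key and that the preset duration difference interval is a closed interval applied to the signed difference; both are illustrative choices not fixed by the source.

```python
def select_comparison_sets(screening_sets, push_duration, diff_interval):
    """Keep screening frame sets whose duration difference lies in diff_interval.

    diff_interval is a (low, high) pair; the signed difference
    push_duration - historical duration must satisfy low <= diff <= high.
    """
    low, high = diff_interval
    kept = []
    for frame_set in screening_sets:
        diff = push_duration - frame_set["duration"]  # video duration difference value
        if low <= diff <= high:
            kept.append(frame_set)
    return kept


sets = [{"id": "v1", "duration": 60}, {"id": "v2", "duration": 95}, {"id": "v3", "duration": 62}]
comparison_sets = select_comparison_sets(sets, push_duration=61, diff_interval=(-2, 2))
```

Here only v1 and v3 remain as comparison frame sets; v2 is discarded before any frame-level comparison takes place.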
Based on the above embodiment, the specific implementation manner of "obtaining, according to the frame interval policy, the comparison frame set and the push public opinion video, a plurality of selected frame sets in the push public opinion video in step S3, where the selected frame sets include a selected first frame set, a selected last frame set and a selected middle frame set" may be:
S33, obtaining the comparison video duration corresponding to the comparison frame set, obtaining an offset video duration from the absolute value of the difference between the comparison video duration and the push video duration, and obtaining the selected first frame number, the selected tail frame number and the selected single-side frame number according to the offset video duration.
It can be understood that, to improve the accuracy of comparing the whole video, several video frames at the start position, the middle position and the end position of the pushed public opinion video can be compared with the first frame, the intermediate frames and the tail frame in the comparison frame set respectively. Because an offset video duration may exist between the pushed public opinion video and the selected historical public opinion video, the selected first frame number, the selected tail frame number and the selected single-side frame number are obtained according to the offset video duration, and these frame numbers are then used to obtain the video frames at the start, middle and end positions of the pushed public opinion video for subsequent comparison.
In some embodiments, the selected first frame number, the selected tail frame number and the selected single-side frame number may be obtained by:
S331, if the offset video duration is equal to 0, taking a preset frame number as the selected first frame number and the selected tail frame number, and summing one half of the preset frame number and the single-side selected frame number to obtain the selected single-side frame number.
It will be appreciated that if the offset video duration is equal to 0, the push public opinion video and the selected historical public opinion video have the same duration. In this case the preset frame number can be used directly as the selected first frame number and the selected tail frame number. Because the single-side selected frame number counts the intermediate frames on one side only, the selected single-side frame number is obtained by summing one half of the preset frame number and the single-side selected frame number.
It should be noted that if one half of the preset frame number is not an integer, it is rounded up before being summed with the single-side selected frame number.
S332, if the offset video duration is greater than 0, obtaining an offset frame number by rounding up the product of the number of frames per unit duration and the offset video duration, taking the sum of the offset frame number and the preset frame number as the selected first frame number and the selected tail frame number, obtaining a single-side offset frame number as one half of the offset frame number, and summing the single-side selected frame number, one half of the preset frame number and the single-side offset frame number to obtain the selected single-side frame number.
It can be understood that if the offset video duration is greater than 0, the durations of the push public opinion video and the selected historical public opinion video differ. In this case, to reduce comparison error, the offset frame number is obtained from the number of frames per unit duration and the offset video duration, and the sum of the offset frame number and the preset frame number is used as the selected first frame number and the selected tail frame number. Similarly, when the selected single-side frame number is calculated, the offset frame number is halved before being added.
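The arithmetic of S33, S331 and S332 can be written out as a short Python sketch. The function and parameter names are hypothetical. The source states that half of the preset frame number is rounded up; rounding up half of the offset frame number as well is an assumption made here.

```python
import math


def selected_frame_counts(compare_duration, push_duration, preset_frames,
                          single_side_selected, frames_per_unit):
    """Return (selected first frame number, selected tail frame number,
    selected single-side frame number)."""
    offset_duration = abs(compare_duration - push_duration)  # offset video duration
    half_preset = math.ceil(preset_frames / 2)               # rounded up per the text
    if offset_duration == 0:                                 # S331
        return preset_frames, preset_frames, half_preset + single_side_selected
    # S332: the offset frame number widens every selection window
    offset_frames = math.ceil(frames_per_unit * offset_duration)
    edge = offset_frames + preset_frames
    single_side_offset = math.ceil(offset_frames / 2)        # assumption: rounded up
    return edge, edge, single_side_selected + half_preset + single_side_offset


same_length = selected_frame_counts(60, 60, 5, 3, 25)
two_seconds_off = selected_frame_counts(60, 62, 5, 3, 25)
```

For equal durations with a preset frame number of 5 and a single-side selected frame number of 3, the result is (5, 5, 6). With a 2-second offset at 25 frames per second the offset frame number is 50, giving (55, 55, 31).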
S34, acquiring a start frame corresponding to the start time of the push public opinion video, sequentially acquiring video frames of the selected first frame number backwards by taking the start frame as a starting point to generate a selected first frame set, acquiring an end frame corresponding to the end time of the push public opinion video, and sequentially acquiring video frames of the selected last frame number forwards by taking the end frame as a starting point to generate a selected last frame set.
In practical application, a plurality of video frames corresponding to the push public opinion videos need to be arranged according to a time sequence, and when a selected first frame set is acquired, the video frames of the selected first frame number can be selected backwards from the first video frame at the beginning position to generate the selected first frame set.
When selecting the selected tail frame set, the video frame of the selected tail frame number can be selected forward from the last video frame at the tail position to generate the selected tail frame set.
S35, acquiring the push intermediate time of the push public opinion video based on the push video duration, acquiring the push intermediate frame corresponding to the push intermediate time, and, taking the push intermediate frame as a starting point, acquiring the selected single-side frame number of video frames forward and backward respectively to generate the selected intermediate frame set.
When the selected intermediate frame set is selected, a push intermediate frame corresponding to the push intermediate time can be obtained first, and then the video frames of the selected single-side frame number are selected forwards and backwards respectively by taking the push intermediate frame as a starting point to generate the selected intermediate frame set.
By the method, a plurality of selected frame sets for comparison in the push public opinion videos can be obtained by combining the offset video duration, and accuracy in subsequent screening and comparison is improved.
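Steps S34 and S35 amount to slicing a time-ordered frame list. The sketch below assumes that the start frame, the end frame and the push intermediate frame are each included in their own set, and that the intermediate frame is at index len(frames) // 2; the source leaves these details open.

```python
def select_frame_sets(frames, first_count, tail_count, single_side_count):
    """Return (selected first frame set, selected tail frame set,
    selected intermediate frame set) from time-ordered frames."""
    first_set = frames[:first_count]                       # backwards in time from the start frame
    tail_set = frames[-tail_count:] if tail_count > 0 else []  # forwards from the end frame
    mid = len(frames) // 2                                 # push intermediate frame (assumed index)
    low = max(0, mid - single_side_count)
    middle_set = frames[low:mid + single_side_count + 1]
    return first_set, tail_set, middle_set


frames = list(range(20))  # stand-in for 20 decoded video frames
first_set, tail_set, middle_set = select_frame_sets(frames, 3, 3, 2)
```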
S4, in response to text comparison information, performing text comparison between the comparison frames in the comparison frame set and the selected frames in each selected frame set according to a text comparison strategy to obtain class-I comparison values, and obtaining the text comparison result of the push public opinion video according to the class-I comparison values and a preset text comparison value.
Before responding to the text comparison information in S4, the scheme further includes the following steps:
A1, taking the first comparison frame and the selected first frame set, the intermediate comparison frames and the selected intermediate frame set, and the tail comparison frame and the selected tail frame set as three comparison groups, extracting the first extracted text from each comparison frame in each comparison group, and extracting the second extracted text from each selected frame in each selected frame set.
During screening and comparison, the scheme compares the first frame with the selected first frame set, the intermediate frames with the selected intermediate frame set, and the tail frame with the selected tail frame set, as three comparison groups. It can be understood that text generally requires less processing than images, so the scheme first extracts the text in each video frame of a comparison group for preliminary comparison and screening, which reduces the processing load of comparison.
In practical application, the text in each video frame of a comparison group can be extracted with existing text extraction technology, for example OCR. This is prior art and is not described in detail here.
A2, calling a non-comparison text table, and removing from the first extracted text and the second extracted text any text that appears in the non-comparison text table, to obtain the first comparison text of each comparison frame and the second comparison text of each selected frame in each selected frame set.
It can be understood that some interfering text may appear during text comparison, for example highly repetitive text such as "haha" that carries no comparison value. Before text comparison, the interfering text in the first extracted text and the second extracted text is therefore removed using the non-comparison text table, and the subsequent comparison uses the resulting first comparison text and second comparison text. The non-comparison text table can be set in advance by staff.
A3, if the number of characters of the first comparison text is greater than 0 and the number of characters of the second comparison text of the selected frames in the selected frame set is greater than 0, generating text comparison information.
It can be understood that if both counts are greater than 0, the comparison frame and the selected frames in the selected frame set all contain text that can be compared, so text comparison information is generated and the group is compared through the text comparison strategy.
A4, if the number of characters of the first comparison text is equal to 0 and/or the number of characters of the second comparison text of the selected frames in the selected frame set is equal to 0, generating image comparison information.
It can be understood that if either count is equal to 0, the comparison frame and the selected frames in the selected frame set do not all contain text that can be compared, so image comparison information is generated and the group is compared through the image comparison strategy.
In this way the comparison method for each comparison group is determined. Text comparison is chosen whenever it is possible, which reduces the data processing load during comparison.
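The decision made in A2 to A4 can be sketched as below. The function names are hypothetical, the texts are assumed to be already extracted (for example by OCR), and requiring every frame in the group to contain comparable text is one reading of A3 rather than something the source states explicitly.

```python
def remove_non_comparison_text(extracted, non_comparison_table):
    """A2: delete every entry of the non-comparison text table from the text."""
    # Longer entries are removed first so that they are not broken up by shorter ones.
    for entry in sorted(non_comparison_table, key=len, reverse=True):
        extracted = extracted.replace(entry, "")
    return extracted


def choose_comparison_method(first_texts, second_texts):
    """A3/A4: return "text" if every frame has comparable text, else "image"."""
    has_first = bool(first_texts) and all(len(t) > 0 for t in first_texts)
    has_second = bool(second_texts) and all(len(t) > 0 for t in second_texts)
    return "text" if has_first and has_second else "image"


table = {"haha"}
first = [remove_non_comparison_text("hahaflood warning", table)]
second = [remove_non_comparison_text("flood warning", table),
          remove_non_comparison_text("haha", table)]
method = choose_comparison_method(first, second)
```

In this example one selected frame contains only interfering text, so the group falls back to image comparison.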
In some embodiments, the specific implementation manner of step S4 may be:
S41, in response to the text comparison information, taking the comparison frames in the comparison group for text comparison as first comparison frames, taking the selected frame set in that comparison group as the first selected frame set, and arranging the first comparison frames in time order to obtain a first comparison frame sequence.
It will be appreciated that there may be a plurality of first comparison frames, for example, when the first comparison frame is an intermediate frame, and thus, in order to compare the first comparison frames one by one, the first comparison frames may be arranged in time sequence to obtain a first comparison frame sequence, and then each first comparison frame is compared with a selected frame in the first selected frame set by the first comparison frame sequence.
S42, sequentially taking the first comparison frames in the first comparison frame sequence as the first target frame, counting the first comparison character count of the first target frame, and counting the second comparison character count of every first selected frame in the first selected frame set.
The first comparison character count is the total number of characters in the first comparison text of the first target frame, and the second comparison character count is the total number of characters in the second comparison text of a first selected frame.
S43, taking each first selected frame whose second comparison character count equals the first comparison character count as a first screening frame, splitting the first comparison text of the first target frame into individual characters to obtain a first comparison character sequence, and splitting the second comparison text of each first screening frame into individual characters to obtain a second comparison character sequence.
It will be appreciated that when the second comparison character count equals the first comparison character count, the text of the first selected frame and that of the first target frame are likely to be the same, so the first selected frame is used as a first screening frame and compared further with the first target frame.
To compare the characters in a first screening frame and the first target frame one by one, the scheme splits the first comparison text and the second comparison text into individual characters to obtain the first comparison character sequence and the second comparison character sequence.
Characters at the same position in the first comparison character sequence and the second comparison character sequence correspond to each other and are compared one by one.
S44, sequentially comparing the characters at corresponding positions in the first comparison character sequence and the second comparison character sequence to obtain the number of identical characters, obtaining a class-I sub-comparison value of the first target frame and each first screening frame from the ratio of the number of identical characters to the first comparison character count, and taking the largest class-I sub-comparison value as the class-I comparison value of the first target frame.
It can be understood that the larger the number of identical characters, the more characters the first target frame and the first screening frame share, and the more likely it is that their text content is the same. The class-I sub-comparison value of the first target frame and each first screening frame can therefore be obtained from the ratio of the number of identical characters to the first comparison character count. Because the first comparison character count equals the second comparison character count, the same value can equally be obtained from the ratio of the number of identical characters to the second comparison character count.
Since there may be a plurality of first screening frames, the largest class-I sub-comparison value is used as the class-I comparison value of the first target frame.
S45, collecting the class-I comparison values of all first target frames, and taking the corresponding comparison group as a class-I similar comparison group when all class-I comparison values are greater than the preset text comparison value.
It can be understood that if the class-I comparison values of all first target frames are greater than the preset text comparison value, the text content of every first target frame and its corresponding first screening frame is likely to be the same, so the comparison group can be taken as a class-I similar comparison group, that is, a comparison group with similar text content.
Through the mode, the plurality of video frames in the comparison group can be compared in a text way, so that the data processing amount during comparison can be reduced, and the comparison efficiency is improved.
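Steps S42 to S45 can be condensed into the following sketch. Function names are hypothetical, and returning 0.0 when no first screening frame survives the character-count filter is an assumption; the source does not say what happens in that case.

```python
def class_one_comparison_value(target_text, selected_texts):
    """S42-S44: largest class-I sub-comparison value for one first target frame."""
    if not target_text:
        return 0.0
    # S43: keep only first selected frames with the same character count.
    screening_texts = [t for t in selected_texts if len(t) == len(target_text)]
    best = 0.0  # assumption: 0.0 if no first screening frame remains
    for candidate in screening_texts:
        # S44: compare characters position by position and count the matches.
        same = sum(1 for a, b in zip(target_text, candidate) if a == b)
        best = max(best, same / len(target_text))
    return best


def is_class_one_similar(target_texts, selected_texts, preset_text_value):
    """S45: every first target frame must exceed the preset text comparison value."""
    return all(class_one_comparison_value(t, selected_texts) > preset_text_value
               for t in target_texts)


value = class_one_comparison_value("abcd", ["abcf", "xyz", "abzz"])
similar = is_class_one_similar(["abcd"], ["abcf", "xyz", "abzz"], 0.7)
```

Here "xyz" is discarded by the character-count filter, "abcf" matches 3 of 4 characters and "abzz" matches 2 of 4, so the class-I comparison value is 0.75.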
S5, in response to image comparison information, performing image comparison between the comparison frames in the comparison frame set and the selected frames in each selected frame set according to an image comparison strategy to obtain class-II comparison values, obtaining the image comparison result of the push public opinion video according to the class-II comparison values and a preset image comparison value, and performing deletion judgment processing on the push public opinion video based on the text comparison result and/or the image comparison result.
Specifically, the specific implementation manner of step S5 may be:
S51, in response to the image comparison information, taking the comparison frames in the comparison group for image comparison as second comparison frames, taking the selected frame set in that comparison group as the second selected frame set, and arranging the second comparison frames in time order to obtain a second comparison frame sequence.
It will be appreciated that there may be a plurality of second comparison frames, so they are sorted in time order to obtain the second comparison frame sequence, and each second comparison frame in the sequence is then compared with the selected frames in the second selected frame set.
S52, sequentially obtaining second comparison frames in the second comparison frame sequence as second target frames, obtaining first brightness values of the second target frames, and second brightness values of second selected frames in the second selected frame set, and obtaining brightness difference values according to absolute values of difference values of the first brightness values and the second brightness values.
It can be understood that the number of the second selected frames may be quite large, so in order to reduce the data processing amount during the subsequent image comparison, the second selected frames may be initially screened by the brightness values, a part of the second selected frames with the brightness values not meeting the requirements are screened, and then the remaining second selected frames are compared with the second target frames.
It will also be appreciated that if the brightness values of the second target frame and a second selected frame differ, the two frames are likely not the same, so the second selected frames can first be screened by brightness value.
S53, taking each second selected frame in the second selected frame set whose brightness difference is smaller than the preset brightness difference as a second screening frame, performing area comparison on the second target frame and each second screening frame according to an area comparison strategy to obtain the class-II sub-comparison value of the second target frame and each second screening frame, and taking the largest class-II sub-comparison value as the class-II comparison value of the second target frame.
It will be appreciated that if the luminance difference value corresponding to the second selected frame is smaller than the preset luminance difference value, the second selected frame may be identical to the second target frame, so that the corresponding second selected frame may be used as the second screening frame to continue further comparison.
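The brightness pre-screening in S52 and S53 might be sketched as follows. The source does not define how a frame's brightness value is computed; the mean pixel intensity of a grayscale frame, represented as a list of pixel rows, is used here purely as an assumption.

```python
def mean_brightness(frame):
    """Assumed brightness value: mean intensity of a grayscale frame (list of rows)."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count


def brightness_screen(target_frame, selected_frames, preset_brightness_diff):
    """Keep second selected frames whose brightness difference is below the preset value."""
    target_value = mean_brightness(target_frame)           # first brightness value
    kept = []
    for frame in selected_frames:
        diff = abs(target_value - mean_brightness(frame))  # brightness difference value
        if diff < preset_brightness_diff:
            kept.append(frame)
    return kept


target = [[100, 100], [100, 100]]
close = [[102, 98], [100, 104]]     # mean 101
bright = [[200, 200], [200, 200]]   # mean 200
second_screening_frames = brightness_screen(target, [close, bright], 5)
```

Only the frame whose mean brightness is within 5 of the target survives and proceeds to the more expensive area comparison.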
When a second screening frame is further compared with the second target frame, both are divided into a plurality of regions for comparison. Specifically, in some embodiments, the second target frame and each second screening frame are compared region by region according to the region comparison strategy through the following steps to obtain the class II sub-comparison value corresponding to the second target frame and each second screening frame:
S531, performing primary region division on the second target frame and each second screening frame according to the first direction to obtain an upper comparison region and a lower comparison region corresponding to the second target frame and each second screening frame.
The first direction may be the top-to-bottom direction. In the primary region division, the second target frame and each second screening frame may be divided equally from top to bottom to obtain the upper comparison region and the lower comparison region of each frame.
It will be appreciated that the frames are divided into an upper comparison region and a lower comparison region because subtitles may be present in the second target frame or in a second screening frame and would affect the comparison result. Since subtitles are typically added in the lower half of the image, the scheme subsequently compares only the upper comparison regions of the second target frame and each second screening frame in order to improve comparison accuracy.
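The primary region division of S531 amounts to cutting each frame in half along its height. A minimal sketch, assuming a frame is represented as a list of pixel rows ordered top to bottom; the function name `split_upper_lower` is hypothetical, and giving the extra row to the lower region for odd heights is an assumption of this sketch.

```python
def split_upper_lower(frame):
    """Split a frame (list of pixel rows, top to bottom) into upper and lower comparison regions."""
    mid = len(frame) // 2  # for odd heights the extra row goes to the lower region
    return frame[:mid], frame[mid:]
```

Only the first returned region would be used in the later steps, since the lower region is where subtitles are expected.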
S532, counting the screening frame quantity of all the second screening frames, obtaining quantity adjustment coefficients according to the ratio of the screening frame quantity to the preset screening frame quantity, and obtaining the comparison area quantity according to the upward rounding value of the product of the reference area quantity and the quantity adjustment coefficients.
It can be understood that, because there may be multiple second screening frames, the scheme derives the number of comparison regions from the number of screening frames. The upper comparison regions of the second target frame and of each second screening frame are then divided according to the number of comparison regions, the divided regions are compared one by one, and second screening frames that do not meet the similarity condition are screened out in turn during the successive comparisons, which reduces the data processing amount during screening comparison.
And S533, performing secondary region division on the second target frame and the upper comparison region of each second screening frame according to the first direction based on the number of the comparison regions, and obtaining sub-comparison regions corresponding to the second target frame and each second screening frame.
It can be understood that the larger the number of screening frames, the larger the number of comparison regions, and hence the more sub-comparison regions there are in the second target frame and in each second screening frame and the smaller the area of each sub-comparison region, so that the data processing amount in each subsequent screening comparison is correspondingly smaller.
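Steps S532 and S533 can be sketched as follows: the number of comparison regions is the reference region number scaled by the ratio of screening frames to the preset screening frame number and rounded up, and the upper comparison region is then cut into that many horizontal bands. The function and parameter names are hypothetical, and clamping the band count to the number of available rows is an assumption added so that no band is empty.

```python
import math


def comparison_region_count(num_screening_frames, preset_screening_frames, reference_regions):
    """Number of comparison regions = ceil(reference_regions * adjustment coefficient)."""
    coefficient = num_screening_frames / preset_screening_frames
    return math.ceil(reference_regions * coefficient)


def split_into_sub_regions(upper_region, region_count):
    """Divide the upper region (list of pixel rows) top to bottom into near-equal bands."""
    height = len(upper_region)
    region_count = max(1, min(region_count, height))  # assumption: never more bands than rows
    bounds = [i * height // region_count for i in range(region_count + 1)]
    return [upper_region[bounds[i]:bounds[i + 1]] for i in range(region_count)]
```

For example, with 15 screening frames, a preset screening frame number of 10, and a reference region number of 3, the adjustment coefficient is 1.5 and the upper comparison region is divided into 5 sub-comparison regions.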
And S534, sequentially selecting sub-comparison areas corresponding to the second target frame and the second screening frames according to a first direction, and comparing pixel values to obtain pixel similarity values of the sub-comparison areas corresponding to the second target frame and the second screening frames.
In practical application, to obtain the pixel similarity value of corresponding sub-comparison regions in the second target frame and each second screening frame, the second target frame and each second screening frame may be placed in coordinate systems with a consistent origin. It is then judged whether the pixel values of pixel points with the same coordinates are the same; pixel points with the same coordinates and the same pixel value are counted as identical pixel points. The number of identical pixel points and the total number of pixel points in each sub-comparison region are counted, and the pixel similarity value of the corresponding sub-comparison regions is obtained as the ratio of the number of identical pixel points to the total number of pixel points.
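The pixel similarity value described above is a simple ratio and can be sketched as follows, assuming both sub-comparison regions are lists of pixel rows of the same size that share a coordinate origin. The function name `pixel_similarity` is hypothetical.

```python
def pixel_similarity(region_a, region_b):
    """Fraction of same-coordinate pixel points whose pixel values are identical."""
    total = 0
    same = 0
    for row_a, row_b in zip(region_a, region_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            same += pixel_a == pixel_b  # True counts as 1
    return same / total if total else 0.0
```

Exact equality of pixel values follows the text as written; a practical system might tolerate small differences caused by re-encoding, but that would be a departure from the described step.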
S535, if the pixel similarity value is smaller than a preset pixel similarity value, deleting the corresponding second screening frame, repeating the above steps until all sub-comparison regions have been compared, and taking the remaining second screening frames as third screening frames.
It will be appreciated that if the pixel similarity value is smaller than the preset pixel similarity value, it is indicated that the similarity of the corresponding sub-regions in the second screening frame and the second target frame is smaller than the preset similarity, and the two sub-regions are likely to be different from each other, so that the corresponding second screening frame can be screened out, and the comparison of the next sub-region can be continued through the remaining second screening frame.
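The successive elimination of S534 and S535 can be sketched as a loop over sub-comparison regions that drops a screening frame as soon as one of its regions falls below the preset pixel similarity value. This is an illustrative sketch; the data layout (a dict from a frame id to its list of sub-regions) and the names `region_similarity` and `eliminate_by_sub_region` are assumptions.

```python
def region_similarity(a, b):
    """Ratio of identical same-coordinate pixels in two equally sized regions."""
    pairs = [(pa, pb) for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b)]
    return sum(pa == pb for pa, pb in pairs) / len(pairs) if pairs else 0.0


def eliminate_by_sub_region(target_regions, candidates, min_similarity):
    """Return {frame_id: [similarity per sub-region]} for frames that pass every sub-region."""
    scores = {frame_id: [] for frame_id in candidates}
    for index, target_region in enumerate(target_regions):
        for frame_id in list(scores):  # copy the keys so entries can be deleted while looping
            similarity = region_similarity(target_region, candidates[frame_id][index])
            if similarity < min_similarity:
                del scores[frame_id]  # screened out; later regions are never compared
            else:
                scores[frame_id].append(similarity)
        if not scores:
            break  # no screening frames left, nothing more to compare
    return scores
```

The surviving entries correspond to the third screening frames, and because a frame is removed at its first failing region, later regions are compared against fewer frames.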
S536, if the number of third screening frames is 0, retrieving a preset class II comparison value as the class II comparison value.
If the number of third screening frames is 0, no second screening frame has a similarity greater than or equal to the preset similarity in all sub-regions, so the preset class II comparison value can be retrieved as the class II comparison value. In practical application, the preset class II comparison value may be a comparison value preset by staff to represent the case in which the similarity condition is not met.
S537, if the number of third screening frames is greater than 0, counting the pixel similarity values corresponding to all sub-comparison regions in each third screening frame to obtain a total pixel similarity value, obtaining the class II sub-comparison value corresponding to the third screening frame according to the average value of the total pixel similarity value, and taking the maximum class II sub-comparison value as the class II comparison value corresponding to the second target frame.
If the number of third screening frames is greater than 0, there are second screening frames whose similarity is greater than or equal to the preset similarity in all sub-regions. In this case, the average pixel similarity value of each third screening frame can be calculated as its class II sub-comparison value, and the maximum class II sub-comparison value is then taken as the class II comparison value corresponding to the second target frame.
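Steps S536 and S537 reduce to "fall back to a preset value if nothing survived, otherwise take the best per-frame average". A minimal sketch, assuming the surviving third screening frames are given as a dict from a frame id to its list of per-region pixel similarity values; the function name and the default `preset_value` of 0.0 are hypothetical.

```python
from statistics import fmean


def class_two_comparison_value(survivor_scores, preset_value=0.0):
    """Class II comparison value for one second target frame.

    survivor_scores maps each third screening frame to its per-region similarity values.
    """
    if not survivor_scores:
        return preset_value  # S536: no third screening frames remain
    # S537: the average per frame is the sub-comparison value; keep the maximum
    return max(fmean(scores) for scores in survivor_scores.values())
```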
S54, counting the class II comparison values corresponding to all second target frames; when all class II comparison values are greater than the preset image comparison value, taking the corresponding comparison group as a class II similarity comparison group; and when all comparison groups in the push public opinion video are class I similarity comparison groups and/or class II similarity comparison groups, deleting the push public opinion video.
It can be understood that if the class II comparison values corresponding to all second target frames are greater than the preset image comparison value, the image contents of all second target frames are likely identical to those of the corresponding second screening frames, so the corresponding comparison group can be taken as a class II similarity comparison group, i.e. a comparison group with similar image content.
It can also be understood that if all comparison groups in the push public opinion video are class I similarity comparison groups and/or class II similarity comparison groups, the push public opinion video is likely a repeat of the historical public opinion video corresponding to the comparison frame set, so the push public opinion video can be deleted.
By the method, the data processing amount during image comparison can be reduced, the screening efficiency is improved, the pushed public opinion videos repeated with the historical public opinion videos can be deleted, and the pushing of the repeated videos is reduced.
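The decision in S54 can be sketched as two boolean checks. This is an illustration only: the function names are hypothetical, and each comparison group is assumed to have already been reduced to a single flag indicating whether it is a class I or class II similarity comparison group.

```python
def is_class_two_similar(class_two_values, preset_image_value):
    """A comparison group is class II similar if every target frame exceeds the preset value."""
    return bool(class_two_values) and all(v > preset_image_value for v in class_two_values)


def should_delete_push_video(group_is_similar):
    """Delete the push video only if every comparison group is a similarity comparison group."""
    return bool(group_is_similar) and all(group_is_similar)
```

Treating an empty list as "not similar" is a conservative assumption of this sketch, so that a video with no comparable frames is never deleted.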
Referring to fig. 2, a schematic structural diagram of a public opinion monitoring data processing system according to an embodiment of the present invention includes:
the database module is used for acquiring a plurality of historical public opinion videos browsed by a first user side in a historical time period, acquiring a plurality of deduplication labels according to video labels of the historical public opinion videos, acquiring a deduplication frame set corresponding to the historical public opinion videos based on a frame number screening strategy, classifying the deduplication frame set consistent with the deduplication labels under the same deduplication label, and acquiring a deduplication database, wherein the deduplication frame set comprises a first frame, a tail frame and an intermediate frame;
the aggregation module is used for extracting a push public opinion video corresponding to the first user side from a public opinion database, traversing a deduplication database corresponding to the first user side based on a push label of the push public opinion video, and acquiring a deduplication frame set, of which the deduplication label is consistent with the push label, as a screening frame set;
the comparison module is used for screening a plurality of screening frame sets according to the historical video duration of the historical public opinion videos corresponding to each screening frame set and the push video duration of the push public opinion videos to obtain a comparison frame set, and obtaining a plurality of selected frame sets in the push public opinion videos according to a frame interval strategy, the comparison frame sets and the push public opinion videos, wherein the selected frame sets comprise a selected first frame set, a selected tail frame set and a selected middle frame set;
the text module is used for responding to the text comparison information, performing text comparison on a plurality of comparison frames in the comparison frame set and the selected frames in each selected frame set according to the text comparison strategy to obtain a class I comparison value, and obtaining the text comparison result of the push public opinion video according to the class I comparison value and a preset text comparison value;
the image module is used for responding to the image comparison information, performing image comparison on a plurality of comparison frames in the comparison frame set and the selected frames in each selected frame set according to the image comparison strategy to obtain a class II comparison value, obtaining the image comparison result of the push public opinion video according to the class II comparison value and the preset image comparison value, and performing deletion judgment processing on the push public opinion video based on the text comparison result and/or the image comparison result.
The apparatus of the embodiment shown in fig. 2 may be correspondingly used to perform the steps in the embodiment of the method shown in fig. 1, and the implementation principle and technical effects are similar, and are not repeated here.
Referring to fig. 3, a schematic hardware structure of an electronic device according to an embodiment of the present invention is shown, where the electronic device 30 includes: a processor 31, a memory 32 and a computer program; wherein:
A memory 32 for storing the computer program, for example application programs and functional modules implementing the methods described above; the memory may also be a flash memory (flash).
A processor 31 for executing the computer program stored in the memory to implement the steps executed by the apparatus in the above method. Reference may be made in particular to the description of the embodiments of the method described above.
Alternatively, the memory 32 may be separate or integrated with the processor 31.
When the memory 32 is a device separate from the processor 31, the apparatus may further include:
a bus 33 for connecting the memory 32 and the processor 31.
The present invention also provides a readable storage medium having stored therein a computer program for implementing the methods provided by the various embodiments described above when executed by a processor.
The readable storage medium may be a computer storage medium or a communication medium. Communication media includes any medium that facilitates transfer of a computer program from one place to another. Computer storage media can be any available media that can be accessed by a general purpose or special purpose computer. For example, a readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. In the alternative, the readable storage medium may be integral to the processor. The processor and the readable storage medium may reside in an application specific integrated circuit (Application Specific Integrated Circuits, ASIC for short). In addition, the ASIC may reside in a user device. The processor and the readable storage medium may reside as discrete components in a communication device. The readable storage medium may be read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tape, floppy disk, optical data storage device, etc.
The present invention also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the device may read the execution instructions from the readable storage medium, the execution instructions being executed by the at least one processor to cause the device to implement the methods provided by the various embodiments described above.
In the above embodiment of the apparatus, it should be understood that the processor may be a central processing unit (Central Processing Unit, CPU), or may be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or the like. A general purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in a processor for execution.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. A method for processing public opinion monitoring data, comprising:
obtaining a plurality of historical public opinion videos browsed by a first user side in a historical time period, obtaining a plurality of deduplication labels according to video labels of the historical public opinion videos, obtaining a deduplication frame set corresponding to the historical public opinion videos based on a frame number screening strategy, classifying the deduplication frame set consistent with the deduplication labels under the same deduplication label, and obtaining a deduplication database, wherein the deduplication frame set comprises a first frame, a tail frame and an intermediate frame;
extracting a push public opinion video corresponding to the first user side from a public opinion database, traversing a deduplication database corresponding to the first user side based on a push label of the push public opinion video, and acquiring a deduplication frame set, in which the deduplication label is consistent with the push label, as a screening frame set;
screening a plurality of screening frame sets according to the historical video duration of the historical public opinion videos corresponding to each screening frame set and the push video duration of the push public opinion video to obtain a comparison frame set, and obtaining a plurality of selected frame sets in the push public opinion videos according to a frame interval strategy, the comparison frame set and the push public opinion videos, wherein the selected frame sets comprise a selected first frame set, a selected tail frame set and a selected middle frame set;
responding to the text comparison information, performing text comparison on a plurality of comparison frames in the comparison frame set and the selected frames in each selected frame set according to a text comparison strategy to obtain a class I comparison value, and obtaining a text comparison result of the push public opinion video according to the class I comparison value and a preset text comparison value;
responding to the image comparison information, performing image comparison on a plurality of comparison frames in the comparison frame set and the selected frames in each selected frame set according to an image comparison strategy to obtain a class II comparison value, obtaining an image comparison result of the push public opinion video according to the class II comparison value and a preset image comparison value, and performing deletion judgment processing on the push public opinion video based on the text comparison result and/or the image comparison result.
2. The method of claim 1, characterized in that,
obtaining a plurality of historical public opinion videos browsed by a first user side in a historical time period, obtaining a plurality of deduplication labels according to video labels of the historical public opinion videos, obtaining a deduplication frame set corresponding to the historical public opinion videos based on a frame number screening strategy, classifying the deduplication frame set with consistent deduplication labels under the same deduplication label, and obtaining a deduplication database, wherein the method comprises the following steps:
Counting public opinion browsing information of a user in a historical time period, acquiring a plurality of historical public opinion videos in the public opinion browsing information, and acquiring a plurality of duplication removal labels according to video labels of each historical public opinion video;
acquiring the historical video duration of each historical public opinion video, obtaining a frame number adjustment coefficient according to the ratio of the historical video duration to a preset video duration, and selecting a frame number on one side according to an upward rounding value of the product of a reference frame number and the frame number adjustment coefficient;
acquiring a video intermediate time corresponding to a corresponding historical public opinion video based on the historical video duration, and a first intermediate frame corresponding to the video intermediate time, and respectively selecting a video frame with a single side selected frame number forward and backward as a second intermediate frame by taking the first intermediate frame as a starting point;
and selecting a first frame, a tail frame, a first intermediate frame and a second intermediate frame corresponding to each historical public opinion video to generate a de-duplication frame set corresponding to each historical public opinion video, and classifying the de-duplication frame sets with consistent de-duplication labels under the same de-duplication label to obtain a de-duplication database.
3. The method of claim 2, characterized in that,
screening the plurality of screening frame sets according to the historical video duration of the historical public opinion videos corresponding to the screening frame sets and the push video duration of the push public opinion videos to obtain comparison frame sets, wherein the method comprises the following steps:
Obtaining a video duration difference value corresponding to the push public opinion video and each screening frame set according to the push video duration of the push public opinion video and the historical video duration of the historical public opinion video corresponding to each screening frame set;
and acquiring a screening frame set of the video duration difference value in a preset duration difference value interval as a comparison frame set.
4. The method of claim 3, characterized in that,
obtaining a plurality of selected frame sets in the push public opinion video according to a frame interval strategy, the comparison frame set and the push public opinion video, wherein the selected frame sets comprise a selected first frame set, a selected tail frame set and a selected middle frame set, and the method comprises the following steps:
obtaining comparison video time length corresponding to the comparison frame set, obtaining offset video time length according to the absolute value of the difference value between the comparison video time length and the push video time length, and obtaining a selected first frame number, a selected tail frame number and a selected single-side frame number according to the offset video time length;
acquiring a start frame corresponding to the start time of the push public opinion video, sequentially acquiring video frames of a selected first frame number backwards by taking the start frame as a starting point to generate a selected first frame set, acquiring an end frame corresponding to the end time of the push public opinion video, and sequentially acquiring video frames of a selected tail frame number forwards by taking the end frame as a starting point to generate a selected tail frame set;
And acquiring a push intermediate time corresponding to the push video frame based on the push video time length, acquiring a push intermediate frame corresponding to the push intermediate time, and respectively acquiring the video frames of the selected single-side frames forwards and backwards by taking the push intermediate frame as a starting point to generate a selected intermediate frame set.
5. The method of claim 4, characterized in that,
obtaining the selected first frame number, the selected tail frame number and the selected single-side frame number according to the offset video duration, including:
if the offset video duration is equal to 0, taking a preset frame number as the selected first frame number and the selected tail frame number, and summing one half of the preset frame number and the single-side selected frame number to obtain the selected single-side frame number;
if the offset video duration is greater than 0, obtaining an offset frame number according to the upward rounding value of the product of the unit-duration frame number and the offset video duration, taking the sum of the offset frame number and the preset frame number as the selected first frame number and the selected tail frame number, obtaining a single-side offset frame number according to one half of the offset frame number, and summing the single-side selected frame number, one half of the preset frame number and the single-side offset frame number to obtain the selected single-side frame number.
6. The method of claim 5, further comprising, prior to responding to the text comparison information:
respectively obtaining a first comparison frame and a first frame set, a middle comparison frame and a middle frame set, and a tail comparison frame and a tail frame set as three comparison groups, and extracting a first extracted text in each comparison frame in each comparison group and a second extracted text in each frame set;
a non-comparison character table is called, and first extracted characters and second extracted characters existing in the non-comparison character table are removed, so that first comparison characters corresponding to each comparison frame and second comparison characters corresponding to each selected frame in each selected frame set are obtained;
if the number of characters of the first comparison characters is larger than 0 and the number of characters of the second comparison characters corresponding to the selected frame is larger than 0 in the selected frame set, generating character comparison information;
and if the number of characters of the first comparison characters is equal to 0 and/or the number of characters of the second comparison characters corresponding to the selected frame in the selected frame set is equal to 0, generating image comparison information.
7. The method of claim 5, characterized in that,
responding to the text comparison information, performing text comparison on a plurality of comparison frames in the comparison frame set and the selected frames in each selected frame set according to a text comparison strategy to obtain a class I comparison value, and obtaining a text comparison result of the push public opinion video according to the class I comparison value and a preset text comparison value, comprises:
responding to the text comparison information, acquiring a comparison frame in a comparison group for text comparison as a first comparison frame, and arranging each first comparison frame according to a time sequence by taking a selected frame set in the comparison group for text comparison as a first selected frame set to obtain a first comparison frame sequence;
sequentially acquiring first comparison frames in the first comparison frame sequence as first target frames, and counting the first comparison word number of the first target frames and the second comparison word number of each first selected frame in the first selected frame set;
obtaining a first selected frame with the same number of second comparison characters as the number of first comparison characters as a first screening frame, performing word segmentation on the first comparison characters of the first target frame to obtain a first comparison character sequence, and performing word segmentation on the second comparison characters of the first screening frame to obtain a second comparison character sequence;
sequentially comparing the characters in the first comparison character sequence and the second comparison character sequence one by one to obtain the number of identical characters, obtaining a class I sub-comparison value of the first target frame and each first screening frame according to the ratio of the number of identical characters to the number of first comparison characters, and taking the maximum class I sub-comparison value as the class I comparison value corresponding to the first target frame;
counting the class I comparison values corresponding to all the first target frames, and taking the corresponding comparison group as a class I similarity comparison group when all the class I comparison values are larger than the preset text comparison value.
8. The method of claim 7, characterized in that,
responding to the image comparison information, performing image comparison on a plurality of comparison frames in the comparison frame set and the selected frames in each selected frame set according to an image comparison strategy to obtain a class II comparison value, obtaining an image comparison result of the push public opinion video according to the class II comparison value and a preset image comparison value, and performing deletion judgment processing on the push public opinion video based on the text comparison result and/or the image comparison result, comprises:
responding to the image comparison information, acquiring comparison frames in a comparison group for image comparison as second comparison frames, and taking a selected frame set in the comparison group for image comparison as a second selected frame set, and arranging the second comparison frames according to a time sequence to obtain a second comparison frame sequence;
Sequentially obtaining a second comparison frame in the second comparison frame sequence as a second target frame, obtaining a first brightness value of the second target frame and a second brightness value of each second selected frame in the second selected frame set, and obtaining a brightness difference value according to an absolute value of a difference value of the first brightness value and the second brightness value;
obtaining the second selected frames in the second selected frame set whose brightness difference value is smaller than a preset brightness difference value as second screening frames, performing region comparison on the second target frame and each second screening frame according to a region comparison strategy to obtain a class II sub-comparison value corresponding to the second target frame and each second screening frame, and taking the maximum class II sub-comparison value as the class II comparison value corresponding to the second target frame;
counting the class II comparison values corresponding to all the second target frames, taking the corresponding comparison group as a class II similarity comparison group when all the class II comparison values are larger than the preset image comparison value, and deleting the push public opinion video when all the comparison groups in the push public opinion video are class I similarity comparison groups and/or class II similarity comparison groups.
9. The method of claim 8, characterized in that,
performing region comparison on the second target frame and each second screening frame according to a region comparison strategy to obtain a class II sub-comparison value corresponding to the second target frame and each second screening frame, and taking the maximum class II sub-comparison value as the class II comparison value corresponding to the second target frame, comprises:
performing primary region division on the second target frame and each second screening frame along a first direction to obtain an upper comparison region and a lower comparison region for the second target frame and for each second screening frame;
counting the number of second screening frames, obtaining a quantity adjustment coefficient from the ratio of the number of second screening frames to a preset screening frame quantity, and obtaining the number of comparison regions by rounding up the product of a reference region quantity and the quantity adjustment coefficient;
performing secondary region division, along the first direction and based on the number of comparison regions, on the upper comparison region of the second target frame and of each second screening frame to obtain sub-comparison regions for the second target frame and for each second screening frame;
sequentially selecting, along the first direction, corresponding sub-comparison regions of the second target frame and of each second screening frame, and comparing their pixel values to obtain a pixel similarity value for each pair of corresponding sub-comparison regions;
if a pixel similarity value is smaller than a preset pixel similarity value, deleting the corresponding second screening frame, and repeating the deleting step until all sub-comparison regions have been compared, the remaining second screening frames being taken as third screening frames;
if the number of third screening frames is 0, using a preset class II comparison value as the class II comparison value;
if the number of third screening frames is greater than 0, summing the pixel similarity values of all sub-comparison regions in each third screening frame to obtain a total pixel similarity value, obtaining a class II sub-comparison value for that third screening frame from the average of the total pixel similarity value, and taking the maximum class II sub-comparison value as the class II comparison value corresponding to the second target frame.
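The region comparison steps of claim 9 can be sketched in a few lines. The claim does not fix several details, so the following are assumptions made only for illustration: frames are equally sized grayscale grids with at least two rows, the primary division splits each frame at its midpoint row, sub-comparison regions are horizontal strips of near-equal height, and pixel similarity is one minus the normalized mean absolute difference. All function and parameter names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Gray = List[List[int]]  # grayscale frame as rows of 0..255 values (assumed representation)


def region_count(num_screening: int, preset_num: int, reference_regions: int) -> int:
    """Round up reference_regions * (num_screening / preset_num); at least one region."""
    return max(1, math.ceil(reference_regions * num_screening / preset_num))


def split_rows(rows: Gray, parts: int) -> List[Gray]:
    """Split rows into horizontal strips of near-equal height (never empty)."""
    parts = min(parts, len(rows))
    bounds = [i * len(rows) // parts for i in range(parts + 1)]
    return [rows[bounds[i]:bounds[i + 1]] for i in range(parts)]


def pixel_similarity(a: Gray, b: Gray) -> float:
    """Assumed metric: 1.0 for identical regions, 0.0 for maximally different ones."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return 1.0 - sum(diffs) / (255 * len(diffs))


def class_two_value(
    target: Gray,
    screening: Sequence[Gray],
    preset_num: int,
    reference_regions: int,
    min_similarity: float,
    preset_value: float,
) -> float:
    n = region_count(len(screening), preset_num, reference_regions)

    def upper(frame: Gray) -> Gray:
        # Primary division: keep the upper half (midpoint split is an assumption).
        return frame[: len(frame) // 2]

    target_subs = split_rows(upper(target), n)
    frame_subs = [split_rows(upper(f), n) for f in screening]
    # Similarities collected so far for each screening frame that is still alive.
    alive: Dict[int, List[float]] = {i: [] for i in range(len(screening))}
    for k, target_sub in enumerate(target_subs):
        for i in list(alive):
            s = pixel_similarity(target_sub, frame_subs[i][k])
            if s < min_similarity:
                del alive[i]  # drop the frame as soon as one strip fails
            else:
                alive[i].append(s)
    if not alive:
        return preset_value  # no third screening frames remain
    # Class II sub-comparison value = mean strip similarity; keep the best frame.
    return max(sum(v) / len(v) for v in alive.values())
```

Dropping a screening frame at its first failing strip means later strips are compared only for frames that are still candidates, which is where the scheme saves work when many screening frames are present.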
10. A public opinion monitoring data processing system, comprising:
the database module is used for acquiring a plurality of historical public opinion videos browsed by a first user side in a historical time period, acquiring a plurality of deduplication labels according to video labels of the historical public opinion videos, acquiring a deduplication frame set corresponding to the historical public opinion videos based on a frame number screening strategy, classifying the deduplication frame set consistent with the deduplication labels under the same deduplication label, and acquiring a deduplication database, wherein the deduplication frame set comprises a first frame, a tail frame and an intermediate frame;
the aggregation module is used for extracting a push public opinion video corresponding to the first user side from a public opinion database, traversing the deduplication database corresponding to the first user side based on a push label of the push public opinion video, and acquiring each deduplication frame set whose deduplication label is consistent with the push label as a screening frame set;
the comparison module is used for screening a plurality of screening frame sets according to the historical video duration of the historical public opinion videos corresponding to each screening frame set and the push video duration of the push public opinion videos to obtain a comparison frame set, and obtaining a plurality of selected frame sets in the push public opinion videos according to a frame interval strategy, the comparison frame sets and the push public opinion videos, wherein the selected frame sets comprise a selected first frame set, a selected tail frame set and a selected middle frame set;
the text module is used for responding to text comparison information, comparing the text of a plurality of comparison frames in the comparison frame set with the text of the selected frame sets according to a text comparison strategy to obtain a class I comparison value, and obtaining a text comparison result of the push public opinion video according to the class I comparison value and a preset text comparison value;
the image module is used for responding to image comparison information, comparing the plurality of comparison frames in the comparison frame set with the selected frames in each selected frame set according to an image comparison strategy to obtain a class II comparison value, obtaining an image comparison result of the push public opinion video according to the class II comparison value and the preset image comparison value, and making a deletion decision on the push public opinion video based on the text comparison result and/or the image comparison result.
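The data flow between the database, aggregation and comparison modules can be illustrated with a small sketch. Everything named here is hypothetical: the class and field names, the use of raw bytes for frames, and in particular the duration rule, which this claim does not spell out and which is assumed below to be a simple tolerance on the difference between the historical and push video durations.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class DedupFrameSet:
    """Frames kept for one historical video (field names are hypothetical)."""
    video_id: str
    duration_s: float
    first_frame: bytes
    tail_frame: bytes
    intermediate_frames: List[bytes] = field(default_factory=list)


class DedupDatabase:
    """Per-user store grouping deduplication frame sets under deduplication labels."""

    def __init__(self) -> None:
        self._by_label: Dict[str, List[DedupFrameSet]] = defaultdict(list)

    def add(self, label: str, frame_set: DedupFrameSet) -> None:
        self._by_label[label].append(frame_set)

    def screening_sets(self, push_label: str) -> List[DedupFrameSet]:
        # Aggregation step: frame sets whose label matches the push label.
        return list(self._by_label.get(push_label, []))


def comparison_sets(
    candidates: Sequence[DedupFrameSet],
    push_duration_s: float,
    tolerance_s: float,
) -> List[DedupFrameSet]:
    # Assumed duration rule: keep sets whose historical duration is within
    # tolerance_s seconds of the push video's duration.
    return [c for c in candidates if abs(c.duration_s - push_duration_s) <= tolerance_s]
```

Keying the store by label means a push video is only ever compared against history that shares its label, so the text and image modules see a much smaller candidate pool.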
CN202311025431.9A 2023-08-15 2023-08-15 Public opinion monitoring data processing method and processing system Active CN116737992B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311025431.9A CN116737992B (en) 2023-08-15 2023-08-15 Public opinion monitoring data processing method and processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311025431.9A CN116737992B (en) 2023-08-15 2023-08-15 Public opinion monitoring data processing method and processing system

Publications (2)

Publication Number Publication Date
CN116737992A (en) 2023-09-12
CN116737992B (en) 2023-10-13

Family

ID=87904795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311025431.9A Active CN116737992B (en) 2023-08-15 2023-08-15 Public opinion monitoring data processing method and processing system

Country Status (1)

Country Link
CN (1) CN116737992B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109933709A (en) * 2019-01-31 2019-06-25 平安科技(深圳)有限公司 Public sentiment tracking, device and the computer equipment of videotext data splitting
CN111914096A (en) * 2020-07-06 2020-11-10 同济大学 Public transport passenger satisfaction evaluation method and system based on public opinion knowledge graph
CN112711705A (en) * 2020-11-30 2021-04-27 泰康保险集团股份有限公司 Public opinion data processing method, equipment and storage medium
CN114925286A (en) * 2022-07-20 2022-08-19 开鑫科技信息服务(南京)有限公司 Public opinion data processing method and device
CN116469039A (en) * 2023-04-28 2023-07-21 青岛尘元科技信息有限公司 Hot video event determination method and system, storage medium and electronic equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YANPING FU ET AL.: "Contrastive transformer based domain adaptation for multi-source cross-domain sentiment classification", 《KNOWLEDGE-BASED SYSTEMS》, pages 1 - 14 *
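刘润奇 et al.: "Method for mining public-opinion-related topics in network multimedia data", 《深圳大学学报(理工版)》, pages 72 - 78 *
董娜: "Construction of a short-video online public opinion dissemination ecosystem based on user-generated content", 《图书馆》, pages 73 - 81 *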

Also Published As

Publication number Publication date
CN116737992B (en) 2023-10-13

Similar Documents

Publication Publication Date Title
US8107689B2 (en) Apparatus, method and computer program for processing information
US10831814B2 (en) System and method for linking multimedia data elements to web pages
US7930647B2 (en) System and method for selecting pictures for presentation with text content
JP4643829B2 (en) System and method for analyzing video content using detected text in a video frame
US9098585B2 (en) Clustering multimedia search
CN109766457B (en) Media content searching method, device and storage medium
CN101650740B (en) Method and device for detecting television advertisements
US20080181492A1 (en) Detection Apparatus, Detection Method, and Computer Program
EP1516264A2 (en) Image retrieval by generating a descriptor for each spot of an image the cells of which having visual characteristics within a selected tolerance
CN110502664A (en) Video tab indexes base establishing method, video tab generation method and device
CN111356015B (en) Duplicate video detection method and device, computer equipment and storage medium
CN116737992B (en) Public opinion monitoring data processing method and processing system
CN117671696A (en) OCR recognition result processing method and device
CN108681549A (en) Method and device for acquiring multimedia resources
US20210073262A1 (en) Media retrieval method and apparatus
JP2016181042A (en) Search apparatus, search method, and program
US9396759B2 (en) Generating content data for a video file
JP4040905B2 (en) Reduced image display device, method, program, and recording medium recording program
CN113378902A (en) Video plagiarism detection method based on optimized video characteristics
WO2003105489A1 (en) Method and device for online dynamic semantic video compression and video indexing
WO2020181903A1 (en) Webpage illustration processing method, system and device, and storage medium
Li et al. Streaming news image summarization
US20140082211A1 (en) System and method for generation of concept structures based on sub-concepts
CN109492023A (en) Automobile information processing method and equipment and computer storage medium
CN118246425B (en) Report generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant