CN108446296B - Information processing method and device - Google Patents

Info

Publication number
CN108446296B
Authority
CN
China
Prior art keywords
event
report
calculating
reports
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810068768.0A
Other languages
Chinese (zh)
Other versions
CN108446296A (en
Inventor
张轩玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201810068768.0A
Publication of CN108446296A
Application granted
Publication of CN108446296B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The embodiment of the invention provides an information processing method and device, wherein the method comprises the following steps: acquiring a hotspot event label and a plurality of event reports; calculating first text similarity of the hotspot event labels and the event reports, calculating second text similarity between the event reports, and acquiring aging characteristic values of the event reports; calculating the maximum edge correlation values of the event reports according to the first text similarity, the second text similarity and the aging characteristic value; and aggregating the plurality of event reports according to the maximum edge correlation values of the plurality of event reports to obtain report aggregation results. According to the embodiment of the invention, the time and the energy for the user to obtain the event reports with diversity and timeliness are saved.

Description

Information processing method and device
Technical Field
The present invention relates to the field of information processing, and in particular to an information processing method, an information processing apparatus, a mobile terminal, and a computer-readable storage medium.
Background
Currently, more and more users pay attention to information through the internet, especially to current hot events.
Generally, a user may search for an event report of a hot event to obtain a related event report, or a website operator may recommend the event report of the hot event to the user. When searching for an event report or recommending an event report, a large number of event reports need to be aggregated, and the aggregated result is sent to a user as a search result or a recommendation result.
However, current event report aggregation methods consider only the relevance between an event report and the hotspot event. As a result, the aggregated event reports contain too much repeated content and may be out of date. Such aggregation delivers a large number of repeated and outdated event reports to the user, who must then spend time and effort filtering them to find reports that meet the user's requirements for diversity, timeliness and the like.
Therefore, the event report aggregation methods in the prior art consume the user's time and effort.
Disclosure of Invention
The embodiment of the invention provides an information processing method and an information processing device aiming at the technical problem to be solved.
In order to solve the above problem, the present invention provides an information processing method, including:
acquiring a hotspot event label and a plurality of event reports;
calculating first text similarity of the hotspot event labels and the event reports, calculating second text similarity between the event reports, and acquiring aging characteristic values of the event reports;
calculating the maximum edge correlation values of the event reports according to the first text similarity, the second text similarity and the aging characteristic value;
and aggregating the plurality of event reports according to the maximum edge correlation values of the plurality of event reports to obtain report aggregation results.
Optionally, the hotspot event tag has a corresponding first text vector, and the step of calculating the first text similarity between the hotspot event tag and the plurality of event reports includes:
selecting event reports to be evaluated from the event reports;
performing word segmentation processing on the event report to be evaluated to obtain a plurality of report word segmentation texts;
calculating a second text vector of the plurality of report word segmentation texts;
and calculating cosine values of the first text vector and the second text vector as the first text similarity.
Optionally, there are N event reports, the N event reports include M evaluated event reports, 0 < M < N, the evaluated event reports have corresponding third text vectors, and the step of calculating a second text similarity between the event reports includes:
calculating M cosine values of a second text vector of the event report to be evaluated and third text vectors of the M evaluated event reports;
and extracting the maximum cosine value from the M cosine values to serve as the second text similarity.
Optionally, the hotspot event tag has an event time, the event report has a report time, and the step of obtaining the aging characteristic values of the plurality of event reports includes:
calculating a time interval value between the report time of the event report to be evaluated and the event time of the hot event label;
and calculating the aging characteristic value of the event report to be evaluated by using the time interval value and a preset aging attenuation value.
Optionally, the aging characteristic values include a first aging characteristic value and a second aging characteristic value, and the step of calculating the maximum edge correlation value of the event reports according to the first text similarity, the second text similarity, and the aging characteristic values includes:
calculating a first product of the first text similarity and the first aging characteristic value;
calculating a second product of the second text similarity and the second aging characteristic value;
calculating a difference of the first product and the second product as the maximum edge correlation value.
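As a worked illustration of the three steps above, take purely hypothetical values: a first text similarity of 0.8, a second text similarity of 0.5, and a first aging characteristic value λ = 0.7. Assuming the second aging characteristic value is 1 − λ = 0.3, as the detailed description indicates, the arithmetic would be:

```latex
\begin{aligned}
\text{first product} &= 0.8 \times 0.7 = 0.56 \\
\text{second product} &= 0.5 \times 0.3 = 0.15 \\
\text{maximum edge correlation value} &= 0.56 - 0.15 = 0.41
\end{aligned}
```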
Optionally, the step of aggregating the plurality of event reports according to the maximum edge correlation values of the plurality of event reports to obtain a report aggregation result includes:
sorting the plurality of event reports according to the size of the maximum edge correlation value;
and taking the sorted plurality of event reports as the report aggregation result.
Optionally, the step of aggregating the plurality of event reports according to the maximum edge correlation values of the plurality of event reports to obtain a report aggregation result includes:
extracting event reports of which the maximum edge correlation value is greater than a preset threshold value from the event reports;
and taking the extracted event report as the report aggregation result.
Optionally, the method further comprises:
when a report searching request of a user for the hotspot event label is received, sending the report aggregation result to the user; or
Recommending the report aggregation result to the user.
In order to solve the above problem, the present invention also provides an information processing apparatus comprising:
a tag and report acquisition module, configured to acquire a hotspot event tag and a plurality of event reports;
the text similarity and aging characteristic value calculation module is used for calculating first text similarity between the hotspot event labels and the event reports, calculating second text similarity between the event reports and acquiring aging characteristic values of the event reports;
a maximum edge correlation value calculation module, configured to calculate maximum edge correlation values of the event reports according to the first text similarity, the second text similarity, and the aging characteristic value;
and the report aggregation module is used for aggregating the plurality of event reports according to the maximum edge correlation values of the plurality of event reports to obtain a report aggregation result.
Optionally, the hotspot event tag has a corresponding first text vector, and the text similarity and aging characteristic value calculating module includes:
the to-be-evaluated event report selection submodule is used for selecting an event report to be evaluated from the plurality of event reports;
the word segmentation processing submodule is used for carrying out word segmentation processing on the event report to be evaluated to obtain a plurality of report word segmentation texts;
a second text vector calculation submodule for calculating a second text vector of the plurality of report word segmentation texts;
and the first text similarity calculation submodule is used for calculating the cosine value of the first text vector and the second text vector to serve as the first text similarity.
Optionally, there are N event reports, the N event reports include M evaluated event reports, 0 < M < N, the evaluated event reports have corresponding third text vectors, and the text similarity and aging characteristic value calculation module includes:
the cosine value calculation submodule is used for calculating M cosine values between the second text vector of the event report to be evaluated and the third text vectors of the M evaluated event reports;
and the second text similarity extraction submodule is used for extracting the maximum cosine value from the M cosine values to serve as the second text similarity.
Optionally, the hotspot event tag has an event time, the event report has a report time, and the text similarity and aging characteristic value calculation module includes:
the time interval value calculation submodule is used for calculating the time interval value between the report time of the event report to be evaluated and the event time of the hotspot event label;
and the aging characteristic value calculation submodule is used for calculating the aging characteristic value of the event report to be evaluated by using the time interval value and a preset aging attenuation value.
Optionally, the aging characteristic values include a first aging characteristic value and a second aging characteristic value, and the maximum edge correlation value calculating module includes:
the first product calculating submodule is used for calculating a first product of the first text similarity and the first aging characteristic value;
the second product calculating submodule is used for calculating a second product of the second text similarity and the second aging characteristic value;
a product difference calculation sub-module for calculating a difference between the first product and the second product as the maximum edge correlation value.
Optionally, the report aggregation module comprises:
a report sorting submodule, configured to sort the event reports according to the size of the maximum edge correlation value;
and the first report aggregation result generation submodule is used for taking the sorted plurality of event reports as the report aggregation result.
Optionally, the report aggregation module comprises:
an event report extraction submodule, configured to extract an event report of which the maximum edge correlation value is greater than a preset threshold value from the event reports;
and the second report aggregation result generation submodule is used for taking the extracted event report as the report aggregation result.
Optionally, the apparatus further comprises:
the report aggregation result sending module is used for sending the report aggregation result to the user when receiving a report search request of the user for the hotspot event label; or recommending the report aggregation result to the user.
In order to solve the above problem, the present invention further provides a mobile terminal, which includes a processor, a memory, and a computer program stored in the memory and operable on the processor, and when the computer program is executed by the processor, the mobile terminal implements any of the information processing methods described above.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements any of the information processing methods described above.
The embodiment of the invention can achieve the following beneficial effects:
according to the embodiment of the invention, the maximum edge correlation values of the event reports are calculated by combining the first text similarity between the hotspot event label and the event reports, the second text similarity between the event reports and the aging characteristic values of the event reports, and the event reports are aggregated according to the maximum edge correlation values to obtain a report aggregation result. The report aggregation result is provided for the user, the user does not need to spend time and energy to screen the event reports meeting the requirements of the user on diversity, timeliness and the like of the event reports, and the time and the energy for the user to obtain the event reports with diversity and timeliness are saved.
Drawings
FIG. 1 is a flowchart illustrating steps of an information processing method according to a first embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of an information processing method according to a second embodiment of the present invention;
FIG. 3 is a block diagram of an information processing apparatus according to a third embodiment of the present invention;
FIG. 4 is a block diagram of an information processing apparatus according to a fourth embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Example one
Fig. 1 is a flowchart of steps of an information processing method according to an embodiment of the present invention, where the method may specifically include the following steps:
step 110, obtain a hotspot event label and a plurality of event reports.
The hotspot event tag may be one or more keywords for identifying a hotspot event.
In specific implementation, the hotspot event labels can be obtained by collecting the search keywords of the user and filtering and sorting the search keywords according to the popularity of the search keywords.
For example, for the hotspot event "Zhao Benshan's partner, actor Li Damei, passed away", hotspot event tags such as "Zhao Benshan's partner actor Li Damei passed away" and "Li Damei passed away" can be obtained from the search keywords.
The event report may be a report of a news draft, a microblog, a blog article, and the like for a hot event on a network.
In a specific implementation, a search may be performed based on the hotspot event tags, and a plurality of event reports may be extracted from the search results.
On the basis of the foregoing example, a search for "actor Li Damei passed away" can return event reports with titles such as "Sad! Zhao Benshan partner Li Damei passed away! The son red child blogs the member of the family teardrop!", "Zhao Benshan partner Li Damei passed away, son is a Shen brother of Xiaosheng Yang", "Song Xiao partner Li Damei passed away, aged 55! The net friend Zhao family is really a multiple autumn rule!", and "Benshan Media loses another member, beloved apprentice Li Damei has died, who would have thought the son is him?".
Thus, a hotspot event label and a plurality of event reports for the hotspot event label are acquired.
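The inputs obtained in step 110 can be pictured as two small record types. The following is a minimal sketch, not the patented implementation: the class names, field names and the `collect_reports` helper are hypothetical, and the keyword filter merely stands in for "extracting event reports from the search results", which the source does not specify.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class HotspotEventTag:
    """Keywords identifying a hotspot event, plus when the event occurred."""
    keywords: Tuple[str, ...]
    event_time: datetime


@dataclass(frozen=True)
class EventReport:
    """One report (news article, microblog post, blog article) about an event."""
    title: str
    report_time: datetime


def collect_reports(tag: HotspotEventTag,
                    search_results: List[EventReport]) -> List[EventReport]:
    """Keep search results whose title mentions at least one tag keyword.

    Assumption: a case-insensitive substring match is used only for
    illustration; any retrieval method could take its place.
    """
    return [r for r in search_results
            if any(k.lower() in r.title.lower() for k in tag.keywords)]
```

The event time and report time fields are included because later steps use them to compute the aging characteristic value.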
Step 120, calculating a first text similarity between the hotspot event label and the plurality of event reports, calculating a second text similarity between the plurality of event reports, and obtaining an aging characteristic value of the plurality of event reports.
In a specific implementation, a text vector can be calculated from the keywords in the hotspot event tag and another from the keywords in the event report, and the cosine value between the two text vectors is used as the text similarity. For clarity, the text similarity between the hotspot event tag and an event report is referred to as the first text similarity. The first text similarity can be used to evaluate the relevance of the event report to the hotspot event.
In a specific implementation, for two event reports, text vectors of the keywords of the two event reports are calculated, and the cosine value between the two text vectors is used as the text similarity. For clarity, the text similarity between two event reports is referred to as the second text similarity. The second text similarity can be used to evaluate the relevance between two event reports, that is, how much their content repeats.
In a specific implementation, the event time of the hot event may be determined first, the report time of the event report may be determined, and the aging characteristic value of the event report may be calculated according to the event time and the report time. The aging characteristic value can be used for evaluating the aging property reported by the event.
In practical applications, the plurality of event reports may include evaluated event reports and event reports to be evaluated. An evaluated event report is one that has already been evaluated as meeting the user requirement and has been marked as part of the final aggregation result. An event report to be evaluated is one that is currently being evaluated against the user requirement and has not yet been marked as part of the final aggregation result. More specifically, in step 120, one event report to be evaluated may be selected first; a first text similarity between the hotspot event tag and the event report to be evaluated is calculated, a second text similarity between the event report to be evaluated and at least one evaluated event report is calculated, and an aging characteristic value of the event report to be evaluated is calculated according to the event time and the report time of the event report to be evaluated.
Of course, in an actual application scenario, the first text similarity, the second text similarity, and the aging characteristic value may also be calculated for all event reports, and a person skilled in the art may set a specific manner and a specific order for calculating the first text similarity, the second text similarity, and the aging characteristic value according to an actual requirement, which is not limited in this embodiment of the present invention.
Step 130, calculating the maximum edge correlation values of the event reports according to the first text similarity, the second text similarity and the aging characteristic value.
In a specific implementation, the maximum edge correlation value may be calculated by using an MMR model (Maximal Marginal Relevance, rendered in this document as "maximum edge correlation"). The MMR model can comprehensively evaluate the relevance, diversity and timeliness of a plurality of event reports related to the hotspot event, based on dimensions such as the similarity between the hotspot event tag and the event report, the similarity between the event reports, and the interval between the event time and the report time.
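For reference, the classical MMR criterion from the information retrieval literature (Carbonell and Goldstein, 1998) picks the next document from a candidate set R, given a query Q and the set S of documents already selected, as sketched below. Reading Q as the hotspot event tag, D_i as the event report to be evaluated and S as the evaluated event reports is an interpretive assumption. In the classical form λ is a fixed trade-off parameter, whereas this document derives it from the timeliness of each report.

```latex
\mathrm{MMR} = \arg\max_{D_i \in R \setminus S}
  \Bigl[ \lambda \, \mathrm{Sim}_1(D_i, Q)
       - (1 - \lambda) \max_{D_j \in S} \mathrm{Sim}_2(D_i, D_j) \Bigr]
```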
Step 140, aggregating the plurality of event reports according to the maximum edge correlation values of the plurality of event reports to obtain a report aggregation result.
In a specific implementation, the event reports can be sorted according to their maximum edge correlation values, and the sorted event reports are used as the report aggregation result. Alternatively, the event reports can be screened according to the maximum edge correlation value, and the event reports whose maximum edge correlation value is greater than a preset threshold are used as the report aggregation result. In the first case the top-ranked event reports, and in the second case all event reports in the aggregation result, are closely related to the hotspot event, have a low repetition rate and are timely, so the user does not need to spend additional time and effort screening the aggregation result for event reports that meet the user's requirements for diversity, timeliness and the like.
According to the embodiment of the invention, the maximum edge correlation values of the event reports are calculated by combining the first text similarity between the hotspot event label and the event reports, the second text similarity between the event reports and the aging characteristic values of the event reports, and the event reports are aggregated according to the maximum edge correlation values to obtain a report aggregation result. The report aggregation result is provided for the user, the user does not need to spend time and energy to screen the event reports meeting the requirements of the user on diversity, timeliness and the like of the event reports, and the time and the energy for the user to obtain the event reports with diversity and timeliness are saved.
Example two
Fig. 2 is a flowchart of steps of an information processing method according to a second embodiment of the present invention, where the method may specifically include the following steps:
step 210, obtain a hotspot event label and a plurality of event reports.
Step 220, calculating a first text similarity between the hotspot event label and the plurality of event reports, calculating a second text similarity between the plurality of event reports, and obtaining an aging characteristic value of the plurality of event reports.
Optionally, the hotspot event tag has a corresponding first text vector, and the step of calculating the first text similarity between the hotspot event tag and the plurality of event reports includes:
step 221, selecting event reports to be evaluated from the event reports;
step 222, performing word segmentation processing on the event report to be evaluated to obtain a plurality of report word segmentation texts;
step 223, calculating a second text vector of the plurality of report word segmentation texts;
step 224, calculating cosine values of the first text vector and the second text vector as the first text similarity.
In a specific implementation, a text vector can be calculated from the keywords in the hotspot event tag and used as the first text vector of the hotspot event tag. From the plurality of event reports, an event report whose maximum edge correlation value has not yet been evaluated can be selected as the event report to be evaluated. Word segmentation processing is then performed on the event report to be evaluated to obtain a plurality of report word segmentation texts. In practical applications, word segmentation processing may be performed only on the title of the event report to be evaluated. For example, for the title "Benshan Media loses another member, beloved apprentice Li Damei has died, who would have thought the son is him?", word segmentation processing yields word segmentation texts such as "Benshan", "apprentice", "Li Damei", "died" and "son", and a text vector calculated from these word segmentation texts serves as the second text vector of the event report to be evaluated.
After the second text vector of the event report to be evaluated is obtained, the cosine value between the first text vector of the hotspot event tag and the second text vector of the event report to be evaluated is calculated and used as the first text similarity.
For example, let the first text vector of the hotspot event tag $Event$ be $e_{event}$. When there are currently N event reports, the i-th event report $Feed_i$ is selected as the event report to be evaluated, and its second text vector is $e_{feed\_i}$. The cosine value $sim_1$ between the first text vector $e_{event}$ and the second text vector $e_{feed\_i}$ is calculated as follows:
$$sim_1(Feed_i, Event) = \cos(e_{event}, e_{feed\_i}) = \frac{e_{event} \cdot e_{feed\_i}}{\lVert e_{event} \rVert \, \lVert e_{feed\_i} \rVert}$$
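Steps 222 to 224 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not say how text vectors are built, so a bag-of-words count vector is assumed here, and whitespace splitting stands in for word segmentation (Chinese text would need a real segmenter). The function names are hypothetical.

```python
import math
from collections import Counter


def segment(text):
    """Stand-in word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def text_vector(tokens):
    """Bag-of-words count vector, stored sparsely as token -> count."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(count * v[token] for token, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# First text vector (hotspot event tag) and second text vector (report title).
e_event = text_vector(segment("actor passes away"))
e_feed_i = text_vector(segment("Veteran actor passes away at 55"))
sim1 = cosine(e_event, e_feed_i)  # first text similarity
```

With count vectors the similarity lies between 0 and 1; a title sharing every tag word scores higher than one sharing only a few.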
Optionally, there are N event reports, the N event reports include M evaluated event reports, 0 < M < N, the evaluated event reports have corresponding third text vectors, and the step of calculating a second text similarity between the event reports includes:
step 225, calculating M cosine values of the second text vector of the report of the event to be evaluated and the third text vectors of the M reports of the evaluated event;
step 226, extracting the maximum cosine value from the M cosine values as the second text similarity.
In a specific implementation, it is assumed that there are currently N event reports and that M of them have already been evaluated, i.e., there are M evaluated event reports among the N event reports. For each evaluated event report, a text vector can be calculated from its keywords and used as the third text vector of that evaluated event report.
After the second text vector of the event report to be evaluated is obtained, the cosine values between the second text vector of the event report to be evaluated and the third text vectors of the M evaluated event reports can be calculated, giving M cosine values. The largest of the M cosine values is then taken as the second text similarity.
Based on the above example, M evaluated event reports are selected from the N event reports. For the j-th evaluated event report $Feed_j$, whose third text vector is $e_{feed\_j}$, the cosine value $sim_2$ between the second text vector $e_{feed\_i}$ and the third text vector $e_{feed\_j}$ is calculated as follows:
$$sim_2(Feed_i, Feed_j) = \cos(e_{feed\_i}, e_{feed\_j}) = \frac{e_{feed\_i} \cdot e_{feed\_j}}{\lVert e_{feed\_i} \rVert \, \lVert e_{feed\_j} \rVert}$$
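Steps 225 and 226 amount to taking a maximum over M cosine values. A minimal sketch follows, assuming dense text vectors of equal length; the function names are hypothetical, and returning 0.0 when nothing has been evaluated yet is a convention assumed here, since the source only treats the case 0 < M < N.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def second_text_similarity(e_feed_i, evaluated_vectors):
    """Largest cosine between the candidate report and any evaluated report."""
    return max((cosine(e_feed_i, e_feed_j) for e_feed_j in evaluated_vectors),
               default=0.0)
```

Taking the maximum rather than the mean means a single near-duplicate among the evaluated reports is enough to mark the candidate as redundant.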
optionally, the hotspot event tag has an event time, the event report has a report time, and the step of obtaining the aging characteristic values of the plurality of event reports includes:
step 227, calculating a time interval value between the report time of the event report to be evaluated and the event time of the hotspot event label;
step 228, calculating the aging characteristic value of the event report to be evaluated by using the time interval value and a preset aging attenuation value.
In a specific implementation, the occurrence time of the hotspot event can be determined by manual definition or by network crawling, and the occurrence time is used as the event time of the hotspot event tag. In addition, the publication time of an event report can be regarded as its report time. In this way, the event time of the hotspot event tag and the report time of the event report are obtained.
A time interval value is then calculated from the report time of the event report to be evaluated and the event time of the hotspot event tag. For example, if the event time of the hotspot event tag $Event$ is $T_{event}$ and the report time of the event report to be evaluated $Feed_i$ is $T_{feed\_i}$, the time interval value is $\Delta t = T_{event} - T_{feed\_i}$.
The aging attenuation value $t_o$ may be set in advance based on an empirical value and is used to adjust how quickly the popularity of a given type of hotspot event decays over time. From the time interval value $\Delta t$ and the aging attenuation value $t_o$, the aging characteristic value of the event report to be evaluated is calculated through an exponential function formula. For example, the aging characteristic value $\lambda$ may be calculated using the following formula:
[Formula image BDA0001557554290000111: an exponential function of $\Delta t$ and $t_o$ that yields $\lambda$]
through the above formula, the timeliness of the event reports relative to the hotspot events can be quantified.
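Steps 227 and 228 can be sketched as below. The exact exponential formula is not recoverable from the text, so the form λ = exp(−|Δt| / t_o) used here is an assumption chosen so that λ equals 1 when the report coincides with the event and falls toward 0 as the gap grows. The function and parameter names are hypothetical.

```python
import math
from datetime import datetime, timedelta


def aging_characteristic_value(event_time, report_time, decay):
    """Exponential decay in the gap between event time and report time.

    Assumed form: lambda = exp(-|dt| / t_o), where `decay` plays the role
    of the preset aging attenuation value t_o.
    """
    if decay <= timedelta(0):
        raise ValueError("decay (t_o) must be a positive duration")
    dt = abs(event_time - report_time)   # time interval value
    return math.exp(-(dt / decay))       # timedelta / timedelta gives a float
```

A larger attenuation value makes the decay slower, which would suit event types that stay newsworthy for longer.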
Step 230, calculating the maximum edge correlation values of the event reports according to the first text similarity, the second text similarity and the aging characteristic value.
Optionally, the aging characteristic values include a first aging characteristic value and a second aging characteristic value, and the step 230 includes:
step 231, calculating a first product of the first text similarity and the first aging characteristic value;
step 232, calculating a second product of the second text similarity and the second aging characteristic value;
step 233, calculating a difference between the first product and the second product as the maximum edge correlation value.
In practical applications, the MMR model may be used to calculate the maximum edge correlation value. The MMR formula is as follows:
$$MMR_{feed\_i} = \lambda \cdot sim_1(Feed_i, Event) - (1 - \lambda) \cdot \max_{1 \le j \le M} sim_2(Feed_i, Feed_j)$$
As can be seen from the above formula, the first product of the first text similarity $sim_1(Feed_i, Event)$ and the first aging characteristic value $\lambda$ is calculated, the second product of the second text similarity $\max_{1 \le j \le M} sim_2(Feed_i, Feed_j)$ and the second aging characteristic value $(1 - \lambda)$ is calculated, and the difference between the two products gives the maximum edge correlation value $MMR_{feed\_i}$ of the event report to be evaluated.
And repeating the steps for each event report to obtain the maximum edge correlation value of a plurality of event reports.
After the maximum edge correlation value for an event report is obtained, it can be labeled as an evaluated event report accordingly.
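Steps 231 to 233 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, and the two similarities and λ are taken as already computed.

```python
def maximum_edge_correlation(sim1: float, sim2: float, lam: float) -> float:
    """Combine relevance and redundancy as described in steps 231-233.

    sim1: first text similarity (report vs. hotspot event tag)
    sim2: second text similarity (report vs. already evaluated reports)
    lam:  first aging characteristic value; (1 - lam) is the second one
    """
    first_product = sim1 * lam             # step 231
    second_product = sim2 * (1.0 - lam)    # step 232
    return first_product - second_product  # step 233
```

A report that is close to the hotspot event tag and dissimilar to the reports already evaluated receives a high value, while a near-duplicate is penalized through the second product.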
Step 240, aggregating the plurality of event reports according to the maximum edge correlation values of the plurality of event reports to obtain a report aggregation result.
Optionally, the step 240 includes:
step 241, sorting the event reports according to the size of the maximum edge correlation value;
step 242, taking the sorted plurality of event reports as the report aggregation result.
In a specific implementation, the maximum edge correlation values of a plurality of event reports may be ranked, and the ranked event reports are aggregated as the report aggregation result. According to the report aggregation result, a plurality of event reports ranked at the top are all event reports which are closely related to the hotspot events, have low repetition rate and are time-sensitive, so that a user can conveniently browse the required event reports.
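The sorting variant of step 240 might look like the sketch below. The pair representation `(report_id, value)` and the descending order are assumptions made for illustration; the text only says that the reports are sorted by the size of the maximum edge correlation value and that the top-ranked ones are the most useful.

```python
from typing import List, Tuple


def aggregate_by_ranking(scored: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return reports ordered from highest to lowest maximum edge correlation value.

    scored: (report_id, maximum_edge_correlation_value) pairs (hypothetical layout).
    """
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```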
Optionally, the step 240 includes:
step 243, extracting event reports of which the maximum edge correlation value is greater than a preset threshold value from the plurality of event reports;
and step 244, taking the extracted event report as the report aggregation result.
In a specific implementation, the maximum edge correlation value of each event report may be compared with a preset threshold. If the value is greater than the preset threshold, the event report is retained; if the value is less than the preset threshold, the event report is discarded. Finally, the remaining event reports are aggregated as the report aggregation result. The event reports included in the report aggregation result are all closely related to the hotspot event, have a low repetition rate, and are timely, so that a user can conveniently browse the required event reports.
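The threshold variant (steps 243 and 244) can be sketched as a simple filter. The names and the pair layout are hypothetical; because step 243 only extracts reports whose value is greater than the threshold, this sketch drops reports whose value equals the threshold.

```python
from typing import List, Tuple


def aggregate_by_threshold(
    scored: List[Tuple[str, float]], threshold: float
) -> List[Tuple[str, float]]:
    """Keep only reports whose maximum edge correlation value exceeds the threshold."""
    return [pair for pair in scored if pair[1] > threshold]
```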
Step 250, when receiving a report search request of a user for the hotspot event tag, sending the report aggregation result to the user; or recommending the report aggregation result to the user.
In specific implementation, when a user searches for an event report, the user usually submits a certain search keyword, and if the search keyword is matched with a certain hotspot event tag, a report aggregation result corresponding to the hotspot event tag can be sent to the user.
In addition, report aggregation results of a certain hotspot event label can be sent to the user periodically, so that a plurality of event reports can be recommended to the user for browsing.
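The search branch of step 250 can be illustrated with a lookup from hotspot event tag to a precomputed report aggregation result. Exact string matching and the dictionary layout are assumptions for this sketch; the text does not specify how a search keyword is matched against a tag.

```python
from typing import Dict, List, Optional


def handle_search_request(
    keyword: str, aggregation_by_tag: Dict[str, List[str]]
) -> Optional[List[str]]:
    """Return the report aggregation result for a matching hotspot event tag.

    Returns None when the keyword matches no tag (ASSUMPTION: exact match).
    """
    return aggregation_by_tag.get(keyword)
```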
It should be noted that the information processing method provided by the embodiment of the present invention may be applied to a server, and the server may obtain a report aggregation result according to the obtained hotspot event tag and a plurality of event reports by using the above method, and send the report aggregation result to a user terminal for displaying to a user. Of course, in practical application, the method can also be applied to a user terminal, and the user terminal obtains a report aggregation result according to the obtained hotspot event label and a plurality of event reports by the method, and displays the report aggregation result to the user.
Example three
Fig. 3 is a block diagram of an information processing apparatus according to a third embodiment of the present invention, where the information processing apparatus 300 may specifically include the following modules:
a tag and report acquiring module 310, configured to acquire a hotspot event tag and a plurality of event reports;
a text similarity and aging characteristic value calculating module 320, configured to calculate first text similarities between the hotspot event tags and the event reports, calculate second text similarities between the event reports, and obtain aging characteristic values of the event reports;
a maximum edge correlation value calculating module 330, configured to calculate maximum edge correlation values of the event reports according to the first text similarity, the second text similarity, and the aging characteristic value;
and the report aggregation module 340 is configured to aggregate the plurality of event reports according to the maximum edge correlation values of the plurality of event reports to obtain a report aggregation result.
According to the embodiment of the present invention, a tag and report acquiring module 310 acquires a hotspot event tag and a plurality of event reports, a text similarity and aging characteristic value calculating module 320 acquires a first text similarity, a second text similarity and an aging characteristic value, a maximum edge correlation value calculating module 330 calculates a maximum edge correlation value of the plurality of event reports by combining the hotspot event tag and the first text similarity of the plurality of event reports, the second text similarity between the plurality of event reports, and the aging characteristic values of the plurality of event reports, and a report aggregating module 340 aggregates the plurality of event reports according to the maximum edge correlation value to obtain a report aggregating result. The report aggregation result is provided for the user, the user does not need to spend time and energy to screen out the event reports meeting the requirements of the user on diversity, timeliness and the like of the event reports, and the time and the energy for the user to acquire the event reports with diversity and timeliness are saved.
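The cooperation of modules 310 to 340 can be pictured with the following Python sketch. It is not the apparatus itself: each module is modeled as a hypothetical callable supplied by the caller, and the class and field names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass
class InformationProcessingSketch:
    # Module 310: returns (hotspot_event_tag, event_reports).
    acquire: Callable[[], Tuple[Any, Sequence[Any]]]
    # Module 320: returns one (sim1, sim2, lam) triple per report.
    features: Callable[[Any, Sequence[Any]], List[Tuple[float, float, float]]]
    # Module 330: maps (sim1, sim2, lam) to a maximum edge correlation value.
    score: Callable[[float, float, float], float]
    # Module 340: turns (report, value) pairs into a report aggregation result.
    aggregate: Callable[[List[Tuple[Any, float]]], List[Any]]

    def run(self) -> List[Any]:
        tag, reports = self.acquire()
        triples = self.features(tag, reports)
        scored = [(r, self.score(*t)) for r, t in zip(reports, triples)]
        return self.aggregate(scored)
```

Passing the modules in as callables keeps the sketch independent of any particular similarity or aggregation choice.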
Example four
Fig. 4 is a block diagram of an information processing apparatus according to a fourth embodiment of the present invention, where the information processing apparatus 400 may specifically include the following modules:
a tag and report acquisition module 410, configured to acquire a hotspot event tag and a plurality of event reports;
a text similarity and aging characteristic value calculating module 420, configured to calculate first text similarities between the hotspot event tags and the event reports, calculate second text similarities between the event reports, and obtain aging characteristic values of the event reports;
a maximum edge correlation value calculation module 430, configured to calculate maximum edge correlation values of the event reports according to the first text similarity, the second text similarity, and the aging characteristic value;
a report aggregation module 440, configured to aggregate the plurality of event reports according to the maximum edge correlation values of the plurality of event reports, so as to obtain a report aggregation result;
a report aggregation result sending module 450, configured to send the report aggregation result to the user when a report search request for the hotspot event tag is received from the user; or recommending the report aggregation result to the user.
Optionally, the hotspot event tag has a corresponding first text vector, and the text similarity and aging characteristic value calculating module 420 may specifically include the following sub-modules:
an event report to be evaluated selecting submodule 421, configured to select an event report to be evaluated from the multiple event reports;
the word segmentation processing submodule 422 is configured to perform word segmentation processing on the report of the event to be evaluated to obtain a plurality of report word segmentation texts;
a second text vector calculation sub-module 423, configured to calculate a second text vector of the plurality of report word segmentation texts;
a first text similarity calculation sub-module 424, configured to calculate the cosine value of the first text vector and the second text vector as the first text similarity.
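The work of sub-modules 422 to 424 can be sketched as below. Two things here are assumptions and not taken from the text: the report is taken as already segmented into words, and the text vector is a simple bag-of-words count (the vectorization method is not specified in this passage).

```python
import math
from collections import Counter
from typing import Iterable


def text_vector(tokens: Iterable[str]) -> Counter:
    """Build a bag-of-words vector from word segmentation output (ASSUMPTION)."""
    return Counter(tokens)


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(u[term] * v[term] for term in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty text is treated as unrelated to everything
    return dot / (norm_u * norm_v)
```

The first text similarity would then be `cosine_similarity(tag_vector, report_vector)` for the hotspot event tag and the event report to be evaluated.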
Optionally, there are N event reports, the N event reports include M evaluated event reports, 0 < M < N, the evaluated event reports have corresponding third text vectors, and the text similarity and aging characteristic value calculating module 420 may specifically include the following sub-modules:
the M cosine value calculation submodule 425 is configured to calculate M cosine values of the second text vector of the event report to be evaluated and the third text vectors of the M evaluated event reports;
and a second text similarity extraction sub-module 426, configured to extract a maximum cosine value from the M cosine values as the second text similarity.
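Sub-modules 425 and 426 amount to taking the largest cosine value against the M evaluated reports. The sketch below assumes dense vectors of equal length represented as Python sequences; the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def second_text_similarity(
    second_vector: Sequence[float], third_vectors: Sequence[Sequence[float]]
) -> float:
    """Maximum of the M cosine values against the evaluated event reports."""
    if not third_vectors:
        raise ValueError("at least one evaluated event report is required (M > 0)")
    return max(cosine(second_vector, third) for third in third_vectors)
```

Taking the maximum means a report is penalized as soon as it closely repeats any single evaluated report.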
Optionally, the hotspot event tag has an event time, the event report has a report time, and the text similarity and aging characteristic value calculating module 420 may specifically include the following sub-modules:
a time interval value calculation sub-module 427, configured to calculate a time interval value between the report time of the event report to be evaluated and the event time of the hotspot event tag;
an aging characteristic value calculation sub-module 428, configured to calculate the aging characteristic value of the event report to be evaluated by using the time interval value and a preset aging attenuation value.
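The time interval value handled by sub-module 427 can be sketched with the standard `datetime` module. Measuring the interval in hours and taking its absolute value are assumptions for illustration; the text does not fix a unit or a sign convention.

```python
from datetime import datetime


def time_interval_hours(report_time: datetime, event_time: datetime) -> float:
    """Return the gap between report time and event time in hours (ASSUMED unit)."""
    return abs((report_time - event_time).total_seconds()) / 3600.0
```

The result would then be passed, together with the preset aging attenuation value, to the aging characteristic value calculation.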
Optionally, the aging characteristic values include a first aging characteristic value and a second aging characteristic value, and the maximum edge correlation value calculating module 430 includes:
a first product calculating submodule 431, configured to calculate a first product of the first text similarity and the first aging characteristic value;
a second product calculating submodule 432, configured to calculate a second product of the second text similarity and the second aging characteristic value;
a product difference calculation submodule 433, configured to calculate a difference between the first product and the second product as the maximum edge correlation value.
Optionally, the report aggregation module 440 may specifically include the following sub-modules:
a report ordering submodule 441, configured to order the event reports according to the size of the maximum edge correlation value;
the first report aggregation result generation sub-module 442 is configured to use the sorted event reports as the report aggregation result.
Optionally, the report aggregation module 440 may specifically include the following sub-modules:
the event report extraction sub-module 443 is configured to extract an event report of which the maximum edge correlation value is greater than a preset threshold value from the plurality of event reports;
the second report aggregation result generation submodule 444 is configured to use the extracted event report as the report aggregation result.
Example five
The fifth embodiment of the present invention provides a mobile terminal, where the mobile terminal may include a processor, a memory, and a computer program that is stored in the memory and is executable on the processor;
when the computer program is executed by the processor, the steps of any of the information processing methods in the above method embodiments may be implemented, and the same technical effects may be achieved.
The processor is the control center of the mobile terminal. It connects the various parts of the whole terminal by using various interfaces and lines, and executes the various functions of the terminal and processes data by running or executing software programs and/or modules stored in the memory and calling data stored in the memory. Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
Example six
A sixth embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of any information processing method in the foregoing method embodiments can be implemented, and the same technical effects can be achieved, and are not described herein again to avoid repetition.
The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
In a typical configuration, the computer system includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal systems (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal system to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal system, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal system to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal system to cause a series of operational steps to be performed on the computer or other programmable terminal system to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal system provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or terminal device that comprises the element.
The technical solutions provided by the present invention are described in detail above, and the principle and the implementation of the present invention are explained in this document by applying specific examples, and the descriptions of the above examples are only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (18)

1. An information processing method, characterized in that the method comprises:
acquiring a hotspot event tag and a plurality of event reports; the hotspot event tag is obtained according to a search keyword of a user and the popularity of the keyword; the event reports are extracted from search results obtained by searching based on the hotspot event tag;
calculating first text similarities between the hotspot event tag and the plurality of event reports, calculating second text similarities between the plurality of event reports, and acquiring aging characteristic values of the plurality of event reports; the hotspot event tag has an event time, the event reports have report times, and the aging characteristic value is calculated according to the event time and the report time;
calculating the maximum edge correlation values of the event reports according to the first text similarity, the second text similarity and the aging characteristic value;
and aggregating the plurality of event reports according to the maximum edge correlation values of the plurality of event reports to obtain report aggregation results.
2. The method of claim 1, wherein the hotspot event tag has a corresponding first text vector, and wherein the step of calculating the first text similarity of the hotspot event tag to the plurality of event reports comprises:
selecting event reports to be evaluated from the event reports;
performing word segmentation processing on the event report to be evaluated to obtain a plurality of report word segmentation texts;
calculating a second text vector of the plurality of report word segmentation texts;
and calculating cosine values of the first text vector and the second text vector as the first text similarity.
3. The method of claim 2, wherein there are N event reports, the N event reports include M evaluated event reports, 0 < M < N, and the evaluated event reports have corresponding third text vectors, and wherein the step of calculating the second text similarity between the event reports comprises:
calculating M cosine values of a second text vector of the event report to be evaluated and third text vectors of the M evaluated event reports;
and extracting the maximum cosine value from the M cosine values to serve as the second text similarity.
4. The method of claim 2, wherein the step of obtaining the aging characteristic values of the plurality of event reports comprises:
calculating a time interval value between the report time of the event report to be evaluated and the event time of the hotspot event tag;
and calculating the aging characteristic value of the event report to be evaluated by using the time interval value and a preset aging attenuation value.
5. The method of claim 1, wherein the aging characteristic values comprise a first aging characteristic value and a second aging characteristic value, and wherein the step of calculating the maximum edge correlation values of the plurality of event reports according to the first text similarity, the second text similarity, and the aging characteristic values comprises:
calculating a first product of the first text similarity and the first aging characteristic value;
calculating a second product of the second text similarity and the second aging characteristic value;
calculating a difference between the first product and the second product as the maximum edge correlation value;
the first aging characteristic value is calculated by the following formula:
Figure FDA0003106318050000021
the second aging characteristic value is 1-lambda;
wherein λ is the first aging characteristic value, t_o is the preset aging attenuation value, and Δt is the time interval value between the report time of the event report to be evaluated and the event time of the hotspot event tag; the hotspot event tag is used for identifying a hotspot event, and the first aging characteristic value is used for quantifying the timeliness of the event report relative to the hotspot event.
6. The method of claim 1, wherein said step of aggregating said plurality of event reports according to their maximum edge correlation values to obtain a report aggregation result comprises:
sorting the plurality of event reports according to the size of the maximum edge correlation value;
and taking the sorted plurality of event reports as the report aggregation result.
7. The method of claim 1, wherein said step of aggregating said plurality of event reports according to their maximum edge correlation values to obtain a report aggregation result comprises:
extracting event reports of which the maximum edge correlation value is greater than a preset threshold value from the event reports;
and taking the extracted event report as the report aggregation result.
8. The method of claim 1, further comprising:
when a report search request of a user for the hotspot event tag is received, sending the report aggregation result to the user; or
Recommending the report aggregation result to the user.
9. An information processing apparatus characterized in that the apparatus comprises:
a tag and report acquisition module, configured to acquire a hotspot event tag and a plurality of event reports; the hotspot event tag is obtained according to a search keyword of a user and the popularity of the keyword; the event reports are extracted from search results obtained by searching based on the hotspot event tag;
a text similarity and aging characteristic value calculation module, configured to calculate first text similarities between the hotspot event tag and the plurality of event reports, calculate second text similarities between the plurality of event reports, and acquire aging characteristic values of the plurality of event reports; the hotspot event tag has an event time, the event reports have report times, and the aging characteristic value is calculated according to the event time and the report time;
a maximum edge correlation value calculation module, configured to calculate maximum edge correlation values of the event reports according to the first text similarity, the second text similarity, and the aging characteristic value;
and the report aggregation module is used for aggregating the plurality of event reports according to the maximum edge correlation values of the plurality of event reports to obtain a report aggregation result.
10. The apparatus of claim 9, wherein the hotspot event tag has a corresponding first text vector, and wherein the text similarity and aging feature value calculating module comprises:
a to-be-evaluated event report selection submodule, configured to select an event report to be evaluated from the plurality of event reports;
the word segmentation processing submodule is used for carrying out word segmentation processing on the event report to be evaluated to obtain a plurality of report word segmentation texts;
a second text vector calculation sub-module, configured to calculate a second text vector of the plurality of report word segmentation texts;
and a first text similarity calculation sub-module, configured to calculate the cosine value of the first text vector and the second text vector as the first text similarity.
11. The apparatus of claim 10, wherein there are N event reports, the N event reports include M evaluated event reports, 0 < M < N, and the evaluated event reports have corresponding third text vectors, and wherein the text similarity and aging characteristic value calculation module comprises:
an M cosine value calculation submodule, configured to calculate M cosine values of the second text vector of the event report to be evaluated and the third text vectors of the M evaluated event reports;
and the second text similarity extraction submodule is used for extracting the maximum cosine value from the M cosine values to serve as the second text similarity.
12. The apparatus of claim 10, wherein the text similarity and aging characteristic value calculation module comprises:
a time interval value calculation sub-module, configured to calculate the time interval value between the report time of the event report to be evaluated and the event time of the hotspot event tag;
and an aging characteristic value calculation sub-module, configured to calculate the aging characteristic value of the event report to be evaluated by using the time interval value and a preset aging attenuation value.
13. The apparatus of claim 9, wherein the aging characteristic values comprise a first aging characteristic value and a second aging characteristic value, and wherein the maximum edge correlation value calculation module comprises:
a first product calculating submodule, configured to calculate a first product of the first text similarity and the first aging characteristic value;
the second product calculating submodule is used for calculating a second product of the second text similarity and the second aging characteristic value;
a product difference calculation sub-module for calculating a difference between the first product and the second product as the maximum edge correlation value;
the first aging characteristic value is calculated by the following formula:
Figure FDA0003106318050000041
the second aging characteristic value is 1-lambda;
wherein λ is the first aging characteristic value, t_o is the preset aging attenuation value, and Δt is the time interval value between the report time of the event report to be evaluated and the event time of the hotspot event tag; the hotspot event tag is used for identifying a hotspot event, and the first aging characteristic value is used for quantifying the timeliness of the event report relative to the hotspot event.
14. The apparatus of claim 9, wherein the report aggregation module comprises:
a report sorting submodule, configured to sort the event reports according to the size of the maximum edge correlation value;
and a first report aggregation result generation submodule, configured to take the sorted plurality of event reports as the report aggregation result.
15. The apparatus of claim 9, wherein the report aggregation module comprises:
an event report extraction submodule, configured to extract an event report of which the maximum edge correlation value is greater than a preset threshold value from the event reports;
and a second report aggregation result generation submodule, configured to take the extracted event reports as the report aggregation result.
16. The apparatus of claim 9, further comprising:
a report aggregation result sending module, configured to send the report aggregation result to the user when a report search request of the user for the hotspot event tag is received; or to recommend the report aggregation result to the user.
17. An information processing mobile terminal, comprising a processor, a memory, and a computer program stored on the memory and operable on the processor, wherein the computer program, when executed by the processor, implements the information processing method according to any one of claims 1 to 8.
18. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the information processing method of any one of claims 1 to 8.
CN201810068768.0A 2018-01-24 2018-01-24 Information processing method and device Active CN108446296B (en)


Publications (2)

Publication Number Publication Date
CN108446296A CN108446296A (en) 2018-08-24
CN108446296B true CN108446296B (en) 2021-10-15


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918627B (en) * 2019-01-08 2024-03-19 平安科技(深圳)有限公司 Text generation method, device, electronic equipment and storage medium
CN110543914B (en) * 2019-09-04 2022-06-24 软通智慧信息技术有限公司 Event data processing method and device, computing equipment and medium
CN110929018B (en) * 2019-12-04 2023-03-21 Oppo(重庆)智能科技有限公司 Text processing method and device, storage medium and electronic equipment

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4231298B2 (en) * 2003-01-14 2009-02-25 日本電信電話株式会社 Information extraction rule creation system, information extraction rule creation program, information extraction system, and information extraction program
US8086600B2 (en) * 2006-12-07 2011-12-27 Google Inc. Interleaving search results
CN101174273B (en) * 2007-12-04 2010-06-23 清华大学 News event detecting method based on metadata analysis
CN102937960B (en) * 2012-09-06 2015-06-17 北京邮电大学 Device for identifying and evaluating emergency hot topic
CN102929977B (en) * 2012-10-16 2015-07-22 浙江大学 Event tracing method aiming at news website
CN103020159A (en) * 2012-11-26 2013-04-03 百度在线网络技术(北京)有限公司 Method and device for news presentation facing events
CN103984757B (en) * 2014-05-29 2016-08-24 奇飞翔艺(北京)软件有限公司 Method and system for inserting news information entries into search results pages
CN104217033B (en) * 2014-09-29 2017-11-07 北京奇虎科技有限公司 Timeliness-based search method and device
CN104915447B (en) * 2015-06-30 2018-04-20 北京奇艺世纪科技有限公司 Hot topic tracking and keyword determination method and device
CN107038193B (en) * 2016-11-17 2020-11-27 创新先进技术有限公司 Text information processing method and device
CN106776933A (en) * 2016-12-01 2017-05-31 厦门市美亚柏科信息股份有限公司 Processing method and system for aggregation analysis of similar case information
CN106874419B (en) * 2017-01-22 2019-09-10 北京航空航天大学 Multi-granularity real-time hotspot aggregation method

Also Published As

Publication number Publication date
CN108446296A (en) 2018-08-24

Similar Documents

Publication Publication Date Title
CN108694223B (en) User portrait database construction method and device
US9460117B2 (en) Image searching
US8290927B2 (en) Method and apparatus for rating user generated content in search results
US20190018900A1 (en) Method and Apparatus for Displaying Search Results
US10528907B2 (en) Automated categorization of products in a merchant catalog
US8738613B2 (en) Relevancy ranking of search results in a network based upon a user's computer-related activities
CN110909182B (en) Multimedia resource searching method, device, computer equipment and storage medium
CN102054003B (en) Methods and systems for recommending network information and creating network resource index
US20110246457A1 (en) Ranking of search results based on microblog data
US8463785B2 (en) Method and system for generating search collection of query
CN105005582A (en) Recommendation method and device for multimedia information
CN108446296B (en) Information processing method and device
JP2014515514A (en) Method and apparatus for providing suggested words
CN102855256A (en) Method, device and equipment for determining evaluation information of websites
US11423096B2 (en) Method and apparatus for outputting information
US20230004608A1 (en) Method for content recommendation and device
CN110046278B (en) Video classification method and device, terminal equipment and storage medium
CN105808773A (en) News pushing method and device
CN108540860B (en) Video recall method and device
KR20180075234A (en) Method and device for recommending contents based on inflow keyword and relevant keyword for contents
JP2018504686A (en) Method and apparatus for processing search data
CN111061954B (en) Search result sorting method and device and storage medium
CN111241381A (en) Information recommendation method and device, electronic equipment and computer-readable storage medium
CN113220974A (en) Click rate prediction model training and search recall method, device, equipment and medium
CN108304453B (en) Method and device for determining video related search terms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant