CN114282092A - Information processing method, device, equipment and computer readable storage medium - Google Patents

Info

Publication number
CN114282092A
Authority
CN
China
Prior art keywords
keyword
determining
degree
text
title
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111487593.5A
Other languages
Chinese (zh)
Inventor
郭金坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
MIGU Music Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Music Co Ltd
MIGU Culture Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Music Co Ltd, MIGU Culture Technology Co Ltd

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an information processing method, an information processing device, information processing equipment and a computer readable storage medium, wherein the method comprises the following steps: acquiring a keyword set corresponding to a title of information to be processed; determining the association degree between the title and the text based on the relative position relation of different keywords in the keyword set in the text of the information to be processed; and when the association degree corresponding to the relative position relation is greater than a preset threshold value, the association degree between the title and the text is high. The method and the device realize the determination of the association degree between the title and the text according to the position of each keyword in the title in the text, and improve the accuracy of the association degree between the title and the text by considering the relative position relation of each keyword in the title information in the text.

Description

Information processing method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to an information processing method, an information processing apparatus, an information processing device, and a computer-readable storage medium.
Background
With the popularization of computers and the rapid development of networks, the volume of news on the internet is growing rapidly. This explosively growing body of news often contains a large amount of "headline party" (clickbait) news, that is, news whose headline is unrelated to its text.
Currently, headline party news is generally identified by extracting keywords from the news text and the headline, and then judging whether the headline is related to the text based on whether the keywords match. However, this relevance identification does not consider where the headline keywords appear in the news text, so the identification result is inaccurate.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide an information processing method, an information processing device, information processing equipment and a computer readable storage medium, and aims to solve the technical problem that the correlation identification of the existing news headlines and the news texts is inaccurate.
In order to achieve the above object, the present invention provides an information processing method including:
acquiring a keyword set corresponding to a title of information to be processed;
determining the association degree between the title and the text based on the relative position relation of different keywords in the keyword set in the text of the information to be processed; and when the association degree corresponding to the relative position relation is greater than a preset threshold value, the association degree between the title and the text is high.
Further, the step of determining the association degree between the title and the text based on the relative position relationship of different keywords in the keyword set in the text of the information to be processed includes:
determining paragraph information corresponding to the text based on the line break in the text, wherein the paragraph information comprises a first segment, a middle segment and a tail segment;
determining a first keyword matched with one paragraph in the paragraph information, a second keyword matched with two paragraphs in the paragraph information and a third keyword matched with three paragraphs in the paragraph information in each keyword in a keyword set;
and determining the association degree between the title and the text based on the first keyword, the second keyword and the third keyword.
Further, the step of determining the association degree between the title and the text based on the first keyword, the second keyword and the third keyword comprises:
acquiring a first weight corresponding to a first keyword, a second weight corresponding to a second keyword and a third weight corresponding to a third keyword;
and determining the association degree between the title and the text based on the first weight, the second weight, the third weight, the number of the first keywords, the number of the second keywords and the number of the third keywords.
Further, the step of determining the association degree between the title and the text based on the first keyword, the second keyword and the third keyword comprises:
determining a first degree of correlation between the title and the text based on the first keyword, the second keyword and the third keyword;
determining a second degree of correlation between the title and the text based on the sentence information corresponding to the text and the keyword set, and determining a third degree of correlation between the title and the text based on the text keyword corresponding to the text and the keyword set;
and determining the association degree between the title and the text based on the first association degree, the second association degree and the third association degree.
Further, the step of determining a second degree of correlation between the title and the body based on the sentence information corresponding to the body and the keyword set includes:
determining a target sentence matched with the keywords of the keyword set in each sentence of the sentence information;
determining a first target sentence matched with a plurality of keywords of a keyword set in a target sentence, and determining the first sentence number of sentences of which the same sentence has adjacent keywords in the first target sentence;
determining adjacent second target sentences in the target sentences and determining the number of second sentences with adjacent keywords in the adjacent second target sentences based on the sequence of each sentence in the sentence information;
determining the second degree of correlation based on the first sentence quantity and the second sentence quantity.
Further, the step of determining a third correlation between the title and the text based on the text keywords corresponding to the text and the keyword set includes:
obtaining synonyms corresponding to all the keywords in the keyword set, and determining a set of words to be matched, which comprises the synonyms and all the keywords in the keyword set;
determining the number of matched keywords of target words to be matched, wherein each word to be matched of the word set to be matched belongs to the text keywords;
and determining the third correlation degree based on the number of the matched keywords.
Further, the step of determining the association degree between the title and the body text based on the first, second and third association degrees comprises:
acquiring the correlation sum of the first correlation, the second correlation and the third correlation;
determining a relevancy threshold based on the number of keywords in the keyword set;
determining the degree of association based on the sum of degrees of correlation and a threshold of degrees of correlation.
Further, to achieve the above object, the present invention also provides an information processing apparatus comprising:
the acquisition module is used for acquiring a keyword set corresponding to the title of the information to be processed;
the determining module is used for determining the association degree between the title and the text based on the relative position relation of different keywords in the keyword set in the text of the information to be processed; and when the association degree corresponding to the relative position relation is greater than a preset threshold value, the association degree between the title and the text is high.
Further, to achieve the above object, the present invention also provides an information processing device comprising: a memory, a processor, and an information processing program that is stored on the memory and can run on the processor, wherein the information processing program, when executed by the processor, implements the steps of the aforementioned information processing method.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon an information processing program which, when executed by a processor, realizes the steps of the aforementioned information processing method.
The method comprises the steps of acquiring a keyword set corresponding to a title of information to be processed; then determining the association degree between the title and the text based on the relative position relation of different keywords in the keyword set in the text of the information to be processed; when the correlation degree corresponding to the relative position relation is larger than a preset threshold value, the correlation degree between the title and the text is high, the correlation degree between the title and the text is determined according to the position of each keyword in the title in the text, and the accuracy of the correlation degree between the title and the text is improved by considering the relative position relation of each keyword in the title information in the text.
Drawings
FIG. 1 is a schematic diagram of an information processing apparatus in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of an information processing method according to the present invention;
FIG. 3 is a functional block diagram of an information processing apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of an information processing apparatus in a hardware operating environment according to an embodiment of the present invention.
The information processing device of the embodiment of the invention can be a PC. As shown in fig. 1, the information processing apparatus may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Alternatively, the information processing apparatus may further include a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like. Of course, the information processing device may also be configured with other sensors such as barometer, hygrometer, thermometer, infrared sensor, etc., which are not described herein again.
Those skilled in the art will appreciate that the terminal architecture shown in fig. 1 does not constitute a limitation of the information processing apparatus, and may include more or fewer components than those shown, or some of the components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and an information processing program.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be used to call up an information processing program stored in the memory 1005.
In the present embodiment, an information processing apparatus includes: a memory 1005, a processor 1001, and an information processing program that is stored on the memory 1005 and can run on the processor 1001, wherein when the processor 1001 calls the information processing program stored in the memory 1005, the steps of the information processing method in each embodiment are executed.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of the information processing method according to the present invention.
In this embodiment, the information processing method includes:
step S101, acquiring a keyword set corresponding to a title of information to be processed;
In this embodiment, the title of the information to be processed is obtained first, and then the keyword set corresponding to the title is obtained. Specifically, word segmentation is performed on the title to obtain a word segmentation result, and stop words are removed from that result to obtain the keyword set Twords = [Wt1, Wt2, Wt3, Wt4, Wt5, ...]. The information to be processed is information such as a news article or a paper.
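As an illustration of step S101, the following is a minimal sketch, not the method's actual implementation. It assumes English text, a regular-expression tokenizer, and a small hypothetical stop-word list (`STOP_WORDS`); the source names neither a segmenter nor a stop-word list, and Chinese text would need a dedicated word segmenter. Dropping repeated words is also an assumption.

```python
import re

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "is", "are", "for"}


def title_keywords(title: str) -> list[str]:
    """Segment the title into words and remove stop words, keeping title order."""
    tokens = re.findall(r"\w+", title.lower())
    keywords: list[str] = []
    for token in tokens:
        # Skip stop words and words already collected (deduplication is an assumption).
        if token not in STOP_WORDS and token not in keywords:
            keywords.append(token)
    return keywords


t_words = title_keywords("The Fire in the Old Factory")
print(t_words)  # ['fire', 'old', 'factory']
```

Keeping the keywords in title order matters for the later embodiments, which refer to keywords that are adjacent in the keyword set.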
Step S102, determining the association degree between the title and the text based on the relative position relation of different keywords in the keyword set in the text of the information to be processed; and when the association degree corresponding to the relative position relation is greater than a preset threshold value, the association degree between the title and the text is high.
In this embodiment, once the keyword set is obtained, the relative position relationship of the different keywords of the keyword set in the text of the information to be processed is obtained. Specifically, the paragraphs of the text are determined, the paragraphs to which the different keywords of the keyword set belong are determined, and for each paragraph the number of different keywords of the keyword set that it contains is obtained. Then, a weight is set for each paragraph according to its sequence number in the text, and the association degree between the title and the text is determined from the weight and the keyword count of each paragraph. For example, the association degree between the title and the text is D1 × W1 + D2 × W2 + ... + Dk × Wk, where Dn is the number of keywords contained in paragraph n, Wn is the weight of paragraph n, n ranges from 1 to k, and k is the number of paragraphs of the text.
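The per-paragraph weighted sum can be sketched as follows. This is an illustrative reading only: the function name `paragraph_association`, splitting on line breaks, and substring matching of keywords are assumptions, and the weights in the usage example are arbitrary.

```python
def paragraph_association(body: str, keywords: list[str], weights: list[float]) -> float:
    """Sum over paragraphs of (number of title keywords in the paragraph) x (paragraph weight)."""
    paragraphs = [p for p in body.split("\n") if p.strip()]
    if len(weights) != len(paragraphs):
        raise ValueError("one weight is required per paragraph")
    total = 0.0
    for paragraph, weight in zip(paragraphs, weights):
        # D_n: how many different keywords occur in this paragraph (substring match, an assumption).
        d_n = sum(1 for kw in keywords if kw in paragraph)
        total += d_n * weight
    return total


body = "fire at factory\nnothing here\nfactory closed"
print(paragraph_association(body, ["fire", "factory"], [3, 1, 2]))  # 8.0
```

In the usage example the first paragraph contains two keywords (2 × 3), the second none, and the third one (1 × 2), which gives 8.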
In other embodiments, the paragraphs of the body may be merged, for example, the paragraphs of the body are divided into a first paragraph (first paragraph), a middle paragraph, and a last paragraph (last paragraph).
In the information processing method provided by this embodiment, a keyword set corresponding to a title of information to be processed is obtained; then the association degree between the title and the text is determined based on the relative position relation of different keywords in the keyword set in the text of the information to be processed; when the association degree corresponding to the relative position relation is greater than a preset threshold value, the association degree between the title and the text is high. The association degree between the title and the text is thus determined according to the position in the text of each keyword of the title, and considering the relative position relation in the text of the keywords of the title improves the accuracy of the association degree between the title and the text.
A second embodiment of the information processing method of the present invention is proposed based on the first embodiment, and in this embodiment, step S102 includes:
step S201, determining paragraph information corresponding to the text based on the line break in the text, wherein the paragraph information comprises a head section, a middle section and a tail section;
step S202, determining a first keyword matched with one paragraph in the paragraph information, a second keyword matched with two paragraphs in the paragraph information and a third keyword matched with three paragraphs in the paragraph information in each keyword in a keyword set;
step S203, determining the association degree between the title and the text based on the first keyword, the second keyword and the third keyword.
In this embodiment, a text is segmented based on a line break in the text, and paragraph information corresponding to the text is determined, where the paragraph information includes a first segment, a middle segment, and a last segment; specifically, when multiple paragraphs of the body are acquired, a leading paragraph (a first paragraph), a trailing paragraph (a last paragraph), and intermediate paragraphs (all paragraphs between the leading paragraph and the trailing paragraph) among the multiple paragraphs are taken as the paragraph information.
Then, among the keywords of the keyword set, a first keyword matching one paragraph of the paragraph information, a second keyword matching two paragraphs of the paragraph information, and a third keyword matching three paragraphs of the paragraph information are determined. Specifically, the paragraph information includes a first segment FP, a middle segment MP and a last segment EP. For each keyword Wti in the keyword set Twords: if Wti belongs to only one of FP, MP and EP, the keyword Wti is determined to be a first keyword; if Wti belongs to FP and MP, or to MP and EP, or to FP and EP, but not to all three, the keyword Wti is determined to be a second keyword; and if Wti belongs to FP, MP and EP, the keyword Wti is determined to be a third keyword.
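Steps S201 and S202 can be sketched as below. This is a hedged illustration rather than the patented implementation: the helper names `split_segments` and `classify_keywords`, substring matching, and the handling of texts with fewer than three paragraphs are all assumptions.

```python
def split_segments(body: str) -> tuple[str, str, str]:
    """Split the text on line breaks into first segment FP, middle segment MP, last segment EP."""
    paragraphs = [p for p in body.split("\n") if p.strip()]
    if not paragraphs:
        return "", "", ""
    first = paragraphs[0]
    # With a single paragraph there is no separate last segment (an assumption).
    last = paragraphs[-1] if len(paragraphs) > 1 else ""
    middle = "\n".join(paragraphs[1:-1])
    return first, middle, last


def classify_keywords(body: str, keywords: list[str]) -> dict[int, list[str]]:
    """Group keywords by how many of FP, MP and EP they occur in (1, 2 or 3)."""
    segments = split_segments(body)
    groups: dict[int, list[str]] = {1: [], 2: [], 3: []}
    for kw in keywords:
        hits = sum(1 for segment in segments if kw in segment)
        if hits:  # keywords absent from the text belong to no group
            groups[hits].append(kw)
    return groups


body = (
    "fire broke out at the factory\n"
    "workers left the factory\n"
    "the factory reopened after the fire"
)
print(classify_keywords(body, ["fire", "factory", "workers", "police"]))
# {1: ['workers'], 2: ['fire'], 3: ['factory']}
```

Here `groups[1]`, `groups[2]` and `groups[3]` correspond to the first, second and third keywords respectively.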
In this embodiment, after the first keyword, the second keyword, and the third keyword are obtained, the association degree between the title and the text is determined based on the first keyword, the second keyword, and the third keyword, and specifically, in an embodiment, the step S203 includes:
step S2031, acquiring a first weight corresponding to the first keyword, a second weight corresponding to the second keyword and a third weight corresponding to the third keyword;
step S2032, determining a degree of association between the title and the text based on the first weight, the second weight, the third weight, the number of the first keywords, the number of the second keywords, and the number of the third keywords.
In this embodiment, weights are set in advance, with respect to the first segment, the middle segment and the last segment of the text, for a keyword that belongs to only one segment, a keyword that belongs to two segments, and a keyword that belongs to all three segments. Once the first keyword, the second keyword and the third keyword are obtained, the first weight corresponding to the first keyword, the second weight corresponding to the second keyword, and the third weight corresponding to the third keyword are obtained.
Then, the association degree between the title and the text is determined based on the first weight, the second weight, the third weight, the number of first keywords, the number of second keywords, and the number of third keywords. Specifically, the association degree = the number of first keywords × the first weight + the number of second keywords × the second weight + the number of third keywords × the third weight. For example, if the first weight, the second weight and the third weight are 1, 2 and 3 respectively, the number of first keywords is 2, the number of second keywords is 3, and the number of third keywords is 1, then the association degree is 1 × 2 + 2 × 3 + 3 × 1 = 11. With the weights of the first keyword, the second keyword and the third keyword, the association degree can be obtained accurately.
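The worked example above can be checked with a short sketch. It reads the association degree as a weighted sum (each weight multiplied by the matching keyword count), which is the reading that reproduces the stated value of 11; the function name `segment_association` and the default weights are illustrative assumptions.

```python
def segment_association(
    n_first: int,
    n_second: int,
    n_third: int,
    weights: tuple[float, float, float] = (1, 2, 3),
) -> float:
    """Weighted sum of the counts of first, second and third keywords."""
    w_first, w_second, w_third = weights
    return w_first * n_first + w_second * n_second + w_third * n_third


# Weights 1, 2, 3 with 2 first keywords, 3 second keywords and 1 third keyword.
print(segment_association(2, 3, 1))  # 11
```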
In the information processing method provided in this embodiment, paragraph information corresponding to the text is determined based on the line break in the text, where the paragraph information includes a first segment, a middle segment, and a last segment; then determining a first keyword matched with one paragraph in the paragraph information, a second keyword matched with two paragraphs in the paragraph information and a third keyword matched with three paragraphs in the paragraph information in each keyword in the keyword set; and then determining the association degree between the title and the text based on the first keyword, the second keyword and the third keyword. The relevancy can be accurately obtained according to the position of each keyword in each paragraph in the text, and the accuracy of the relevancy between the title and the text is further improved.
A third embodiment of the information processing method of the present invention is proposed based on the second embodiment, and in this embodiment, step S203 includes:
step S301, determining a first correlation degree between the title and the text based on the first keyword, the second keyword and the third keyword;
step S302, determining a second degree of correlation between the title and the text based on the sentence information corresponding to the text and the keyword set, and determining a third degree of correlation between the title and the text based on the text keyword corresponding to the text and the keyword set;
step S303, determining a degree of association between the title and the text based on the first degree of association, the second degree of association, and the third degree of association.
In this embodiment, first, a first degree of correlation between the title and the text is determined based on the first keyword, the second keyword, and the third keyword.
And then, determining a second degree of correlation between the title and the text based on the sentence information corresponding to the text and the keyword set, specifically, firstly carrying out sentence segmentation on the text of the text to obtain the sentence information, and determining the second degree of correlation according to the position of each sentence in the sentence information where the keyword sequence in the keyword set appears.
Meanwhile, a third degree of correlation between the title and the text is determined based on the text keywords corresponding to the text and the keyword set; specifically, the third degree of correlation is determined according to the number of keywords of the keyword set that are included in the text keywords.
And finally, determining the association degree between the title and the text based on the first association degree, the second association degree and the third association degree, specifically, adding the first association degree, the second association degree and the third association degree to obtain a correlation sum, and calculating the association degree according to the correlation sum, so that the association degree between the title and the text is calculated according to the position of each keyword in the title in the text, and the accuracy of association calculation is improved.
It should be noted that after the association degree between the title and the text of the information to be processed is obtained, if the association degree is greater than a preset threshold, the information to be processed is determined to be normal information (not headline party news), which improves the accuracy of selecting reference news. The preset threshold may be set as appropriate; for example, the preset threshold is 0.8.
In the information processing method provided by this embodiment, a first degree of correlation between the title and the text is determined based on the first keyword, the second keyword, and the third keyword; then, determining a second degree of correlation between the title and the text based on the sentence information corresponding to the text and the keyword set, and determining a third degree of correlation between the title and the text based on the text keyword corresponding to the text and the keyword set; and then determining the association degree between the title and the text based on the first association degree, the second association degree and the third association degree, so that the association degree between the title and the text is determined according to the relative position relation of each keyword in the title in the text and the position of each keyword in the sentence of the text, and the accuracy of the association degree between the title and the text is further improved.
A fourth embodiment of the information processing method of the present invention is proposed based on the third embodiment, and in this embodiment, step S302 includes:
step S401, in each sentence of the sentence information, determining a target sentence matched with the keywords of the keyword set;
step S402, determining a first target sentence matched with a plurality of keywords in a keyword set in a target sentence, and determining the first sentence number of sentences with adjacent keywords in the same sentence in the first target sentence;
step S403, determining adjacent second target sentences in the target sentences and determining the number of second sentences with adjacent keywords in the adjacent second target sentences based on the sequence of each sentence in the sentence information;
step S404, determining the second degree of correlation based on the first sentence number and the second sentence number.
In this embodiment, the text is split into sentences based on preset punctuation marks to obtain the sentence information, where the preset punctuation marks include '.', '?', '!' and other punctuation marks that indicate the end of a sentence. In each sentence of the sentence information, a target sentence matching the keywords of the keyword set is determined, where a target sentence contains one or more keywords of the keyword set.
In this embodiment, once the target sentences are obtained, the first target sentences, namely the target sentences that match a plurality of keywords of the keyword set, are determined; that is, each first target sentence includes a plurality of different keywords of the keyword set. Then the first sentence number is determined, which is the number of first target sentences in which adjacent keywords occur within the same sentence, that is, sentences that include at least two keywords that are adjacent in the keyword set. For example, for any sentence Si, if Si includes both Wti and Wt(i+1) of the keyword set, the first sentence number is increased by 1.
Next, based on the order of the sentences in the sentence information, adjacent second target sentences are determined among the target sentences, where the second target sentences include one or more groups of adjacent sentences, and the second sentence number, which is the number of groups of adjacent second target sentences across which adjacent keywords occur, is determined. For example, for any group of adjacent second target sentences Si and S(i+1), if Wti ∈ Si and Wt(i+1) ∈ S(i+1), the second sentence number is increased by 1.
Then, the second degree of correlation is determined based on the first sentence number and the second sentence number. Specifically, the second degree of correlation is calculated from the first sentence number, the second sentence number, and their corresponding weights. For example, if the weight corresponding to the first sentence number and the weight corresponding to the second sentence number are both 1, the first sentence number is 5, and the second sentence number is 8, then the second degree of correlation is 1 × 5 + 1 × 8 = 13. The second degree of correlation is thus determined from the relative positions in the text of the plurality of keywords of the title, which further improves the accuracy of the degree of correlation between the news title and the news text.
In the information processing method provided by this embodiment, a target sentence matching the keywords of the keyword set is determined in each sentence of the sentence information; then a first target sentence matching a plurality of keywords of the keyword set is determined among the target sentences, and the first sentence number, namely the number of first target sentences in which adjacent keywords occur within the same sentence, is determined; then, based on the order of the sentences in the sentence information, adjacent second target sentences are determined among the target sentences, and the second sentence number, namely the number of groups of adjacent second target sentences across which adjacent keywords occur, is determined; and then the second degree of correlation is determined based on the first sentence number and the second sentence number. The second degree of correlation can thus be obtained accurately from the relative position relation of the keywords in the text, the degree of correlation between the title and the text is determined from the relative position relation in the text of the keywords of the title, and the accuracy of the degree of correlation between the news title and the news text is further improved.
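Steps S401 to S404 can be sketched as follows, under several assumptions: keywords are matched as lowercase substrings, "adjacent keywords" means keywords that are consecutive in the keyword set, and the keyword index is independent of the sentence index. The function names and default weights are illustrative only.

```python
import re


def split_sentences(body: str) -> list[str]:
    """Break the text at sentence-ending punctuation (Western and Chinese forms)."""
    return [s.strip() for s in re.split(r"[.?!。？！]", body.lower()) if s.strip()]


def second_correlation(
    body: str, keywords: list[str], w_same: float = 1.0, w_adjacent: float = 1.0
) -> float:
    sentences = split_sentences(body)
    # Keywords found in each sentence; a non-empty set marks a target sentence.
    hits = [{kw for kw in keywords if kw in s} for s in sentences]
    # Pairs of keywords that are adjacent in the keyword set: (Wt_i, Wt_(i+1)).
    pairs = list(zip(keywords, keywords[1:]))

    # First sentence number: sentences that contain both keywords of some adjacent pair.
    first_count = sum(1 for h in hits if any(a in h and b in h for a, b in pairs))
    # Second sentence number: adjacent target sentences where the earlier sentence
    # contains Wt_i and the later sentence contains Wt_(i+1).
    second_count = sum(
        1
        for cur, nxt in zip(hits, hits[1:])
        if cur and nxt and any(a in cur and b in nxt for a, b in pairs)
    )
    return w_same * first_count + w_adjacent * second_count


text = "A fire hit the factory. Workers escaped. The fire spread! Factory owners spoke?"
print(second_correlation(text, ["fire", "factory", "workers"]))  # 3.0
```

In the usage example, one sentence contains the adjacent pair (fire, factory), and two groups of neighboring sentences carry an adjacent pair across the sentence boundary, giving 1 × 1 + 1 × 2 = 3.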
A fifth embodiment of the information processing method of the present invention is proposed based on the third embodiment, and in this embodiment, step S302 includes:
step S501, obtaining synonyms corresponding to each keyword of the keyword set, and determining a set of words to be matched, wherein the set of words to be matched comprises the synonyms and each keyword of the keyword set;
step S502, determining the number of matched keywords, namely the number of target words to be matched, wherein the target words to be matched are the words in the set of words to be matched that belong to the text keywords;
step S503, determining the third degree of correlation based on the number of the matched keywords.
In this embodiment, since the text may contain synonyms of the keywords, the keyword set needs to be expanded with synonyms. That is, the synonyms corresponding to each keyword of the keyword set are obtained, and the set of words to be matched, which comprises the synonyms and each keyword of the keyword set, is determined. For example, for the keyword "occurrence", the near-synonym set obtained from an existing synonym thesaurus is [find, occur, burst, generate]; "occurrence" and [find, occur, burst, generate] are then added to the set of words to be matched.
Then, the target words to be matched, that is, the words in the set of words to be matched that belong to the text keywords, are determined, and the number of matched keywords is the number of these target words to be matched.
Next, the third degree of correlation is determined based on the number of matched keywords. Specifically, a correlation weight is set for the matched keywords; for example, if the correlation weight of the matched keywords is 1 and the number of matched keywords is 3, the third degree of correlation is 3 × 1 = 3.
In the information processing method provided by this embodiment, a synonym corresponding to each keyword of the keyword set is obtained, and a to-be-matched word set including the synonym and each keyword of the keyword set is determined; then determining the number of matched keywords of target words to be matched, wherein each word to be matched in the word set to be matched belongs to the text keywords; and then determining the third degree of correlation based on the number of the matched keywords, and determining the third degree of correlation by matching the keyword set with the text keywords, thereby further improving the accuracy of the degree of correlation between the title and the text.
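Steps S501 to S503 can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the thesaurus is modelled as a plain dictionary, the "number of matched keywords" is read as the number of words in the set of words to be matched that also appear among the text keywords, and the names `build_match_set` and `third_correlation` are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def build_match_set(keywords: Iterable[str],
                    thesaurus: Dict[str, List[str]]) -> Set[str]:
    """Return every title keyword together with its synonyms (step S501)."""
    match_set: Set[str] = set()
    for keyword in keywords:
        match_set.add(keyword)
        match_set.update(thesaurus.get(keyword, []))
    return match_set


def third_correlation(keywords: Iterable[str],
                      text_keywords: Iterable[str],
                      thesaurus: Dict[str, List[str]],
                      weight: float = 1.0) -> float:
    """Weight times the number of words to be matched found among the text keywords."""
    matched = build_match_set(keywords, thesaurus) & set(text_keywords)  # step S502
    return weight * len(matched)                                         # step S503


# The thesaurus entry is taken from the example in the text; the text keywords are made up.
thesaurus = {"occurrence": ["find", "occur", "burst", "generate"]}
text_keywords = ["occurrence", "occur", "burst", "traffic"]
print(third_correlation(["occurrence"], text_keywords, thesaurus))  # 3.0
```

Using a set for the words to be matched means a synonym shared by two title keywords is counted once; whether the original method counts it once or twice is not stated.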
On the basis of the above-described respective embodiments, a sixth embodiment of the information processing method of the present invention is proposed, in which step S403 includes:
step S601, obtaining the correlation sum of the first correlation, the second correlation and the third correlation;
step S602, determining a correlation threshold value based on the number of keywords in the keyword set;
step S603, determining the degree of association based on the correlation sum and a correlation threshold.
In this embodiment, after the first correlation degree, the second correlation degree, and the third correlation degree are obtained, a correlation sum of the first correlation degree, the second correlation degree, and the third correlation degree is obtained, that is, the first correlation degree, the second correlation degree, and the third correlation degree are added to obtain a correlation sum.
Then, a correlation threshold is determined based on the number of keywords in the keyword set, the first correlation, the second correlation, and the third correlation. Specifically, a first correlation threshold is determined based on the number N of keywords and the first correlation, a second correlation threshold is determined based on the number N of keywords and the second correlation, and a third correlation threshold is determined based on the number N of keywords and the third correlation. For example, the first correlation threshold is not greater than N × (the sum of the correlation weights of the first keyword, the second keyword, and the third keyword) / 3, the second correlation threshold is not greater than 1 × N, and the correlation threshold = the first correlation threshold + the second correlation threshold + the third correlation threshold.
Finally, the degree of association is determined based on the correlation sum and the correlation threshold, the degree of association being the correlation sum divided by the correlation threshold (correlation sum / correlation threshold).
In the information processing method provided by this embodiment, the correlation sum of the first correlation, the second correlation, and the third correlation is obtained; then, a correlation threshold is determined based on the number of keywords in the keyword set; then, the degree of association is determined based on the correlation sum and the correlation threshold. The degree of association can thus be accurately obtained from the first correlation, the second correlation, and the third correlation, which further improves the accuracy of the degree of association between the news title and the news text.
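Steps S601 to S603 can be sketched as follows. This is a rough illustration in Python, not the definitive calculation: it assumes the degree of association is the ratio of the correlation sum to the correlation threshold, it takes the first and second thresholds at the upper bounds given in the example, and, because the text does not specify the third threshold, it leaves that value to the caller. The function names, the paragraph weights (1, 2, 3), and all numeric inputs are hypothetical.

```python
from typing import Sequence


def correlation_threshold(n_keywords: int,
                          paragraph_weights: Sequence[float],
                          third_threshold: float) -> float:
    """Sum of the three per-component thresholds (step S602).

    The first two components use the upper bounds from the example in the text;
    third_threshold is an assumption supplied by the caller.
    """
    first_threshold = n_keywords * sum(paragraph_weights) / 3
    second_threshold = 1 * n_keywords
    return first_threshold + second_threshold + third_threshold


def association_degree(first: float, second: float, third: float,
                       threshold: float) -> float:
    """Correlation sum (step S601) divided by the correlation threshold (step S603)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return (first + second + third) / threshold


# Hypothetical inputs: 4 title keywords, paragraph weights 1, 2 and 3.
threshold = correlation_threshold(4, (1, 2, 3), third_threshold=4)
print(threshold)                                    # 16.0
print(association_degree(6, 13, 3, threshold))      # 1.375
```

Dividing by a threshold that grows with the number of keywords normalises the score, so titles with many keywords are not favoured merely for having more words to match.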
The present invention also provides an information processing apparatus, referring to fig. 3, comprising:
an obtaining module 10, configured to obtain a keyword set corresponding to a title of information to be processed;
a determining module 20, configured to determine the degree of association between the title and the text based on the relative position relationship, in the text of the information to be processed, of different keywords in the keyword set; and when the association degree corresponding to the relative position relation is greater than a preset threshold value, the association degree between the title and the text is high.
The methods executed by the program units can refer to various embodiments of the information processing method of the present invention, and are not described herein again.
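The two-module structure of the apparatus can be sketched as follows. This is only a structural illustration in Python: the class name `InformationProcessingApparatus` and its field names are hypothetical, and the two toy callables in the usage example are placeholders that do not implement the relative-position scoring of the method embodiments.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InformationProcessingApparatus:
    # Stands in for obtaining module 10: title -> keyword set.
    obtain_keywords: Callable[[str], List[str]]
    # Stands in for determining module 20: (keywords, text) -> degree of association.
    determine_association: Callable[[List[str], str], float]
    preset_threshold: float

    def is_highly_associated(self, title: str, text: str) -> bool:
        """True when the degree of association exceeds the preset threshold."""
        keywords = self.obtain_keywords(title)
        degree = self.determine_association(keywords, text)
        return degree > self.preset_threshold


# Toy placeholders: whitespace keyword splitting and a simple presence count.
apparatus = InformationProcessingApparatus(
    obtain_keywords=lambda title: title.lower().split(),
    determine_association=lambda kws, text: float(sum(k in text.lower() for k in kws)),
    preset_threshold=1.0,
)
print(apparatus.is_highly_associated("Road accident", "An accident occurred on the road."))  # True
```

Passing the modules in as callables keeps the threshold check separate from the scoring, so any of the scoring embodiments could be substituted for the placeholder.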
The invention also provides a computer readable storage medium.
The computer-readable storage medium of the present invention stores thereon an information processing program that realizes the steps of the information processing method described above when executed by a processor.
The method implemented when the information processing program running on the processor is executed may refer to each embodiment of the information processing method of the present invention, and details are not described here.
Furthermore, an embodiment of the present invention further provides a computer program product, which includes an information processing program, and when the information processing program is executed by a processor, the information processing program implements the steps of the information processing method described above.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An information processing method, characterized in that the method comprises:
acquiring a keyword set corresponding to a title of information to be processed;
determining the association degree between the title and the text based on the relative position relation of different keywords in the keyword set in the text of the information to be processed; and when the association degree corresponding to the relative position relation is greater than a preset threshold value, the association degree between the title and the text is high.
2. The information processing method according to claim 1, wherein the step of determining the association degree between the title and the body of the information to be processed based on the relative position relationship of different keywords in the keyword set in the body of the information to be processed comprises:
determining paragraph information corresponding to the text based on the line break in the text, wherein the paragraph information comprises a first segment, a middle segment and a tail segment;
determining a first keyword matched with one paragraph in the paragraph information, a second keyword matched with two paragraphs in the paragraph information and a third keyword matched with three paragraphs in the paragraph information in each keyword in a keyword set;
and determining the association degree between the title and the text based on the first keyword, the second keyword and the third keyword.
3. The information processing method according to claim 2, wherein the step of determining the degree of association between the title and the body text based on the first keyword, the second keyword, and the third keyword comprises:
acquiring a first weight corresponding to a first keyword, a second weight corresponding to a second keyword and a third weight corresponding to a third keyword;
and determining the association degree between the title and the text based on the first weight, the second weight, the third weight, the number of the first keywords, the number of the second keywords and the number of the third keywords.
4. The information processing method according to claim 2, wherein the step of determining the degree of association between the title and the body text based on the first keyword, the second keyword, and the third keyword comprises:
determining a first degree of correlation between the title and the text based on the first keyword, the second keyword and the third keyword;
determining a second degree of correlation between the title and the text based on the sentence information corresponding to the text and the keyword set, and determining a third degree of correlation between the title and the text based on the text keyword corresponding to the text and the keyword set;
and determining the association degree between the title and the text based on the first association degree, the second association degree and the third association degree.
5. The information processing method according to claim 4, wherein the step of determining the second degree of correlation between the title and the body based on the sentence information corresponding to the body and the keyword set comprises:
determining a target sentence matched with the keywords of the keyword set in each sentence of the sentence information;
determining a first target sentence matched with a plurality of keywords of a keyword set in a target sentence, and determining the first sentence number of sentences of which the same sentence has adjacent keywords in the first target sentence;
determining adjacent second target sentences in the target sentences and determining the number of second sentences with adjacent keywords in the adjacent second target sentences based on the sequence of each sentence in the sentence information;
determining the second degree of correlation based on the first sentence quantity and the second sentence quantity.
6. The information processing method according to claim 4, wherein the step of determining a third degree of correlation between the title and the body based on the body keyword corresponding to the body and the keyword set comprises:
obtaining synonyms corresponding to all the keywords in the keyword set, and determining a set of words to be matched, which comprises the synonyms and all the keywords in the keyword set;
determining the number of matched keywords of target words to be matched, wherein each word to be matched of the word set to be matched belongs to the text keywords;
and determining the third correlation degree based on the number of the matched keywords.
7. The information processing method according to any one of claims 4 to 6, wherein the step of determining the degree of association between the title and the body text based on the first degree of association, the second degree of association, and the third degree of association includes:
acquiring the correlation sum of the first correlation, the second correlation and the third correlation;
determining a relevancy threshold based on the number of keywords in the keyword set;
determining the degree of association based on the sum of degrees of correlation and a threshold of degrees of correlation.
8. An information processing apparatus characterized by comprising:
the acquisition module is used for acquiring a keyword set corresponding to the title of the information to be processed;
the determining module is used for determining the association degree between the title and the text based on the relative position relation of different keywords in the keyword set in the text of the information to be processed; and when the association degree corresponding to the relative position relation is greater than a preset threshold value, the association degree between the title and the text is high.
9. An information processing apparatus characterized by comprising: memory, processor and information processing program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the information processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that an information processing program is stored thereon, which when executed by a processor implements the steps of the information processing method according to any one of claims 1 to 7.
CN202111487593.5A 2021-12-07 2021-12-07 Information processing method, device, equipment and computer readable storage medium Pending CN114282092A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111487593.5A CN114282092A (en) 2021-12-07 2021-12-07 Information processing method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN114282092A true CN114282092A (en) 2022-04-05

Family

ID=80871194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111487593.5A Pending CN114282092A (en) 2021-12-07 2021-12-07 Information processing method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN114282092A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115409035A (en) * 2022-06-02 2022-11-29 北京金堤科技有限公司 Conversation information acquisition method and device, storage medium and electronic equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011221894A (en) * 2010-04-13 2011-11-04 Hitachi Ltd Secure document detection method, secure document detection program, and optical character reader
CN103617213A (en) * 2013-11-19 2014-03-05 北京奇虎科技有限公司 Method and system for identifying newspage attributive characters
CN106033445A (en) * 2015-03-16 2016-10-19 北京国双科技有限公司 Method and device for obtaining article association degree data
CN106202150A (en) * 2016-06-22 2016-12-07 北京小米移动软件有限公司 Method for information display and device
CN107357781A (en) * 2017-06-29 2017-11-17 胡玥莹 For differentiating the system and method for web page title and the text degree of association
CN109614625A (en) * 2018-12-17 2019-04-12 北京百度网讯科技有限公司 Determination method, apparatus, equipment and the storage medium of the title text degree of correlation
WO2021051599A1 (en) * 2019-09-19 2021-03-25 平安科技(深圳)有限公司 Method and apparatus for extracting locally optimized keywords, device and storage medium
US20210174024A1 (en) * 2018-12-07 2021-06-10 Tencent Technology (Shenzhen) Company Limited Method for training keyword extraction model, keyword extraction method, and computer device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination