CN114282092A - Information processing method, device, equipment and computer readable storage medium - Google Patents
- Publication number
- CN114282092A, CN202111487593.5A
- Authority
- CN
- China
- Prior art keywords
- keyword
- determining
- degree
- text
- title
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an information processing method, an information processing device, information processing equipment and a computer readable storage medium, wherein the method comprises the following steps: acquiring a keyword set corresponding to a title of information to be processed; determining the association degree between the title and the text based on the relative position relation of different keywords in the keyword set in the text of the information to be processed; and when the association degree corresponding to the relative position relation is greater than a preset threshold value, the association degree between the title and the text is high. The method and the device realize the determination of the association degree between the title and the text according to the position of each keyword in the title in the text, and improve the accuracy of the association degree between the title and the text by considering the relative position relation of each keyword in the title information in the text.
Description
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to an information processing method, an information processing apparatus, an information processing device, and a computer-readable storage medium.
Background
With the popularization of computers and the rapid development of networks, the amount of news on the internet is accumulating rapidly. This explosively growing body of news, however, often contains a large amount of "headline party" (clickbait) news, that is, news whose headline is irrelevant to the news body text.
Currently, headline party news is generally identified by extracting keywords from the news body text and the headline, and then judging whether the headline is related to the body text based on whether the keywords match. However, this relevance identification does not consider where the headline keywords appear in the news body, so the identification result is inaccurate.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide an information processing method, an information processing device, information processing equipment and a computer readable storage medium, and aims to solve the technical problem that the correlation identification of the existing news headlines and the news texts is inaccurate.
In order to achieve the above object, the present invention provides an information processing method including:
acquiring a keyword set corresponding to a title of information to be processed;
determining the association degree between the title and the text based on the relative position relation of different keywords in the keyword set in the text of the information to be processed; and when the association degree corresponding to the relative position relation is greater than a preset threshold value, the association degree between the title and the text is high.
Further, the step of determining the association degree between the title and the text based on the relative position relationship of different keywords in the keyword set in the text of the information to be processed includes:
determining paragraph information corresponding to the text based on the line break in the text, wherein the paragraph information comprises a first segment, a middle segment and a tail segment;
determining a first keyword matched with one paragraph in the paragraph information, a second keyword matched with two paragraphs in the paragraph information and a third keyword matched with three paragraphs in the paragraph information in each keyword in a keyword set;
and determining the association degree between the title and the text based on the first keyword, the second keyword and the third keyword.
Further, the step of determining the association degree between the title and the text based on the first keyword, the second keyword and the third keyword comprises:
acquiring a first weight corresponding to a first keyword, a second weight corresponding to a second keyword and a third weight corresponding to a third keyword;
and determining the association degree between the title and the text based on the first weight, the second weight, the third weight, the number of the first keywords, the number of the second keywords and the number of the third keywords.
Further, the step of determining the association degree between the title and the text based on the first keyword, the second keyword and the third keyword comprises:
determining a first degree of correlation between the title and the text based on the first keyword, the second keyword and the third keyword;
determining a second degree of correlation between the title and the text based on the sentence information corresponding to the text and the keyword set, and determining a third degree of correlation between the title and the text based on the text keyword corresponding to the text and the keyword set;
and determining the association degree between the title and the text based on the first association degree, the second association degree and the third association degree.
Further, the step of determining a second degree of correlation between the title and the body based on the sentence information corresponding to the body and the keyword set includes:
determining a target sentence matched with the keywords of the keyword set in each sentence of the sentence information;
determining a first target sentence matched with a plurality of keywords of a keyword set in a target sentence, and determining the first sentence number of sentences of which the same sentence has adjacent keywords in the first target sentence;
determining adjacent second target sentences in the target sentences and determining the number of second sentences with adjacent keywords in the adjacent second target sentences based on the sequence of each sentence in the sentence information;
determining the second degree of correlation based on the first sentence quantity and the second sentence quantity.
Further, the step of determining a third correlation between the title and the text based on the text keywords corresponding to the text and the keyword set includes:
obtaining synonyms corresponding to all the keywords in the keyword set, and determining a set of words to be matched, which comprises the synonyms and all the keywords in the keyword set;
determining the number of matched keywords of target words to be matched, wherein each word to be matched of the word set to be matched belongs to the text keywords;
and determining the third correlation degree based on the number of the matched keywords.
Further, the step of determining the association degree between the title and the body text based on the first, second and third association degrees comprises:
acquiring the correlation sum of the first correlation, the second correlation and the third correlation;
determining a relevancy threshold based on the number of keywords in the keyword set;
determining the degree of association based on the sum of degrees of correlation and a threshold of degrees of correlation.
Further, to achieve the above object, the present invention also provides an information processing apparatus comprising:
the acquisition module is used for acquiring a keyword set corresponding to the title of the information to be processed;
the determining module is used for determining the association degree between the title and the text based on the relative position relation of different keywords in the keyword set in the text of the information to be processed; and when the association degree corresponding to the relative position relation is greater than a preset threshold value, the association degree between the title and the text is high.
Further, to achieve the above object, the present invention also provides an information processing apparatus comprising: a memory, a processor and an information processing program which is stored on the memory and can run on the processor, wherein the information processing program realizes the steps of the aforementioned information processing method when executed by the processor.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon an information processing program which, when executed by a processor, realizes the steps of the aforementioned information processing method.
The method comprises the steps of acquiring a keyword set corresponding to a title of information to be processed; then determining the association degree between the title and the text based on the relative position relation of different keywords in the keyword set in the text of the information to be processed; when the correlation degree corresponding to the relative position relation is larger than a preset threshold value, the correlation degree between the title and the text is high, the correlation degree between the title and the text is determined according to the position of each keyword in the title in the text, and the accuracy of the correlation degree between the title and the text is improved by considering the relative position relation of each keyword in the title information in the text.
Drawings
FIG. 1 is a schematic diagram of an information processing apparatus in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of an information processing method according to the present invention;
FIG. 3 is a functional block diagram of an information processing apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of an information processing apparatus in a hardware operating environment according to an embodiment of the present invention.
The information processing device of the embodiment of the invention can be a PC. As shown in fig. 1, the information processing apparatus may include: a processor 1001 such as a CPU, a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Alternatively, the information processing apparatus may further include a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like. Of course, the information processing device may also be configured with other sensors such as barometer, hygrometer, thermometer, infrared sensor, etc., which are not described herein again.
Those skilled in the art will appreciate that the terminal architecture shown in fig. 1 does not constitute a limitation of the information processing apparatus, and may include more or fewer components than those shown, or some of the components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and an information processing program.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be used to call up an information processing program stored in the memory 1005.
In the present embodiment, the information processing apparatus includes: a memory 1005, a processor 1001 and an information processing program which is stored on the memory 1005 and can run on the processor 1001, wherein when the processor 1001 calls the information processing program stored in the memory 1005, the steps of the information processing method in each embodiment are executed.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of the information processing method according to the present invention.
In this embodiment, the information processing method includes:
step S101, acquiring a keyword set corresponding to a title of information to be processed;
In this embodiment, a title of the information to be processed is obtained first, and then a keyword set corresponding to the title is obtained. Specifically, the title is subjected to word segmentation to obtain a word segmentation result, and stop words are removed from the word segmentation result to obtain the keyword set $T_{words} = [wt_1, wt_2, wt_3, wt_4, wt_5, \dots]$. The information to be processed is information such as news articles and papers.
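The segmentation and stop-word removal of step S101 can be sketched as follows. This is only an illustrative sketch: the source names neither a word segmenter nor a stop-word list, so the regular-expression tokenizer, the `STOP_WORDS` set and the function name `title_keywords` are all assumptions, and Chinese-language titles would need a real word segmenter.

```python
import re
from typing import List

# Hypothetical stop-word list for illustration only; the source does not specify one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "and", "is", "are"}


def title_keywords(title: str) -> List[str]:
    """Segment a title into words and drop stop words, keeping first-seen order."""
    # Assumption: words are runs of letters/digits; real segmentation is language-specific.
    tokens = re.findall(r"\w+", title.lower())
    keywords: List[str] = []
    for token in tokens:
        if token not in STOP_WORDS and token not in keywords:
            keywords.append(token)
    return keywords


t_words = title_keywords("Fire breaks out at the chemical plant in the city")
print(t_words)
```

Keeping the keywords in title order matters for the later embodiments, which rely on which keywords are adjacent in the keyword set.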
Step S102, determining the association degree between the title and the text based on the relative position relation of different keywords in the keyword set in the text of the information to be processed; and when the association degree corresponding to the relative position relation is greater than a preset threshold value, the association degree between the title and the text is high.
In this embodiment, when the keyword set is obtained, the relative position relationship of different keywords in the keyword set in the text of the information to be processed is obtained. Specifically, each paragraph of the text is determined, the paragraphs to which the different keywords in the keyword set belong are determined, and the number of different keywords of the keyword set contained in each paragraph is obtained. Then, a weight corresponding to each paragraph is set according to the number of paragraphs of the body, and the association degree between the title and the body is determined according to the weight corresponding to each paragraph and the number of keywords corresponding to each paragraph. For example, the association degree between the title and the body is $D_1 W_1 + D_2 W_2 + \dots + D_k W_k$, where $D_n$ is the number of keywords in paragraph $n$, $W_n$ is the weight of paragraph $n$, $n$ ranges from 1 to $k$, and $k$ is the number of paragraphs of the body.
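The per-paragraph weighted sum can be illustrated with a short sketch. The function name `paragraph_association`, the substring test used to decide whether a keyword belongs to a paragraph, and the sample weights are assumptions made for illustration; the source does not say how the weights are chosen.

```python
from typing import Sequence


def paragraph_association(keywords: Sequence[str], paragraphs: Sequence[str],
                          weights: Sequence[float]) -> float:
    """Return the sum over paragraphs of D_n * W_n.

    D_n is the number of distinct title keywords found in paragraph n,
    W_n is the weight assigned to paragraph n.
    """
    if len(paragraphs) != len(weights):
        raise ValueError("one weight per paragraph is required")
    total = 0.0
    for paragraph, weight in zip(paragraphs, weights):
        # Assumption: a keyword belongs to a paragraph if it occurs as a substring.
        d_n = sum(1 for kw in set(keywords) if kw in paragraph)
        total += d_n * weight
    return total


paragraphs = [
    "A fire broke out at a plant.",
    "Crews reached the plant quickly.",
    "The city reopened roads.",
]
print(paragraph_association(["fire", "plant", "city"], paragraphs, [3, 1, 2]))
```

Here the three paragraphs contain 2, 1 and 1 title keywords respectively, so the sum is 2 × 3 + 1 × 1 + 1 × 2.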
In other embodiments, the paragraphs of the body may be merged, for example, the paragraphs of the body are divided into a first paragraph (first paragraph), a middle paragraph, and a last paragraph (last paragraph).
In the information processing method provided by this embodiment, a keyword set corresponding to a title of information to be processed is obtained; then the association degree between the title and the text is determined based on the relative position relation of different keywords in the keyword set in the text of the information to be processed; when the association degree corresponding to the relative position relation is greater than a preset threshold value, the association degree between the title and the text is high. The association degree between the title and the text is thus determined according to the position of each title keyword in the text, and the accuracy of the association degree between the title and the text is improved by considering the relative position relation of the title keywords in the text.
A second embodiment of the information processing method of the present invention is proposed based on the first embodiment, and in this embodiment, step S102 includes:
step S201, determining paragraph information corresponding to the text based on the line break in the text, wherein the paragraph information comprises a head section, a middle section and a tail section;
step S202, determining a first keyword matched with one paragraph in the paragraph information, a second keyword matched with two paragraphs in the paragraph information and a third keyword matched with three paragraphs in the paragraph information in each keyword in a keyword set;
step S203, determining the association degree between the title and the text based on the first keyword, the second keyword and the third keyword.
In this embodiment, a text is segmented based on a line break in the text, and paragraph information corresponding to the text is determined, where the paragraph information includes a first segment, a middle segment, and a last segment; specifically, when multiple paragraphs of the body are acquired, a leading paragraph (a first paragraph), a trailing paragraph (a last paragraph), and intermediate paragraphs (all paragraphs between the leading paragraph and the trailing paragraph) among the multiple paragraphs are taken as the paragraph information.
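The paragraph split just described can be sketched as below. The function name `split_paragraphs`, the dictionary keys and the handling of a single-paragraph body are assumptions for illustration; the source only states that the first paragraph, the last paragraph and everything in between form the paragraph information.

```python
from typing import Dict


def split_paragraphs(body: str) -> Dict[str, str]:
    """Split a body on line breaks into first (FP), middle (MP) and last (EP) segments."""
    paragraphs = [p.strip() for p in body.split("\n") if p.strip()]
    if not paragraphs:
        return {"FP": "", "MP": "", "EP": ""}
    first = paragraphs[0]
    # Assumption: with a single paragraph there is no separate last segment.
    last = paragraphs[-1] if len(paragraphs) > 1 else ""
    # All paragraphs between the first and the last are merged into the middle segment.
    middle = "\n".join(paragraphs[1:-1])
    return {"FP": first, "MP": middle, "EP": last}


print(split_paragraphs("Lead.\nMiddle one.\n\nMiddle two.\nEnd."))
```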
Then, among the keywords in the keyword set, a first keyword matched with one paragraph in the paragraph information, a second keyword matched with two paragraphs in the paragraph information and a third keyword matched with three paragraphs in the paragraph information are determined. Specifically, the paragraph information includes a first segment FP, a middle segment MP and a last segment EP. For each keyword $wt_i$ in the keyword set $T_{words}$: if $wt_i$ belongs to only one segment, that is, $wt_i \in FP$, or $wt_i \in MP$, or $wt_i \in EP$, the keyword $wt_i$ is determined to be a first keyword; if $wt_i \in FP$ and $wt_i \in MP$, or $wt_i \in MP$ and $wt_i \in EP$, or $wt_i \in FP$ and $wt_i \in EP$, the keyword $wt_i$ is determined to be a second keyword; and if $wt_i \in FP$, $wt_i \in MP$ and $wt_i \in EP$, the keyword $wt_i$ is determined to be a third keyword.
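The classification into first, second and third keywords amounts to counting how many of the three segments contain each keyword. A minimal sketch follows; the function name `classify_keywords` and the substring membership test are assumptions, not part of the source.

```python
from typing import Dict, List, Sequence


def classify_keywords(keywords: Sequence[str], fp: str, mp: str, ep: str) -> Dict[int, List[str]]:
    """Group keywords by how many of the three segments (FP, MP, EP) contain them."""
    groups: Dict[int, List[str]] = {1: [], 2: [], 3: []}
    for kw in keywords:
        # Assumption: membership is tested by substring search.
        hits = sum(1 for segment in (fp, mp, ep) if kw in segment)
        if hits:  # keywords absent from the body fall into no group
            groups[hits].append(kw)
    return groups


groups = classify_keywords(
    ["fire", "plant", "city", "mayor"],
    fp="A fire broke out at the plant.",
    mp="Crews from the city reached the plant.",
    ep="The plant will reopen; the city thanked the crews.",
)
print(groups)
```

In this example "fire" appears in one segment, "city" in two and "plant" in all three, while "mayor" does not appear in the body and is not classified.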
In this embodiment, after the first keyword, the second keyword, and the third keyword are obtained, the association degree between the title and the text is determined based on the first keyword, the second keyword, and the third keyword, and specifically, in an embodiment, the step S203 includes:
step S2031, acquiring a first weight corresponding to the first keyword, a second weight corresponding to the second keyword and a third weight corresponding to the third keyword;
step S2032, determining a degree of association between the title and the text based on the first weight, the second weight, the third weight, the number of the first keywords, the number of the second keywords, and the number of the third keywords.
In this embodiment, based on the first, middle and last segments of the text, weights are set in advance for keywords belonging to only one segment, keywords belonging to two segments, and keywords belonging to all three segments. When the first keyword, the second keyword and the third keyword are obtained, a first weight corresponding to the first keyword, a second weight corresponding to the second keyword and a third weight corresponding to the third keyword are acquired.
Then, the degree of association between the title and the text is determined based on the first weight, the second weight, the third weight, the number of the first keywords, the number of the second keywords and the number of the third keywords. Specifically, the degree of association = the first weight × the number of the first keywords + the second weight × the number of the second keywords + the third weight × the number of the third keywords. For example, if the first weight, the second weight and the third weight are 1, 2 and 3 respectively, the number of the first keywords is 2, the number of the second keywords is 3 and the number of the third keywords is 1, then the degree of association is $1 \times 2 + 2 \times 3 + 3 \times 1 = 11$. With the weights of the first keyword, the second keyword and the third keyword, the association degree can be obtained accurately.
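The weighted count of steps S2031 and S2032 is a one-line computation. The sketch below uses the weights 1, 2 and 3 from the example as defaults; the function and parameter names are hypothetical.

```python
def positional_association(n_first: int, n_second: int, n_third: int,
                           w_first: float = 1.0, w_second: float = 2.0,
                           w_third: float = 3.0) -> float:
    """Weighted count of keywords found in one, two or three segments of the body."""
    return w_first * n_first + w_second * n_second + w_third * n_third


# Two first keywords, three second keywords and one third keyword.
print(positional_association(2, 3, 1))
```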
In the information processing method provided in this embodiment, paragraph information corresponding to the text is determined based on the line break in the text, where the paragraph information includes a first segment, a middle segment, and a last segment; then determining a first keyword matched with one paragraph in the paragraph information, a second keyword matched with two paragraphs in the paragraph information and a third keyword matched with three paragraphs in the paragraph information in each keyword in the keyword set; and then determining the association degree between the title and the text based on the first keyword, the second keyword and the third keyword. The relevancy can be accurately obtained according to the position of each keyword in each paragraph in the text, and the accuracy of the relevancy between the title and the text is further improved.
A third embodiment of the information processing method of the present invention is proposed based on the second embodiment, and in this embodiment, step S203 includes:
step S301, determining a first correlation degree between the title and the text based on the first keyword, the second keyword and the third keyword;
step S302, determining a second degree of correlation between the title and the text based on the sentence information corresponding to the text and the keyword set, and determining a third degree of correlation between the title and the text based on the text keyword corresponding to the text and the keyword set;
step S303, determining a degree of association between the title and the text based on the first degree of association, the second degree of association, and the third degree of association.
In this embodiment, first, a first degree of correlation between the title and the text is determined based on the first keyword, the second keyword, and the third keyword.
Then, a second degree of correlation between the title and the text is determined based on the sentence information corresponding to the text and the keyword set. Specifically, the body text is first split into sentences to obtain the sentence information, and the second degree of correlation is determined according to the positions of the sentences in which the keywords of the keyword set appear.
Meanwhile, a third degree of correlation between the title and the text is determined based on the text keywords corresponding to the text and the keyword set. Specifically, the third degree of correlation is determined according to the number of keywords of the keyword set that are included in the text keywords.
And finally, determining the association degree between the title and the text based on the first association degree, the second association degree and the third association degree, specifically, adding the first association degree, the second association degree and the third association degree to obtain a correlation sum, and calculating the association degree according to the correlation sum, so that the association degree between the title and the text is calculated according to the position of each keyword in the title in the text, and the accuracy of association calculation is improved.
It should be noted that, after the association degree between the title and the text of the information to be processed is obtained, if the association degree is greater than a preset threshold, the information to be processed is determined to be normal information (not headline party news), which improves the accuracy of selecting reference news. The preset threshold may be set as appropriate; for example, the preset threshold is 0.8.
In the information processing method provided by this embodiment, a first degree of correlation between the title and the text is determined based on the first keyword, the second keyword, and the third keyword; then, determining a second degree of correlation between the title and the text based on the sentence information corresponding to the text and the keyword set, and determining a third degree of correlation between the title and the text based on the text keyword corresponding to the text and the keyword set; and then determining the association degree between the title and the text based on the first association degree, the second association degree and the third association degree, so that the association degree between the title and the text is determined according to the relative position relation of each keyword in the title in the text and the position of each keyword in the sentence of the text, and the accuracy of the association degree between the title and the text is further improved.
A fourth embodiment of the information processing method of the present invention is proposed based on the third embodiment, and in this embodiment, step S302 includes:
step S401, in each sentence of the sentence information, determining a target sentence matched with the keywords of the keyword set;
step S402, determining a first target sentence matched with a plurality of keywords in a keyword set in a target sentence, and determining the first sentence number of sentences with adjacent keywords in the same sentence in the first target sentence;
step S403, determining adjacent second target sentences in the target sentences and determining the number of second sentences with adjacent keywords in the adjacent second target sentences based on the sequence of each sentence in the sentence information;
step S404, determining the second degree of correlation based on the first sentence number and the second sentence number.
In this embodiment, the text is split into sentences at preset punctuation marks to obtain the sentence information, where the preset punctuation marks include '.', '?', '!' and other punctuation marks that indicate the end of a sentence. Among the sentences of the sentence information, the target sentences matching keywords of the keyword set are determined, where each target sentence includes one or more keywords of the keyword set.
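Sentence splitting and target-sentence selection (step S401) can be sketched as follows. The set of end-of-sentence marks in `SENTENCE_END`, including the full-width forms, and the function names are assumptions; the source lists only representative marks. Note that this naive split would also break on a period inside a decimal number.

```python
import re
from typing import List, Sequence

# End-of-sentence marks; the full-width forms are an assumption for Chinese-language text.
SENTENCE_END = ".?!。？！"


def split_sentences(body: str) -> List[str]:
    """Split a body into sentences at end-of-sentence punctuation."""
    parts = re.split("[" + re.escape(SENTENCE_END) + "]+", body)
    return [p.strip() for p in parts if p.strip()]


def target_sentences(sentences: Sequence[str], keywords: Sequence[str]) -> List[str]:
    """Keep the sentences that contain at least one title keyword (substring match)."""
    return [s for s in sentences if any(kw in s for kw in keywords)]


sentences = split_sentences("Fire hit the plant. Was anyone hurt? No!")
print(sentences)
print(target_sentences(sentences, ["plant", "hurt"]))
```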
In this embodiment, when the target sentences are obtained, the first target sentences, which match a plurality of keywords of the keyword set, are determined among the target sentences; that is, each first target sentence includes a plurality of different keywords of the keyword set. Then, the first sentence number is determined, that is, the number of first target sentences in which adjacent keywords exist within the same sentence, where a sentence in which adjacent keywords exist includes at least two keywords that are adjacent in the keyword set. For example, for any sentence $S_i$, if $S_i$ includes both $wt_i$ and $wt_{i+1}$ of the keyword set, the first sentence number is incremented by 1.
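Counting sentences that contain two keywords adjacent in the keyword set (step S402) can be sketched as below. The function name `count_same_sentence_pairs` is hypothetical, and the sketch assumes each sentence is counted at most once, however many adjacent pairs it contains.

```python
from typing import Sequence


def count_same_sentence_pairs(sentences: Sequence[str], keywords: Sequence[str]) -> int:
    """Count sentences containing some pair of keywords that are adjacent in title order."""
    keyword_pairs = list(zip(keywords, keywords[1:]))
    count = 0
    for sentence in sentences:
        # Assumption: substring match, and one increment per sentence at most.
        if any(a in sentence and b in sentence for a, b in keyword_pairs):
            count += 1
    return count


sentences = [
    "a fire hit the plant",
    "the city sent help",
    "the plant is in the city",
    "fire and city",
]
print(count_same_sentence_pairs(sentences, ["fire", "plant", "city"]))
```

The last sentence is not counted because "fire" and "city" are not adjacent in the keyword set.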
Next, based on the order of the sentences in the sentence information, adjacent second target sentences among the target sentences are determined, where the second target sentences include one or more groups of adjacent sentences, and the second sentence number is determined, that is, the number of groups of adjacent second target sentences in which adjacent keywords exist. For example, for any group of adjacent second target sentences $S_i$ and $S_{i+1}$, if $wt_i \in S_i$ and $wt_{i+1} \in S_{i+1}$, the second sentence number is incremented by 1.
Then, the second degree of correlation is determined based on the first sentence number and the second sentence number. Specifically, the second degree of correlation is calculated according to the first sentence number, the second sentence number and their corresponding weights. For example, if the weight corresponding to the first sentence number and the weight corresponding to the second sentence number are both 1, the first sentence number is 5 and the second sentence number is 8, then the second degree of correlation is $1 \times 5 + 1 \times 8 = 13$. In this way, the second degree of correlation is determined according to the relative position relation, in the text, of the plurality of keywords of the title, which further improves the accuracy of the degree of correlation between the news title and the news text.
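The adjacent-sentence count and the weighted sum of steps S403 and S404 can be sketched as follows. The function names are hypothetical, and the sketch assumes that the keyword index and the sentence index are independent, so any pair of keywords adjacent in the keyword set may match any pair of consecutive sentences.

```python
from typing import Sequence


def count_adjacent_sentence_pairs(sentences: Sequence[str], keywords: Sequence[str]) -> int:
    """Count consecutive sentence pairs where one title keyword is in the first sentence
    and the next title keyword is in the following sentence."""
    keyword_pairs = list(zip(keywords, keywords[1:]))
    count = 0
    # Scanning all sentences in order is enough: a pair only counts if both
    # sentences contain a keyword, i.e. both are target sentences.
    for current, following in zip(sentences, sentences[1:]):
        if any(a in current and b in following for a, b in keyword_pairs):
            count += 1
    return count


def second_correlation(first_count: int, second_count: int,
                       w_first: float = 1.0, w_second: float = 1.0) -> float:
    """Weighted sum of the first and second sentence numbers."""
    return w_first * first_count + w_second * second_count


sentences = ["a fire started", "the plant burned", "the city helped", "nothing else"]
print(count_adjacent_sentence_pairs(sentences, ["fire", "plant", "city"]))
print(second_correlation(5, 8))
```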
In the information processing method provided by this embodiment, a target sentence matched with a keyword of the keyword set is determined among the sentences of the sentence information; then, a first target sentence matched with a plurality of keywords of the keyword set is determined among the target sentences, and the first sentence number of sentences in which adjacent keywords exist within the same sentence is determined; then, based on the order of the sentences in the sentence information, adjacent second target sentences among the target sentences are determined, and the second sentence number of adjacent second target sentences in which adjacent keywords exist is determined; finally, the second degree of correlation is determined based on the first sentence number and the second sentence number. The second degree of correlation is thus obtained accurately according to the relative position relation of the keywords in the text, and the degree of association between the title and the text is determined according to the relative position relation of the title keywords in the text, which further improves the accuracy of the degree of correlation between the news title and the news text.
A fifth embodiment of the information processing method of the present invention is proposed based on the third embodiment, and in this embodiment, step S302 includes:
step S501, obtaining synonyms corresponding to each keyword of the keyword set, and determining a set of words to be matched, wherein the set of words to be matched comprises the synonyms and each keyword of the keyword set;
step S502, determining the number of matched keywords, namely the number of target words to be matched, among the words to be matched in the set of words to be matched, that belong to the text keywords;
step S503, determining the third degree of correlation based on the number of the matched keywords.
In this embodiment, since the text may contain synonyms of the keywords, synonym expansion needs to be performed on the keyword set. That is, synonyms corresponding to each keyword of the keyword set are obtained, and the set of words to be matched, which comprises the synonyms and each keyword of the keyword set, is determined. For example, for the keyword "occur", the near-synonym set of "occur" obtained from an existing synonym thesaurus is [find, occur, burst, generate]; "occur" and [find, occur, burst, generate] are then added to the set of words to be matched.
Then, the number of matched keywords is determined, that is, the number of target words to be matched, where a target word to be matched is a word in the set of words to be matched that belongs to the text keywords (in other words, the text keywords include the target word to be matched).
Next, the third degree of correlation is determined based on the number of matched keywords. Specifically, a correlation weight is set for the matched keywords; for example, if the correlation weight of the matched keywords is 1 and the number of matched keywords is 3, the third degree of correlation is 3 × 1 = 3.
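Steps S501 to S503 can be illustrated with a minimal sketch. It assumes the thesaurus is available as a plain dictionary from a keyword to its synonyms and that the text keywords have already been extracted. The function names, the `thesaurus` structure, and the default weight of 1 are hypothetical choices made for this example.

```python
from typing import Dict, Iterable, List, Set


def build_words_to_match(
    keywords: Iterable[str], thesaurus: Dict[str, List[str]]
) -> Set[str]:
    """S501: expand the title keywords with their synonyms."""
    words: Set[str] = set()
    for keyword in keywords:
        words.add(keyword)
        # Keywords without a thesaurus entry contribute only themselves.
        words.update(thesaurus.get(keyword, []))
    return words


def third_correlation(
    keywords: Iterable[str],
    text_keywords: Iterable[str],
    thesaurus: Dict[str, List[str]],
    weight: float = 1.0,
) -> float:
    """S502 and S503: count matched words and apply the weight."""
    words_to_match = build_words_to_match(keywords, thesaurus)
    # Target words to be matched: expanded words that are text keywords.
    matched = words_to_match & set(text_keywords)
    return weight * len(matched)
```

With the thesaurus entry from the example above, the title keywords "occur" and "flood", and the text keywords "burst", "flood", "city", and "generate", three words match, so with a weight of 1 the result is 3, in line with the arithmetic in the text.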
In the information processing method provided by this embodiment, a synonym corresponding to each keyword of the keyword set is obtained, and a to-be-matched word set including the synonym and each keyword of the keyword set is determined; then determining the number of matched keywords of target words to be matched, wherein each word to be matched in the word set to be matched belongs to the text keywords; and then determining the third degree of correlation based on the number of the matched keywords, and determining the third degree of correlation by matching the keyword set with the text keywords, thereby further improving the accuracy of the degree of correlation between the title and the text.
On the basis of the above-described respective embodiments, a sixth embodiment of the information processing method of the present invention is proposed, in which step S403 includes:
step S601, obtaining the correlation sum of the first correlation, the second correlation and the third correlation;
step S602, determining a correlation threshold value based on the number of keywords in the keyword set;
step S603, determining the degree of association based on the correlation sum and a correlation threshold.
In this embodiment, after the first correlation degree, the second correlation degree, and the third correlation degree are obtained, a correlation sum of the first correlation degree, the second correlation degree, and the third correlation degree is obtained, that is, the first correlation degree, the second correlation degree, and the third correlation degree are added to obtain a correlation sum.
Then, a correlation threshold is determined based on the number of keywords in the keyword set, the first degree of correlation, the second degree of correlation, and the third degree of correlation. Specifically, a first correlation threshold is determined based on the number N of keywords and the first degree of correlation, a second correlation threshold is determined based on N and the second degree of correlation, and a third correlation threshold is determined based on N and the third degree of correlation. For example, the first correlation threshold is not greater than N × (sum of the correlation weights of the first keyword, the second keyword, and the third keyword) / 3, the second correlation threshold is not greater than 1 × N, and the correlation threshold = first correlation threshold + second correlation threshold + third correlation threshold.
Finally, the degree of association is determined based on the correlation sum and the correlation threshold: degree of association = correlation sum / correlation threshold.
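Steps S601 to S603 might be combined as in the sketch below. Several points are assumptions rather than statements of the original: the "not greater than" bounds on the first and second thresholds are used as equalities, the degree of association is taken to be the correlation sum divided by the correlation threshold, and the third threshold, for which the text gives no formula, is supplied by the caller. All names and the example weights are hypothetical.

```python
from typing import Sequence


def association_degree(
    first: float,
    second: float,
    third: float,
    keyword_count: int,
    paragraph_weights: Sequence[float],
    third_threshold: float,
) -> float:
    """Combine the three degrees of correlation into one score."""
    if keyword_count <= 0:
        raise ValueError("keyword_count must be positive")

    # S601: correlation sum of the three degrees of correlation.
    correlation_sum = first + second + third

    # S602: thresholds scale with the number N of title keywords.
    # Assumption: the upper bounds from the text are used as equalities.
    first_threshold = keyword_count * sum(paragraph_weights) / 3
    second_threshold = 1 * keyword_count
    threshold = first_threshold + second_threshold + third_threshold
    if threshold <= 0:
        raise ValueError("correlation threshold must be positive")

    # S603: assumption, the association degree is the ratio of the two.
    return correlation_sum / threshold
```

For instance, with degrees of correlation 6, 13, and 3, four title keywords, hypothetical paragraph weights (1, 2, 3), and a third threshold of 4, the correlation sum is 22 and the threshold is 8 + 4 + 4 = 16, giving 1.375.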
In the information processing method provided by this embodiment, the correlation sum of the first degree of correlation, the second degree of correlation, and the third degree of correlation is obtained; then, a correlation threshold is determined based on the number of keywords in the keyword set; then, the degree of association is determined based on the correlation sum and the correlation threshold. The degree of association can thus be obtained accurately from the first, second, and third degrees of correlation, which further improves the accuracy of the degree of association between the news title and the news text.
The present invention also provides an information processing apparatus, referring to fig. 3, comprising:
an obtaining module 10, configured to obtain a keyword set corresponding to a title of information to be processed;
a determining module 20, configured to determine, based on a relative position relationship between different keywords in the keyword set and the text of the information to be processed, a degree of association between the title and the text; and when the association degree corresponding to the relative position relation is greater than a preset threshold value, the association degree between the title and the text is high.
The methods executed by the program units can refer to various embodiments of the information processing method of the present invention, and are not described herein again.
The invention also provides a computer readable storage medium.
The computer-readable storage medium of the present invention stores thereon an information processing program that realizes the steps of the information processing method described above when executed by a processor.
The method implemented when the information processing program running on the processor is executed may refer to each embodiment of the information processing method of the present invention, and details are not described here.
Furthermore, an embodiment of the present invention further provides a computer program product, which includes an information processing program, and when the information processing program is executed by a processor, the information processing program implements the steps of the information processing method described above.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. An information processing method, characterized in that the method comprises:
acquiring a keyword set corresponding to a title of information to be processed;
determining the association degree between the title and the text based on the relative position relation of different keywords in the keyword set in the text of the information to be processed; and when the association degree corresponding to the relative position relation is greater than a preset threshold value, the association degree between the title and the text is high.
2. The information processing method according to claim 1, wherein the step of determining the association degree between the title and the body of the information to be processed based on the relative position relationship of different keywords in the keyword set in the body of the information to be processed comprises:
determining paragraph information corresponding to the text based on the line break in the text, wherein the paragraph information comprises a first segment, a middle segment and a tail segment;
determining a first keyword matched with one paragraph in the paragraph information, a second keyword matched with two paragraphs in the paragraph information and a third keyword matched with three paragraphs in the paragraph information in each keyword in a keyword set;
and determining the association degree between the title and the text based on the first keyword, the second keyword and the third keyword.
3. The information processing method according to claim 2, wherein the step of determining the degree of association between the title and the body text based on the first keyword, the second keyword, and the third keyword comprises:
acquiring a first weight corresponding to a first keyword, a second weight corresponding to a second keyword and a third weight corresponding to a third keyword;
and determining the association degree between the title and the text based on the first weight, the second weight, the third weight, the number of the first keywords, the number of the second keywords and the number of the third keywords.
4. The information processing method according to claim 2, wherein the step of determining the degree of association between the title and the body text based on the first keyword, the second keyword, and the third keyword comprises:
determining a first degree of correlation between the title and the text based on the first keyword, the second keyword and the third keyword;
determining a second degree of correlation between the title and the text based on the sentence information corresponding to the text and the keyword set, and determining a third degree of correlation between the title and the text based on the text keyword corresponding to the text and the keyword set;
and determining the association degree between the title and the text based on the first association degree, the second association degree and the third association degree.
5. The information processing method according to claim 4, wherein the step of determining the second degree of correlation between the title and the body based on the sentence information corresponding to the body and the keyword set comprises:
determining a target sentence matched with the keywords of the keyword set in each sentence of the sentence information;
determining a first target sentence matched with a plurality of keywords of a keyword set in a target sentence, and determining the first sentence number of sentences of which the same sentence has adjacent keywords in the first target sentence;
determining adjacent second target sentences in the target sentences and determining the number of second sentences with adjacent keywords in the adjacent second target sentences based on the sequence of each sentence in the sentence information;
determining the second degree of correlation based on the first sentence quantity and the second sentence quantity.
6. The information processing method according to claim 4, wherein the step of determining a third degree of correlation between the title and the body based on the body keyword corresponding to the body and the keyword set comprises:
obtaining synonyms corresponding to all the keywords in the keyword set, and determining a set of words to be matched, which comprises the synonyms and all the keywords in the keyword set;
determining the number of matched keywords of target words to be matched, wherein each word to be matched of the word set to be matched belongs to the text keywords;
and determining the third correlation degree based on the number of the matched keywords.
7. The information processing method according to any one of claims 4 to 6, wherein the step of determining the degree of association between the title and the body text based on the first degree of association, the second degree of association, and the third degree of association includes:
acquiring the correlation sum of the first correlation, the second correlation and the third correlation;
determining a relevancy threshold based on the number of keywords in the keyword set;
determining the degree of association based on the sum of degrees of correlation and a threshold of degrees of correlation.
8. An information processing apparatus characterized by comprising:
the acquisition module is used for acquiring a keyword set corresponding to the title of the information to be processed;
the determining module is used for determining the association degree between the title and the text based on the relative position relation of different keywords in the keyword set in the text of the information to be processed; and when the association degree corresponding to the relative position relation is greater than a preset threshold value, the association degree between the title and the text is high.
9. An information processing apparatus characterized by comprising: memory, processor and information processing program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the information processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that an information processing program is stored thereon, which when executed by a processor implements the steps of the information processing method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111487593.5A CN114282092A (en) | 2021-12-07 | 2021-12-07 | Information processing method, device, equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114282092A true CN114282092A (en) | 2022-04-05 |
Family
ID=80871194
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111487593.5A Pending CN114282092A (en) | 2021-12-07 | 2021-12-07 | Information processing method, device, equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114282092A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115409035A (en) * | 2022-06-02 | 2022-11-29 | 北京金堤科技有限公司 | Conversation information acquisition method and device, storage medium and electronic equipment |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011221894A (en) * | 2010-04-13 | 2011-11-04 | Hitachi Ltd | Secure document detection method, secure document detection program, and optical character reader |
CN103617213A (en) * | 2013-11-19 | 2014-03-05 | 北京奇虎科技有限公司 | Method and system for identifying newspage attributive characters |
CN106033445A (en) * | 2015-03-16 | 2016-10-19 | 北京国双科技有限公司 | Method and device for obtaining article association degree data |
CN106202150A (en) * | 2016-06-22 | 2016-12-07 | 北京小米移动软件有限公司 | Method for information display and device |
CN107357781A (en) * | 2017-06-29 | 2017-11-17 | 胡玥莹 | For differentiating the system and method for web page title and the text degree of association |
CN109614625A (en) * | 2018-12-17 | 2019-04-12 | 北京百度网讯科技有限公司 | Determination method, apparatus, equipment and the storage medium of the title text degree of correlation |
WO2021051599A1 (en) * | 2019-09-19 | 2021-03-25 | 平安科技(深圳)有限公司 | Method and apparatus for extracting locally optimized keywords, device and storage medium |
US20210174024A1 (en) * | 2018-12-07 | 2021-06-10 | Tencent Technology (Shenzhen) Company Limited | Method for training keyword extraction model, keyword extraction method, and computer device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||