CN114661868A - Article key information tracing method, system, readable medium and device - Google Patents


Info

Publication number
CN114661868A
Authority
CN
China
Prior art keywords
paragraph
source signal
paragraphs
current
keywords
Legal status
Granted
Application number
CN202210338283.5A
Other languages
Chinese (zh)
Other versions
CN114661868B (en)
Inventor
李根柱
Current Assignee
Beijing Siyuan Zhitong Technology Co ltd
Original Assignee
Beijing Siyuan Zhitong Technology Co ltd
Application filed by Beijing Siyuan Zhitong Technology Co ltd
Priority to CN202210338283.5A
Publication of CN114661868A
Application granted
Publication of CN114661868B
Legal status: Active

Classifications

    • G06F16/3344 Query execution using natural language analysis
    • G06F16/383 Retrieval characterised by using metadata automatically derived from the content
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an article key information tracing method, system, readable medium and device. The method comprises: setting an online acquisition mode and extracting all paragraphs, sentences and keywords; extracting the current paragraph, extracting the word frequency of each keyword in the paragraph, and setting a logic type number; obtaining all current paragraphs and their sentences, and extracting the paragraph keyword numbers according to the logic type number; obtaining the association degree among all paragraphs and judging, in the total-score state, whether a paragraph is a target key paragraph; after receiving all the paragraph keyword numbers, obtaining all source data and issuing an independent source signal, a common source signal and a cooperative source signal; and automatically displaying and downloading source information according to these three signals. Through online logic analysis and an information tracing algorithm, the scheme traces and displays the key information of an article online, in real time and across multiple dimensions.

Description

Article key information tracing method, system, readable medium and device
Technical Field
The invention relates to the technical field of document analysis, in particular to an article key information tracing method, system, readable medium and device.
Background
Article key information tracing addresses the information tracing problem in which information propagates from a single source node. On a social network, the network is abstracted into a graph structure, and information propagates between nodes along edges. Because often only the states of some nodes, or only the subgraph formed after propagation, can be observed, the node from which propagation started cannot be determined directly, and information tracing is therefore needed.
Before the present invention, most prior-art document analysis approaches mainly queried the degree of repetition against known networks and existing websites; they rarely achieved online tracing and could not display sources of different dimensions according to key information and logical relations.
Disclosure of Invention
In view of the above problems, the invention provides a method, system, readable medium and device for tracing the key information of an article, which trace and display the key information of an article online, in real time and across multiple dimensions through online logic analysis and an information tracing algorithm.
According to a first aspect of the embodiments of the present invention, a method for tracing the source of article key information is provided.
In one or more embodiments, preferably, the article key information tracing method includes:
setting an online acquisition mode, and extracting all paragraphs, sentences and keywords;
extracting a current paragraph, extracting the word frequency of each keyword in the paragraph, and setting a logic type number;
obtaining all current paragraphs and each corresponding sentence, and extracting the keyword number of the paragraph by combining the logic type number;
acquiring the association degree among all paragraphs, and judging, in the total-score state, whether a paragraph is a target key paragraph;
after receiving all the paragraph keyword numbers, acquiring all the source data, and sending an independent source signal, a common source signal and a cooperative source signal;
and acquiring the target key paragraph, and automatically displaying and downloading source information according to the independent source signal, the common source signal and the cooperative source signal.
In one or more embodiments, preferably, setting the online acquisition mode and extracting all paragraphs, sentences and keywords specifically includes:
setting the current acquisition mode, and performing online search according to the acquisition mode;
after online searching, obtaining all documents corresponding to the preset key information, and storing the documents as txt format documents;
splitting the txt document into paragraphs and saving them as split paragraphs;
extracting keywords from the split paragraphs to obtain all keywords;
and extracting sentences from the split paragraphs to obtain all sentences.
In one or more embodiments, preferably, the extracting a current paragraph, extracting a word frequency of each keyword in the paragraph, and setting a logic type number specifically includes:
extracting a current paragraph, extracting the word frequency of each keyword in the paragraph, and recording it as the keyword frequency;
sorting the keywords by frequency and taking the top 10% as target keywords;
ordering all paragraphs; when the number of target keywords in the first paragraph exceeds 8 and the number in each of the other paragraphs does not exceed 8, the current logic type number is 1;
when the number of target keywords in both the first and the last paragraph exceeds 8 and the number in each of the remaining paragraphs does not exceed 8, the current logic type number is 2;
when the number of target keywords in the last paragraph exceeds 8 and the number in each of the other paragraphs does not exceed 8, the current logic type number is 3;
when the number of target keywords in both the first and the last paragraph exceeds 8 and the number in the paragraphs between them also exceeds 8, the current logic type number is 4;
when the number of target keywords in neither the first nor the last paragraph exceeds 8, the current logic type number is 5; and storing the current logic type number.
In one or more embodiments, preferably, obtaining all current paragraphs and their corresponding sentences and extracting the paragraph keyword number according to the logic type number specifically includes:
obtaining all current paragraphs, and labeling each sentence with its total mark value;
calculating the association degree between paragraphs using the third calculation formula;
judging the current logic type number according to the association degree;
calculating the weight coefficient of each paragraph using the second calculation formula according to the logic type number;
extracting the keyword frequencies, and calculating the paragraph keyword number using the first calculation formula;
the first calculation formula is:
$$D = \mathrm{Max}\left(W_i \cdot F_i^{d}\right)$$
where $D$ is the paragraph keyword number, $F_i^{d}$ is the frequency of the $d$-th keyword in the $i$-th paragraph, $\mathrm{Max}()$ is the function that returns the keyword number corresponding to the maximum value, and $W_i$ is the weight coefficient of the $i$-th paragraph;
the second calculation formula is:
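(Formula image not reproduced in this text: it gives the paragraph weight coefficient $W_i$ in terms of the logic type number $z$ and the total number of paragraphs $n$.)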
where $z$ is the logic type number: 1 corresponds to general-to-specific ("total-score") logic, 2 to general-specific-general logic, 3 to specific-to-general logic, 4 to progressive logic and 5 to equal-division logic; $n$ is the total number of paragraphs;
the third calculation formula is:
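(Formula image not reproduced in this text: it gives the association degree $G_{a,b}$ between paragraphs $a$ and $b$ from the sentence counts and sentence mark totals defined below.)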
where $G_{a,b}$ is the association degree between the $a$-th and $b$-th paragraphs, $N$ is the total number of sentences in paragraphs $a$ and $b$, $n_1$ is the total number of sentences in paragraph $a$, $n_2$ is the total number of sentences in paragraph $b$, $x_{ai}$ and $x_{aj}$ are the total mark values of the $i$-th and $j$-th sentences in paragraph $a$, and $x_{bi}$ and $x_{bj}$ are the total mark values of the $i$-th and $j$-th sentences in paragraph $b$. A sentence's total mark value is the sum of the marks of all words in it, each mark being obtained by looking up a vocabulary table that contains the semantic score of every word.
In one or more embodiments, preferably, obtaining the association degree among all paragraphs and judging, in the total-score state, whether a paragraph is a target key paragraph specifically includes:
acquiring the association degree among all paragraphs;
judging whether the current state is the total-score state, and if so, issuing a key total-paragraph judgment command;
after receiving the key total-paragraph judgment command, judging whether the fourth calculation formula is satisfied, and if so, marking the current $a$-th paragraph as the target key paragraph;
the fourth calculation formula is:
$$\min_j G_{a,j} + 0.8\left[\max_j G_{a,j} - \min_j G_{a,j}\right] > 1, \quad 1 < j < S,\ j \neq a$$
where $G_{a,j}$ is the association degree between the $a$-th and $j$-th paragraphs.
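As an illustration only, the fourth calculation formula can be checked as in the Python sketch below. It assumes the association degrees $G_{a,j}$ have already been computed with the third calculation formula and are supplied as a square matrix, and it reads the range $1<j<S$ as covering every other paragraph $j \neq a$; that reading and the function name `is_target_key_paragraph` are assumptions, not details given in the patent. The check is only meant to run once the total-score state has been confirmed.

```python
def is_target_key_paragraph(G: list[list[float]], a: int) -> bool:
    """Fourth calculation formula for paragraph a (sketch).

    G[a][j] is the association degree G_{a,j} from the third calculation formula.
    Assumption: j ranges over every paragraph other than a.
    """
    others = [G[a][j] for j in range(len(G)) if j != a]
    lo, hi = min(others), max(others)
    # Min(G_{a,j}) + 0.8 * [Max(G_{a,j}) - Min(G_{a,j})] > 1
    return lo + 0.8 * (hi - lo) > 1


# Hypothetical association matrix for three paragraphs.
G = [
    [0.0, 0.9, 1.5],
    [0.9, 0.0, 0.2],
    [1.5, 0.2, 0.0],
]
print(is_target_key_paragraph(G, 0))  # True:  0.9 + 0.8 * 0.6 = 1.38 > 1
print(is_target_key_paragraph(G, 1))  # False: 0.2 + 0.8 * 0.7 = 0.76
```

Algebraically the condition equals $0.2\min_j G_{a,j} + 0.8\max_j G_{a,j} > 1$, so a paragraph qualifies mainly through its strongest associations while its weakest one still carries some weight.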
In one or more embodiments, preferably, acquiring all source data after all the paragraph keyword numbers are received and issuing an independent source signal, a common source signal and a cooperative source signal specifically includes:
after receiving all paragraph keyword numbers, acquiring all source data, and labeling the source data with source block numbers;
judging whether all paragraphs corresponding to the paragraph keyword numbers satisfy the fifth calculation formula; if so, issuing the independent source signal, and if not, continuing the judgment;
judging whether all paragraphs corresponding to the paragraph keyword numbers satisfy the sixth calculation formula; if so, issuing the common source signal, and if not, issuing a cooperative judgment command;
after receiving the cooperative judgment command, determining the set of cooperative sources for which all paragraphs corresponding to the paragraph keyword numbers satisfy the seventh calculation formula, and issuing the cooperative source signal;
the fifth calculation formula is:
$$K_I / ALL \ge 0.9$$
where $K_I$ is the number of keywords whose source block number is $I$, and $ALL$ is the total number of keywords;
the sixth calculation formula is:
(Formula image not reproduced in this text: the common-source condition, stated in terms of $K_{i1}$ and $K_{i2}$.)
where $K_{i1}$ and $K_{i2}$ are the numbers of keywords corresponding to source block numbers $i1$ and $i2$, respectively;
the seventh calculation formula is:
(Formula image not reproduced in this text: the cooperative-source condition, stated in terms of the set $min$ and the counts $K_{i_s}$.)
where $min$ is the set of cooperative sources, and $K_{i_s}$ is the number of keywords corresponding to source block number $i_s$.
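Of the three source conditions, only the fifth calculation formula is given as text, so the Python sketch below covers just the independent-source test. It assumes each keyword has already been mapped to the source block number it was traced to; the dictionary `keyword_blocks` and the function name `independent_source_block` are hypothetical. When no block reaches the 0.9 share, the function returns `None`, leaving the sixth and seventh formulas to decide between the common and cooperative cases.

```python
from collections import Counter


def independent_source_block(keyword_blocks: dict[str, str]):
    """Fifth calculation formula K_I / ALL >= 0.9 (sketch).

    keyword_blocks maps each keyword to its source block number (hypothetical format).
    Returns the block number I that satisfies the formula, or None if there is none.
    """
    if not keyword_blocks:
        return None
    all_keywords = len(keyword_blocks)                 # ALL
    block_counts = Counter(keyword_blocks.values())    # K_I for every block I
    block, k_i = block_counts.most_common(1)[0]        # the largest K_I
    # At most one block can hold >= 90% of the keywords, so checking the largest suffices.
    return block if k_i / all_keywords >= 0.9 else None


blocks = {f"kw{n}": "B1" for n in range(9)}
blocks["kw9"] = "B2"
print(independent_source_block(blocks))  # B1 (9 / 10 = 0.9)
```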
In one or more embodiments, preferably, acquiring the target key paragraph and automatically displaying and downloading source information according to the independent source signal, the common source signal and the cooperative source signal specifically includes:
obtaining the independent source signal, generating a unique source link, and directly downloading the linked source and storing it in a preset storage space;
obtaining the common source signal, generating a chained download link, and displaying it only as a hyperlink to the left of the corresponding paragraph;
obtaining the cooperative source signal, generating a download link set from it, sorting the set in descending order of association degree, and storing it in the preset storage space in TXT format;
and acquiring the target key paragraph, and storing all of its associated information online in the storage space.
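The signal-to-action mapping of this step can be sketched as a small dispatcher. This is an assumption-laden illustration rather than the patented implementation: the `SourceLink` record, the signal strings, the file names and the hyperlink markup are all hypothetical, and no actual download is performed; the sketch only records links in the storage directory.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SourceLink:
    url: str
    correlation: float  # association degree used to order cooperative sources


def route_source(signal: str, links: list[SourceLink], storage: Path) -> str:
    """Map one source signal to its display/storage action (sketch)."""
    storage.mkdir(parents=True, exist_ok=True)
    if signal == "independent":
        # Unique source link: recorded for direct download into the preset storage space.
        (storage / "independent_source.txt").write_text(links[0].url + "\n", encoding="utf-8")
        return f"download {links[0].url}"
    if signal == "common":
        # Chained link: rendered only as a hyperlink placed to the left of the paragraph.
        return f'<a href="{links[0].url}">source</a>'
    if signal == "cooperative":
        # Link set sorted by association degree, largest first, saved in TXT format.
        ordered = sorted(links, key=lambda link: link.correlation, reverse=True)
        target = storage / "cooperative_sources.txt"
        target.write_text("\n".join(link.url for link in ordered) + "\n", encoding="utf-8")
        return str(target)
    raise ValueError(f"unknown source signal: {signal}")


links = [SourceLink("https://example.org/a", 0.2), SourceLink("https://example.org/b", 0.7)]
with tempfile.TemporaryDirectory() as tmp:
    saved = route_source("cooperative", links, Path(tmp))
    print(Path(saved).read_text(encoding="utf-8").splitlines())
    # ['https://example.org/b', 'https://example.org/a']
```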
According to a second aspect of the embodiments of the present invention, an article key information traceability system is provided.
In one or more embodiments, preferably, the article key information traceability system includes:
the document acquisition module is used for setting an online acquisition mode and extracting all paragraphs, sentences and keywords;
the segmentation logic module is used for extracting the current paragraph, extracting the word frequency of each keyword in the paragraph and setting a logic type number;
the key extraction module is used for obtaining all current paragraphs and each corresponding sentence and extracting the keyword number of the paragraph by combining the logic type number;
the total-score checking module is used for acquiring the association degree among all paragraphs and judging, in the total-score state, whether a paragraph is a target key paragraph;
the source extraction module is used for acquiring all source data after receiving all the paragraph keyword numbers and sending an independent source signal, a common source signal and a cooperative source signal;
and the source retrieval module is used for acquiring the target key paragraphs, and automatically displaying and downloading source information according to the independent source signal, the common source signal and the cooperative source signal.
According to a third aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method according to any one of the first aspect of embodiments of the present invention.
According to a fourth aspect of embodiments of the present invention, there is provided an electronic device, comprising a memory and a processor, the memory being configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method of any one of the first aspect of embodiments of the present invention.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
In the embodiments of the invention, paragraph keyword numbers are extracted online from the associations between paragraphs according to the document's logical structure, which enables efficient online tracing based on those keyword numbers.
In the embodiments of the invention, independent, common and cooperative source dimensions are set according to the paragraph keyword numbers, which enables efficient multi-dimensional tracing and display of the corresponding article.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art could derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an article key information tracing method according to an embodiment of the present invention.
Fig. 2 is a flowchart of extracting all paragraphs, sentences and keywords by setting an online collection manner in the article key information tracing method according to an embodiment of the present invention.
Fig. 3 is a flowchart of extracting a current paragraph, extracting a word frequency of each keyword in the paragraph, and setting a logic type number in the article key information tracing method according to an embodiment of the present invention.
Fig. 4 is a flowchart of obtaining all current paragraphs and their corresponding sentences and extracting the paragraph keyword number according to the logic type number in the article key information tracing method according to an embodiment of the present invention.
Fig. 5 is a flowchart of obtaining the association degree between all paragraphs and judging, in the total-score state, whether a paragraph is a target key paragraph in the article key information tracing method according to an embodiment of the present invention.
Fig. 6 is a flowchart of acquiring all source data and issuing an independent source signal, a common source signal and a cooperative source signal after receiving all the paragraph keyword numbers in the article key information tracing method according to an embodiment of the present invention.
Fig. 7 is a flowchart of automatically displaying and downloading source information according to the independent source signal, the common source signal and the cooperative source signal by acquiring the target key paragraph in the article key information tracing method according to an embodiment of the present invention.
Fig. 8 is a structural diagram of an article key information traceability system according to an embodiment of the present invention.
Fig. 9 is a block diagram of an electronic device in one embodiment of the invention.
Detailed Description
In some of the flows described in the present specification and claims and in the above figures, a number of operations are included that occur in a particular order, but it should be clearly understood that these operations may be performed out of order or in parallel as they occur herein, with the order of the operations being indicated as 101, 102, etc. merely to distinguish between the various operations, and the order of the operations by themselves does not represent any order of performance. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Article key information tracing addresses the information tracing problem in which information propagates from a single source node. On a social network, the network is abstracted into a graph structure, and information propagates between nodes along edges. Because often only the states of some nodes, or only the subgraph formed after propagation, can be observed, the node from which propagation started cannot be determined directly, and information tracing is therefore needed.
Before the present invention, most prior-art document analysis approaches mainly queried the degree of repetition against known networks and existing websites; they could hardly achieve true online tracing, and could not display sources of different dimensions according to key information and logical relations.
The embodiments of the invention provide an article key information tracing method, system, readable medium and device. Through online logic analysis and an information tracing algorithm, the scheme traces and displays the key information of an article online, in real time and across multiple dimensions.
According to a first aspect of the embodiments of the present invention, a method for tracing the source of key information of an article is provided.
Fig. 1 is a flowchart of an article key information tracing method according to an embodiment of the present invention.
In one or more embodiments, preferably, the article key information tracing method includes:
S101, setting an online acquisition mode, and extracting all paragraphs, sentences and keywords;
S102, extracting a current paragraph, extracting the word frequency of each keyword in the paragraph, and setting a logic type number;
S103, obtaining all current paragraphs and their corresponding sentences, and extracting the paragraph keyword numbers according to the logic type number;
S104, obtaining the association degree among all paragraphs, and judging, in the total-score state, whether a paragraph is a target key paragraph;
S105, after receiving all the paragraph keyword numbers, acquiring all source data, and issuing an independent source signal, a common source signal and a cooperative source signal;
S106, acquiring the target key paragraph, and automatically displaying and downloading source information according to the independent source signal, the common source signal and the cooperative source signal.
In this embodiment, document collection and logic classification first solve the problem of how to divide the data in a document logically online; building on that, key extraction and total-score verification are performed; finally, sources are retrieved and displayed according to the source extraction results, which solves the problem of automatically tracing and displaying sources according to the logical relationships and the characteristics of the corresponding source data.
Fig. 2 is a flowchart of extracting all paragraphs, sentences and keywords by setting an online collection manner in the article key information tracing method according to an embodiment of the present invention.
As shown in Fig. 2, in one or more embodiments, preferably, setting an online acquisition mode and extracting all paragraphs, sentences and keywords specifically includes:
S201, setting the current acquisition mode, and performing an online search according to the acquisition mode;
S202, after the online search, obtaining all documents corresponding to the preset key information, and storing them as txt documents;
S203, splitting the txt document into paragraphs and saving them as split paragraphs;
S204, extracting keywords from the split paragraphs to obtain all keywords;
S205, extracting sentences from the split paragraphs to obtain all sentences.
In this embodiment, documents are first collected online; the collected data form the basis for all subsequent analysis. The data are split into three types of information (paragraphs, sentences and keywords), from which the corresponding sources are then traced in turn. The online acquisition mode may be real-time acquisition, acquisition at fixed intervals, or offline recording.
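Steps S203 to S205 might be implemented along the lines of the Python sketch below. It is not the patented implementation: it assumes blank lines separate paragraphs and that sentences end in Chinese or Western terminal punctuation, and it substitutes a stop-word-filtered token count for the keyword extractor, which the patent does not specify. Chinese text would additionally need a word segmenter, because `\w+` treats an unbroken run of Chinese characters as a single token. `split_document` and `STOP_WORDS` are hypothetical names.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the patent does not say how keywords are chosen.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are"}


def split_document(text: str):
    """Split a txt document into paragraphs, sentences and candidate keywords."""
    # S203: assume paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # S205: split each paragraph after Chinese or Western sentence-ending punctuation.
    sentences = [
        [s.strip() for s in re.split(r"(?<=[。！？.!?])\s*", p) if s.strip()]
        for p in paragraphs
    ]
    # S204: stand-in keyword extractor: lower-cased word tokens minus stop words.
    keywords = Counter(
        word
        for p in paragraphs
        for word in re.findall(r"\w+", p.lower())
        if word not in STOP_WORDS
    )
    return paragraphs, sentences, keywords


paragraphs, sentences, keywords = split_document(
    "Apples are red. Apples are sweet.\n\nBananas are yellow!"
)
print(len(paragraphs), sentences[0])  # 2 ['Apples are red.', 'Apples are sweet.']
```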
Fig. 3 is a flowchart of extracting a current paragraph, extracting a word frequency of each keyword in the paragraph, and setting a logic type number in the article key information tracing method according to an embodiment of the present invention.
As shown in Fig. 3, in one or more embodiments, preferably, extracting a current paragraph, extracting the word frequency of each keyword in the paragraph, and setting a logic type number specifically includes:
S301, extracting a current paragraph, extracting the word frequency of each keyword in the paragraph, and recording it as the keyword frequency;
S302, sorting the keywords by frequency and taking the top 10% as target keywords;
S303, ordering all paragraphs; when the number of target keywords in the first paragraph exceeds 8 and the number in each of the other paragraphs does not exceed 8, the current logic type number is 1;
S304, when the number of target keywords in both the first and the last paragraph exceeds 8 and the number in each of the remaining paragraphs does not exceed 8, the current logic type number is 2;
S305, when the number of target keywords in the last paragraph exceeds 8 and the number in each of the other paragraphs does not exceed 8, the current logic type number is 3;
S306, when the number of target keywords in both the first and the last paragraph exceeds 8 and the number in the paragraphs between them also exceeds 8, the current logic type number is 4;
S307, when the number of target keywords in neither the first nor the last paragraph exceeds 8, the current logic type number is 5;
S308, storing the current logic type number.
This embodiment provides a way of extracting the document's logic type number: building on the paragraphs, sentences and keywords already obtained, target keywords are further extracted, the logic type is determined from the number of target keywords in each paragraph, and the resulting logic type number is stored for subsequent analysis.
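Steps S302 to S307 can be sketched as the following Python functions. Several details are assumptions rather than statements from the patent: the top 10% is rounded up to at least one keyword; the "number of target keywords" in a paragraph is counted as occurrences; for type 4, "the paragraphs between them also exceed 8" is read as at least one middle paragraph exceeding 8 (the negation of the type 2 condition); and combinations the rules leave uncovered return `None`. The names `target_keywords`, `logic_type_number` and `THRESHOLD` are hypothetical.

```python
import math
from collections import Counter

THRESHOLD = 8  # "exceeds 8" in steps S303 to S307


def target_keywords(freq: Counter, top_percent: int = 10) -> set:
    """S302: keywords whose frequency ranks in the top 10% (at least one)."""
    k = max(1, math.ceil(len(freq) * top_percent / 100))
    return {word for word, _ in freq.most_common(k)}


def logic_type_number(paragraph_tokens: list[list[str]], targets: set):
    """S303 to S307: return the logic type number 1-5, or None if no rule applies.

    paragraph_tokens holds one token list per paragraph, in document order.
    """
    counts = [sum(1 for t in tokens if t in targets) for tokens in paragraph_tokens]
    first_hi = counts[0] > THRESHOLD
    last_hi = counts[-1] > THRESHOLD
    middle_hi = any(c > THRESHOLD for c in counts[1:-1])
    if not first_hi and not last_hi:
        return 5                      # S307
    if first_hi and last_hi:
        return 4 if middle_hi else 2  # S306 / S304
    if middle_hi:
        return None                   # one key-heavy end plus a key-heavy middle: uncovered
    return 1 if first_hi else 3       # S303 / S305


heavy, light = ["w0"] * 9, ["w0"] * 3
print(logic_type_number([heavy, light, heavy], {"w0"}))  # 2
```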
Fig. 4 is a flowchart of obtaining all current paragraphs and their corresponding sentences and extracting the paragraph keyword number according to the logic type number in the article key information tracing method according to an embodiment of the present invention.
As shown in Fig. 4, in one or more embodiments, preferably, obtaining all current paragraphs and their corresponding sentences and extracting the paragraph keyword number according to the logic type number specifically includes:
S401, obtaining all current paragraphs, and labeling each sentence with its total mark value;
S402, calculating the association degree between paragraphs using the third calculation formula;
S403, judging the current logic type number according to the association degree;
S404, calculating the weight coefficient of each paragraph using the second calculation formula according to the logic type number;
S405, extracting the keyword frequencies, and calculating the paragraph keyword number using the first calculation formula;
the first calculation formula is:
$$D = \mathrm{Max}\left(W_i \cdot F_i^{d}\right)$$
where $D$ is the paragraph keyword number, $F_i^{d}$ is the frequency of the $d$-th keyword in the $i$-th paragraph, $\mathrm{Max}()$ is the function that returns the keyword number corresponding to the maximum value, and $W_i$ is the weight coefficient of the $i$-th paragraph;
the second calculation formula is:
[Formula image not reproduced in this text version.]
where $z$ is the logic type number, in which type 1 corresponds to general-to-specific (total-division) logic, type 2 to general-specific-general logic, type 3 to specific-to-general logic, type 4 to progressive logic and type 5 to equal-division logic, and $n$ is the total number of paragraphs;
the third calculation formula is:
[Formula image not reproduced in this text version.]
where $G_{a,b}$ is the degree of association between the $a$-th and $b$-th paragraphs; $N$ is the total number of sentences in the $a$-th and $b$-th paragraphs; $n_1$ and $n_2$ are the total numbers of sentences in the $a$-th and $b$-th paragraphs, respectively; $x_{ai}$ and $x_{aj}$ are the mark totals of the $i$-th and $j$-th sentences in the $a$-th paragraph, and $x_{bi}$ and $x_{bj}$ are the mark totals of the $i$-th and $j$-th sentences in the $b$-th paragraph. The mark total of a sentence is obtained by looking up a vocabulary table and equals the sum of all marks in that sentence; the vocabulary table contains the semantic score of every word.
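The mark totals $x$ that feed the third calculation formula are described as a vocabulary lookup: each word has a semantic score, and a sentence's mark total is the sum of those scores. Because the formula image is not reproduced, the sketch below stops at computing these inputs and does not attempt $G_{a,b}$ itself. The dictionary-based vocabulary, the pre-tokenised sentences and the zero score for out-of-vocabulary words are assumptions; the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def sentence_mark_total(sentence: Sequence[str], vocabulary: Dict[str, float]) -> float:
    """Sum of the semantic scores of the words in one sentence.

    Words missing from the vocabulary score 0.0 (an assumption; the text does not say).
    """
    return sum(vocabulary.get(word, 0.0) for word in sentence)


def paragraph_mark_totals(paragraph: Sequence[Sequence[str]],
                          vocabulary: Dict[str, float]) -> List[float]:
    """Mark totals x_1 .. x_n for every sentence of one paragraph."""
    return [sentence_mark_total(sentence, vocabulary) for sentence in paragraph]
```

With a vocabulary `{"source": 2.0, "trace": 1.5}`, the paragraph `[["trace", "the", "source"], ["the", "end"]]` yields mark totals `[3.5, 0.0]`.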
In this embodiment of the invention, paragraph keyword numbers are extracted online for different articles according to their different logical relations. On the one hand, this takes into account that, under different logical relations, the keyword frequency of a paragraph contributes differently to the article as a whole; on the other hand, it allows differentiated key degrees to be assigned online to different regions of the article.
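The first calculation formula can be read as an arg-max over weighted keyword frequencies. Because $W_i$ is constant within a paragraph, the weight can only change the result when the maximum is taken across paragraphs, so the sketch below (an interpretation, not confirmed by the text) maximises jointly over $i$ and $d$ and returns both indices. The weights are taken as inputs because the second calculation formula is not reproduced; the function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple


def paragraph_keyword_number(weights: Sequence[float],
                             freqs: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Return the 0-based (i, d) that maximises W_i * F_i^d; ties keep the first hit."""
    if len(weights) != len(freqs):
        raise ValueError("exactly one weight coefficient per paragraph is required")
    best: Tuple[int, int] = (-1, -1)
    best_score = float("-inf")
    for i, (w_i, row) in enumerate(zip(weights, freqs)):
        for d, f_id in enumerate(row):
            score = w_i * f_id  # weighted frequency of keyword d in paragraph i
            if score > best_score:
                best_score, best = score, (i, d)
    if best == (-1, -1):
        raise ValueError("no keyword frequencies supplied")
    return best
```

With equal weights, frequencies `[[4, 1], [1, 3]]` select keyword 0 of paragraph 0; raising the second paragraph's weight to 2.0 shifts the result to keyword 1 of paragraph 1.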
Fig. 5 is a flowchart, in the article key information tracing method according to an embodiment of the present invention, of obtaining the degree of association between all paragraphs and determining whether the current total score state is a target key paragraph.
As shown in fig. 5, in one or more embodiments, preferably, obtaining the degree of association between all paragraphs and determining whether the current total score state is a target key paragraph specifically includes:
S501, acquiring the degree of association between all paragraphs;
S502, determining whether the current state is the total score state, and if so, issuing a key total-paragraph judgment command;
S503, after receiving the key total-paragraph judgment command, determining whether the fourth calculation formula is satisfied, and if so, marking the current paragraph $a$ as the target key paragraph;
the fourth calculation formula is:
$$\min_{j} G_{a,j} + 0.8\left[\max_{j} G_{a,j} - \min_{j} G_{a,j}\right] > 1,\qquad 1 < j < S,\ j \neq a$$
where $G_{a,j}$ is the degree of association between the $a$-th and $j$-th paragraphs.
In this embodiment of the invention, an online key total-paragraph judgment is performed on the degrees of association between all paragraphs, and when the judgment succeeds, the paragraph is marked as a target key paragraph.
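The fourth calculation formula takes the point 80% of the way from the smallest to the largest association degree of paragraph $a$ and requires it to exceed 1. A minimal sketch, assuming the degrees $G_{a,j}$ are already available as a list indexed by paragraph and that the comparison runs over every $j \neq a$ (the bound $S$ is not defined in this section); the check would only be invoked in the total score state (S502). The function name is hypothetical.

```python
from typing import Sequence


def is_target_key_paragraph(assoc_row: Sequence[float], a: int,
                            factor: float = 0.8, bound: float = 1.0) -> bool:
    """Fourth-formula check: Min + factor * (Max - Min) > bound over G[a][j], j != a."""
    others = [g for j, g in enumerate(assoc_row) if j != a]
    if not others:
        return False  # no other paragraph to compare against
    low, high = min(others), max(others)
    return low + factor * (high - low) > bound
```

For paragraph 0 with degrees `[0.0, 0.9, 1.2, 1.5]`, the test value is 0.9 + 0.8 × 0.6 = 1.38, so the paragraph is marked; with `[0.0, 0.2, 0.5]` it is 0.44 and it is not.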
Fig. 6 is a flowchart, in the article key information tracing method according to an embodiment of the present invention, of acquiring all source data after all paragraph keyword numbers are received and issuing an independent source signal, a common source signal and a cooperative source signal.
As shown in fig. 6, in one or more embodiments, preferably, acquiring all source data after receiving all paragraph keyword numbers and issuing an independent source signal, a common source signal and a cooperative source signal specifically includes:
S601, after receiving all paragraph keyword numbers, acquiring all source data and marking the block number of each source;
S602, determining whether the paragraphs corresponding to all paragraph keyword numbers satisfy a fifth calculation formula; if so, issuing the independent source signal, and if not, continuing the judgment;
S603, determining whether the paragraphs corresponding to all paragraph keyword numbers satisfy a sixth calculation formula; if so, issuing the common source signal, and if not, issuing a cooperative judgment command;
S604, after receiving the cooperative judgment command, determining, for the paragraphs corresponding to all paragraph keyword numbers, the cooperative source set that satisfies a seventh calculation formula, and issuing the cooperative source signal;
the fifth calculation formula is:
$$K_I / \mathit{ALL} \ge 0.9$$
where $K_I$ is the number of keywords from the source with block number $I$, and $\mathit{ALL}$ is the total number of keywords;
the sixth calculation formula is:
[Formula image not reproduced in this text version.]
where $K_{i1}$ and $K_{i2}$ are the numbers of keywords corresponding to source block numbers $i1$ and $i2$, respectively;
the seventh calculation formula is:
[Formula image not reproduced in this text version.]
where $\mathit{min}$ denotes the cooperative source set, and $K_{i_s}$ is the number of keywords corresponding to source block number $i_s$.
In this embodiment of the invention, to address the inability of the prior art to trace cooperative, independent and common sources in real time, the fifth, sixth and seventh calculation formulas are evaluated online to produce different tracing positions, which are passed by command to a module or port that issues the corresponding signal. This enables tracing of online paragraph data, and when several sources are involved, the minimal source combination can be obtained.
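Only the fifth calculation formula is fully legible in this text, so only the independent-source test below follows it directly. The sixth and seventh formula images are not reproduced; as a clearly hypothetical stand-in for the cooperative set, the sketch picks the fewest source blocks whose keyword counts together reach the same 0.9 share, which is one possible reading of "the minimal source combination". It assumes each keyword is attributed to exactly one source block, so counts add up, and it does not model the common-source test. The function names are hypothetical.

```python
from typing import Dict, List, Optional

RATIO = 0.9  # share used by the fifth calculation formula


def independent_source(counts: Dict[int, int], total: int) -> Optional[int]:
    """Fifth formula: return a block number I with K_I / ALL >= 0.9, else None."""
    if total <= 0:
        return None
    for block, k in counts.items():
        if k / total >= RATIO:
            return block
    return None


def cooperative_sources(counts: Dict[int, int], total: int) -> List[int]:
    """HYPOTHETICAL stand-in for the seventh formula (not the patented rule).

    Taking the largest blocks first yields the fewest blocks whose counts reach
    RATIO * total, provided the counts are disjoint and therefore additive.
    """
    if total <= 0:
        return []
    chosen: List[int] = []
    covered = 0
    for block, k in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        chosen.append(block)
        covered += k
        if covered >= RATIO * total:
            return chosen
    return []  # the threshold cannot be reached with the given blocks
```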
Fig. 7 is a flowchart, in the article key information tracing method according to an embodiment of the present invention, of acquiring the target key paragraph and automatically displaying and downloading source information according to the independent source signal, the common source signal and the cooperative source signal.
As shown in fig. 7, in one or more embodiments, preferably, acquiring the target key paragraph and automatically displaying and downloading source information according to the independent source signal, the common source signal and the cooperative source signal specifically includes:
S701, acquiring the independent source signal, generating a unique source link, and directly downloading the content of the corresponding link and storing it in a preset storage space;
S702, acquiring the common source signal, generating a chained download link, and displaying it only as a hyperlink to the left of the corresponding paragraph;
S703, acquiring the cooperative source signal, generating a download link set from it, sorting the set in descending order of correlation degree, and storing it in the preset storage space in TXT format;
S704, acquiring the target key paragraph and storing all of its associated information in the storage space online.
In this embodiment of the invention, source information is retrieved at three levels, starting with the independent source, and the results are then displayed hierarchically: independent source information is downloaded directly, common-source links are marked in the interface, and for sources that must be obtained cooperatively, a TXT file ordered by importance is generated and stored online instead of being marked and displayed in the document.
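Step S703 stores the cooperative download links in a TXT file ordered by correlation degree. A minimal sketch, assuming each link arrives as a (URL, correlation) pair and that one tab-separated line per link is an acceptable layout (the text specifies only the TXT format); the function names and the default file name are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Tuple


def format_link_set(links: Iterable[Tuple[str, float]]) -> str:
    """One 'url<TAB>score' line per link, highest correlation degree first."""
    ordered = sorted(links, key=lambda item: item[1], reverse=True)
    return "".join(f"{url}\t{score:.3f}\n" for url, score in ordered)


def store_cooperative_links(links: Iterable[Tuple[str, float]], storage_dir: Path,
                            name: str = "cooperative_links.txt") -> Path:
    """Write the ordered link set into the preset storage directory and return its path."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    target = storage_dir / name
    target.write_text(format_link_set(links), encoding="utf-8")
    return target
```

Keeping the formatting in a pure function makes the ordering easy to check separately from file I/O.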
According to a second aspect of the embodiments of the present invention, a traceability system of key information of an article is provided.
Fig. 8 is a structural diagram of an article key information traceability system according to an embodiment of the present invention.
In one or more embodiments, preferably, the article key information traceability system includes:
a document collection module 801, configured to set an online collection mode and extract all paragraphs, sentences, and keywords;
a segmentation logic module 802, configured to extract a current paragraph, extract a word frequency of each keyword in the paragraph, and set a logic type number;
a key extraction module 803, configured to obtain all current paragraphs and each corresponding sentence, and extract the paragraph keyword number by combining the logic type number;
a total score checking module 804, configured to obtain the association degrees between all the segments, and determine whether the current total score state is a target key segment;
a source extraction module 805, configured to obtain all source data after receiving all the paragraph keyword numbers, and send an independent source signal, a common source signal, and a cooperative source signal;
a source retrieval module 806, configured to obtain the target key paragraph, and automatically display and download source information according to the independent source signal, the common source signal, and the cooperative source signal.
This embodiment of the invention provides a specific modular design: document collection and paragraph logic are extracted first; on that basis, key extraction and total score verification are performed; finally, source information is retrieved and displayed according to the source extraction results.
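Claim 2 describes the document collection step (module 801) as storing retrieved documents as txt files, splitting them into paragraphs and then extracting sentences. A minimal sketch of the splitting, assuming paragraphs are separated by blank lines and sentences end with Chinese (。！？) or Western (.!?) terminal punctuation; keyword extraction is omitted because its method is not specified in this section, and the function names are hypothetical.

```python
import re
from typing import List

# Split after Chinese terminal punctuation, or after Western punctuation followed by
# whitespace, so that decimals such as "3.14" are not split.
_SENTENCE_BREAK = re.compile(r"(?<=[。！？])|(?<=[.!?])\s+")


def split_paragraphs(text: str) -> List[str]:
    """Paragraphs are assumed to be separated by one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph: str) -> List[str]:
    """Sentences of one paragraph, with surrounding whitespace removed."""
    return [s.strip() for s in _SENTENCE_BREAK.split(paragraph) if s.strip()]
```

Abbreviations such as "e.g." followed by a space will still be split; a production system would need a language-aware tokenizer.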
According to a third aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method according to any one of the first aspect of embodiments of the present invention.
According to a fourth aspect of the embodiments of the present invention, there is provided an electronic apparatus. Fig. 9 is a block diagram of an electronic device in one embodiment of the invention. The electronic device shown in fig. 9 is a general article key information tracing apparatus. The electronic device can be a smart phone, a tablet computer and the like. As shown, the electronic device 900 includes a processor 901 and memory 902. The processor 901 is electrically connected to the memory 902. The processor 901 is a control center of the terminal 900, connects various parts of the entire terminal using various interfaces and lines, and performs various functions of the terminal and processes data by running or calling a computer program stored in the memory 902 and calling data stored in the memory 902, thereby performing overall monitoring of the terminal.
In this embodiment, the processor 901 in the electronic device 900 loads instructions corresponding to one or more processes of the computer program into the memory 902 according to the following steps, and the processor 901 runs the computer program stored in the memory 902, so as to implement various functions: setting an on-line acquisition mode, extracting all paragraphs, sentences and keywords, extracting current paragraphs, extracting the word frequency of each keyword in the paragraphs, setting a logic type number, obtaining all current paragraphs and each corresponding sentence, extracting the paragraph keyword number by combining the logic type number, obtaining the association degree among all paragraphs, judging whether the current total score state is a target key paragraph, obtaining all source data after receiving all paragraph keyword numbers, sending out an independent source signal, a common source signal and a cooperative source signal, obtaining the target key paragraph, and automatically displaying and downloading source information according to the independent source signal, the common source signal and the cooperative source signal.
Memory 902 may be used to store computer programs and data. Memory 902 stores a computer program having instructions embodied therein that are executable in the processor. The computer program may constitute various functional modules. The processor 901 executes various functional applications and data processing by calling a computer program stored in the memory 902.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
In the embodiment of the invention, paragraph keyword numbers are extracted online by combining the association relations between paragraphs with the logical relation of the document, enabling efficient online tracing based on the paragraph keyword numbers.
In the embodiment of the invention, independent, common and cooperative source dimensions are set according to the paragraph keyword numbers, enabling efficient multi-dimensional tracing and display of the corresponding article.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A tracing method for article key information is characterized by comprising the following steps:
setting an online acquisition mode, and extracting all paragraphs, sentences and keywords;
extracting a current paragraph, extracting the word frequency of each keyword in the paragraph, and setting a logic type number;
obtaining all current paragraphs and each corresponding sentence, and extracting the keyword number of the paragraph by combining the logic type number;
acquiring the association degree among all the sections, and judging whether the current total score state is a target key section or not;
after receiving all the paragraph keyword numbers, acquiring all source data, and sending an independent source signal, a common source signal and a cooperative source signal;
and acquiring the target key paragraph, and automatically displaying and downloading source information according to the independent source signal, the common source signal and the cooperative source signal.
2. The article key information tracing method according to claim 1, wherein the setting of an online acquisition mode to extract all paragraphs, sentences and keywords specifically comprises:
setting the current acquisition mode, and performing online search according to the acquisition mode;
after online searching, obtaining all documents corresponding to the preset key information, and storing the documents as txt format documents;
splitting the txt format document into paragraphs and storing the result as split paragraphs;
extracting keywords from the split paragraphs to obtain all keywords;
and sentence extraction is carried out on the split paragraphs to obtain all sentences.
3. The article key information tracing method according to claim 1, wherein the extracting a current paragraph, extracting a word frequency of each keyword in the paragraph, and setting a logic type number specifically comprises:
extracting a current paragraph, extracting the word frequency of each keyword in the paragraph, and keeping the word frequency as the frequency of the keyword;
sorting the keywords by frequency, wherein the top 10% of the keywords serve as target keywords;
ordering all paragraphs, wherein when the number of target keywords in the first paragraph exceeds 8 and the number of target keywords in every other paragraph does not exceed 8, the current logic type number is 1;
when the number of target keywords in both the first and the last paragraph exceeds 8 and the number of target keywords in every other paragraph does not exceed 8, the current logic type number is 2;
when the number of target keywords in the last paragraph exceeds 8 and the number of target keywords in every other paragraph does not exceed 8, the current logic type number is 3;
when the number of target keywords in both the first and the last paragraph exceeds 8 and the number of target keywords in the paragraphs other than the first and last also exceeds 8, the current logic type number is 4;
when the number of the target keywords in the first and last paragraphs is not more than 8, the current logic type number is 5;
and extracting the current logic type number for storage.
4. The article key information tracing method according to claim 3, wherein the obtaining of all current paragraphs and each corresponding sentence, and the extracting of the paragraph keyword number in combination with the logic type number specifically include:
acquiring all current paragraphs, and marking the total number of marks of each sentence;
calculating the degree of correlation between segments using a third calculation formula;
judging the current logic type number according to the association degree;
calculating a weight coefficient of each paragraph by using a second calculation formula according to the logic type number;
extracting the frequency of the keywords, and calculating the paragraph keyword number by using a first calculation formula;
the first calculation formula is:
$$D = \mathrm{Max}\left(W_i \cdot F_i^{d}\right)$$
wherein $D$ is the paragraph keyword number, $F_i^{d}$ is the frequency of the $d$-th keyword in the $i$-th paragraph, $\mathrm{Max}()$ is a function that extracts the keyword number corresponding to the maximum value, and $W_i$ is the weight coefficient of the $i$-th paragraph;
the second calculation formula is:
[Formula image not reproduced in this text version.]
wherein $z$ is the logic type number, in which type 1 corresponds to general-to-specific (total-division) logic, type 2 to general-specific-general logic, type 3 to specific-to-general logic, type 4 to progressive logic and type 5 to equal-division logic, and $n$ is the total number of paragraphs;
the third calculation formula is:
[Formula image not reproduced in this text version.]
wherein $G_{a,b}$ is the degree of association between the $a$-th and $b$-th paragraphs; $N$ is the total number of sentences in the $a$-th and $b$-th paragraphs; $n_1$ and $n_2$ are the total numbers of sentences in the $a$-th and $b$-th paragraphs, respectively; $x_{ai}$ and $x_{aj}$ are the mark totals of the $i$-th and $j$-th sentences in the $a$-th paragraph, and $x_{bi}$ and $x_{bj}$ are the mark totals of the $i$-th and $j$-th sentences in the $b$-th paragraph; the mark total of a sentence is obtained by looking up a vocabulary table and equals the sum of all marks in that sentence, and the vocabulary table contains the semantic score of every word.
5. The article key information tracing method according to claim 1, wherein obtaining the degree of association between all paragraphs and determining whether the current total score state is a target key paragraph specifically comprises:
acquiring the association degree among all the segments;
judging whether the current state is a total score state, and if the current state is the total score state, sending a key total segment judgment command;
after receiving the key total section judgment command, determining whether the fourth calculation formula is satisfied, and if so, marking the current $a$-th segment as the target key segment;
the fourth calculation formula is:
$$\min_{j} G_{a,j} + 0.8\left[\max_{j} G_{a,j} - \min_{j} G_{a,j}\right] > 1,\qquad 1 < j < S,\ j \neq a$$
wherein $G_{a,j}$ is the degree of association between the $a$-th and $j$-th segments.
6. The method for tracing the source of the article key information as claimed in claim 1, wherein the step of obtaining all source data and sending out an independent source signal, a common source signal and a collaborative source signal after receiving all the paragraph keyword numbers specifically comprises:
after receiving all paragraph keyword numbers, acquiring all the source data, and marking the block number of the source data;
judging whether all paragraphs corresponding to the paragraph key word numbers meet a fifth calculation formula, if so, sending the independent source signal, and if not, continuing to judge;
judging whether all paragraphs corresponding to the paragraph keyword numbers meet a sixth calculation formula, if so, sending a common source signal, and if not, sending a cooperative judgment command;
after receiving the collaborative judgment command, determining, for the paragraphs corresponding to all paragraph keyword numbers, the collaborative source set that satisfies a seventh calculation formula, and sending a collaborative source signal;
the fifth calculation formula is:
$$K_I / \mathit{ALL} \ge 0.9$$
wherein $K_I$ is the number of keywords from the source with block number $I$, and $\mathit{ALL}$ is the total number of keywords;
the sixth calculation formula is:
[Formula image not reproduced in this text version.]
wherein $K_{i1}$ and $K_{i2}$ are the numbers of keywords corresponding to source block numbers $i1$ and $i2$, respectively;
the seventh calculation formula is:
[Formula image not reproduced in this text version.]
wherein $\mathit{min}$ denotes the cooperative source set, and $K_{i_s}$ is the number of keywords corresponding to source block number $i_s$.
7. The article key information tracing method according to claim 1, wherein the obtaining of the target key passage automatically displays and downloads source information according to the independent source signal, the common source signal and the cooperative source signal, and specifically comprises:
obtaining the independent source signal, generating a unique source link, and directly downloading and storing the corresponding link in a preset storage space;
acquiring a common source signal, generating a chain download link, and displaying the link only as a hyperlink on the left side of a corresponding paragraph;
acquiring the cooperative source signal, generating a download link set according to the cooperative source signal, sorting the download link set in descending order of correlation degree, and storing it in a preset storage space in TXT format;
and acquiring the target key paragraph, and storing all the associated information of the target key paragraph in the storage space on line.
8. An article key information traceability system, characterized in that the system comprises:
the document acquisition module is used for setting an online acquisition mode and extracting all paragraphs, sentences and keywords;
the segmentation logic module is used for extracting the current paragraph, extracting the word frequency of each keyword in the paragraph and setting a logic type number;
the key extraction module is used for acquiring all current paragraphs and each corresponding sentence and extracting the keyword number of the paragraph by combining the logic type number;
the total score checking module is used for acquiring the association degree among all the sections and judging whether the current total score state is a target key section or not;
the source extraction module is used for acquiring all source data after receiving all the paragraph keyword numbers and sending an independent source signal, a common source signal and a cooperative source signal;
and the source retrieval module is used for acquiring the target key paragraphs, and automatically displaying and downloading source information according to the independent source signal, the common source signal and the cooperative source signal.
9. A computer-readable storage medium on which computer program instructions are stored, which, when executed by a processor, implement the method of any one of claims 1-7.
10. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method of any of claims 1-7.
CN202210338283.5A 2022-04-01 2022-04-01 Article key information tracing method, system, readable medium and device Active CN114661868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210338283.5A CN114661868B (en) 2022-04-01 2022-04-01 Article key information tracing method, system, readable medium and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210338283.5A CN114661868B (en) 2022-04-01 2022-04-01 Article key information tracing method, system, readable medium and device

Publications (2)

Publication Number Publication Date
CN114661868A true CN114661868A (en) 2022-06-24
CN114661868B CN114661868B (en) 2022-11-22

Family

ID=82032972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210338283.5A Active CN114661868B (en) 2022-04-01 2022-04-01 Article key information tracing method, system, readable medium and device

Country Status (1)

Country Link
CN (1) CN114661868B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110119254A1 (en) * 2009-11-17 2011-05-19 International Business Machines Corporation Inference-driven multi-source semantic search
US20160078102A1 (en) * 2014-09-12 2016-03-17 Nuance Communications, Inc. Text indexing and passage retrieval
CN110083832A (en) * 2019-04-17 2019-08-02 北大方正集团有限公司 Method, apparatus, device and readable storage medium for identifying article reprinting relationships
US20200226206A1 (en) * 2019-01-15 2020-07-16 International Business Machines Corporation Using computer-implemented analytics to determine plagiarism or heavy paraphrasing
CN114116973A (en) * 2021-11-23 2022-03-01 竹间智能科技(上海)有限公司 Multi-document text duplicate checking method, electronic equipment and storage medium

Non-Patent Citations (2)

Title
LIZHAO: "Data Analysis in Coronavirus based on Knowledge Graph of Chinese Literature", 《2020 INTERNATIONAL CONFERENCE ON PUBLIC HEALTH AND DATA SCIENCE (ICPHDS)》 *
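秦晓慧 (Qin Xiaohui): "Topic sources and trends based on the citation network of a single document" (面向单篇文献引文网络的主题来源与走向), 《现代图书情报技术》 (New Technology of Library and Information Service) *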

Also Published As

Publication number Publication date
CN114661868B (en) 2022-11-22

Similar Documents

Publication Publication Date Title
KR101508260B1 (en) Summary generation apparatus and method reflecting document feature
CN111797214A (en) FAQ database-based problem screening method and device, computer equipment and medium
CN110555206A (en) named entity identification method, device, equipment and storage medium
CN113312461A (en) Intelligent question-answering method, device, equipment and medium based on natural language processing
CN110968684A (en) Information processing method, device, equipment and storage medium
CN104142822A (en) Source code flow analysis using information retrieval
CN113962293B (en) LightGBM classification and representation learning-based name disambiguation method and system
CN110321466A (en) A kind of security information duplicate checking method and system based on semantic analysis
CN112632278A (en) Labeling method, device, equipment and storage medium based on multi-label classification
CN111797222A (en) Course knowledge graph construction method, device, terminal and storage medium
CN108595411B (en) Method for acquiring multiple text abstracts in same subject text set
CN105550253A (en) Method and device for obtaining type relation
CN108388556B (en) Method and system for mining homogeneous entity
CN112948510B (en) Construction method of knowledge graph in media industry
CN111354354B (en) Training method, training device and terminal equipment based on semantic recognition
CN108153728A (en) A kind of keyword determines method and device
CN111125443A (en) On-line updating method of test question bank based on automatic duplicate removal
CN111190873A (en) Log mode extraction method and system for log training of cloud native system
CN114153983A (en) Multi-source construction method of industry knowledge graph
CN114661868B (en) Article key information tracing method, system, readable medium and device
CN104331510A (en) Information management method and device
CN104573098B (en) Extensive object identifying method based on Spark systems
CN112115362B (en) Programming information recommendation method and device based on similar code recognition
CN115357765A (en) Data searching method and device, electronic equipment and storage medium
CN114780673A (en) Scientific and technological achievement management method and scientific and technological achievement management platform based on field matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant