CN111159347B - Article content quality data calculation method, calculation device and storage medium

Info

Publication number
CN111159347B
Authority
CN
China
Prior art keywords
article
content
sentence
article content
sentences
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911394161.2A
Other languages
Chinese (zh)
Other versions
CN111159347A (en
Inventor
柳燕煌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhangyue Technology Co Ltd
Original Assignee
Zhangyue Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution

Abstract

The invention discloses an article content quality data calculation method, a calculation device, and a storage medium. The method comprises: extracting a plurality of sentences from the article contents in an article set; for each sentence, searching the article set for article contents containing the sentence; establishing an association relationship among article contents containing the same sentence, determining an association relationship weight value, and constructing an article association graph; and calculating the quality data of each article content according to the article association graph. Because the calculation takes sentence co-occurrence fully into account, the resulting quality data accurately reflects the quality of each article content at the level of its sentences, and the way article content quality data is calculated is optimized.

Description

Article content quality data calculation method, calculation device and storage medium
Technical Field
The invention relates to the technical field of data processing, and in particular to an article content quality data calculation method, a calculation device, and a storage medium.
Background
With the continuing spread of smart terminals such as smartphones and e-book readers, more and more users read article content such as news or e-books over the internet. High-quality article content is not only rich in substance but also raises users' interest in reading. In the prior art, the quality data of article content is usually determined from factors such as user ratings and the number of times users have read the content. Quality data obtained this way, however, reflects only how users have read the article content; it cannot accurately reflect the quality of the article content at the level of the article's sentences, so its accuracy is low.
Disclosure of Invention
In view of the above problems, the present invention has been made to provide an article content quality data calculation method, a calculation device, and a storage medium that overcome the above problems or at least partially solve the above problems.
According to one aspect of the invention, an article content quality data calculation method is provided, and the method comprises the following steps:
extracting a plurality of sentences from article contents in an article set;
for each sentence, searching article content containing the sentence in the article set;
establishing an association relationship among article contents containing the same sentence, determining an association relationship weight value, and constructing to obtain an article association diagram;
and calculating the quality data of each article content according to the article association diagram.
According to another aspect of the present invention, there is provided a computing device comprising a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the following operations:
extracting a plurality of sentences from article contents in an article set;
for each sentence, searching article content containing the sentence in the article set;
establishing an association relationship among article contents containing the same sentence, determining an association relationship weight value, and constructing to obtain an article association diagram;
and calculating the quality data of each article content according to the article association diagram.
According to yet another aspect of the present invention, there is provided a storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to:
extracting a plurality of sentences from article contents in an article set;
for each sentence, searching article content containing the sentence in the article set;
establishing an association relationship among article contents containing the same sentence, determining an association relationship weight value, and constructing to obtain an article association diagram;
and calculating the quality data of each article content according to the article association diagram.
According to the technical scheme provided by the invention, the association relationships between article contents, and their weight values, can be determined conveniently at the sentence level from the sentences that the article contents have in common, and an article association graph is constructed from them. Because sentence co-occurrence is fully taken into account when calculating the quality data of each article content, the resulting quality data accurately reflects the quality of the article content at the sentence level, which improves the accuracy of the quality data and optimizes the way it is calculated.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various additional advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1a is a schematic flowchart of the article content quality data calculation method according to the first embodiment of the present invention;
FIG. 1b is a schematic diagram of an article association graph;
FIG. 2 is a schematic flowchart of the article content quality data calculation method according to the second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a computing device according to a fourth embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Embodiment One
FIG. 1a is a schematic flowchart of the article content quality data calculation method according to the first embodiment of the present invention. As shown in FIG. 1a, the method includes the following steps:
Step S101, extract a plurality of sentences from the article contents in an article set.
The article set comprises a plurality of article contents. The article contents may be news collected from an internet information platform or the like, or chapter contents of e-books obtained from an e-book library or the like. For example, one chapter of an e-book can be regarded as one article content. Specifically, each e-book in the e-book library may be split by chapter, and the resulting chapter contents added to the article set as a plurality of article contents.
Each article content includes a plurality of sentences. In step S101, a plurality of sentences are extracted from each article content in the article set. To ensure that the extracted sentences carry a meaningful amount of information, the number of words in an extracted sentence may be constrained: specifically, only sentences whose word count is greater than or equal to a preset word count threshold are extracted from each article content. The preset word count threshold can be set by those skilled in the art according to actual needs; for example, it can be set to 8.
Step S102, for each sentence, search the article set for article contents containing the sentence.
For each of the extracted sentences, all article contents in the article set are searched for those containing the sentence. Take the sentence "Everyone bears responsibility for the rise and fall of the nation" as an example: if article content 1, article content 2, and article content 3 in the article set all contain this sentence, then the article contents found for the sentence are article content 1, article content 2, and article content 3.
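The lookup in step S102 can be sketched as follows. This is a minimal illustration, assuming each article content is a plain string keyed by an identifier and that "contains" means an exact substring match; the function name `find_articles_containing` and the sample data are hypothetical and do not come from the patent.

```python
def find_articles_containing(sentence, articles):
    """Return the ids of all article contents whose text contains `sentence`.

    `articles` maps an article id to its text (an assumed representation).
    """
    return [article_id for article_id, text in articles.items() if sentence in text]


# Hypothetical article set: three of the four articles contain the same sentence.
shared = "the old bridge over the river had been closed for many years"
articles = {
    1: "Everyone knew that " + shared + ". Nobody minded.",
    2: "By then " + shared + ".",
    3: "She wrote that " + shared + ", and left.",
    4: "This article shares no sentence with the others.",
}

print(find_articles_containing(shared, articles))  # prints [1, 2, 3]
```

A linear scan like this matches the traversal described in the text; for a large article set an index would likely be needed, but that is outside what the patent describes.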
Step S103, establish an association relationship among the article contents containing the same sentence, determine an association relationship weight value, and construct an article association graph.
After the search has been completed for all sentences, association relationships between article contents can be established from the sentences they contain, and the article association graph can then be constructed. Specifically, an association relationship is established among article contents that contain the same sentence, and an association relationship weight value is determined for it. In this embodiment, an association relationship is a direct association relationship, and its weight value depends on the number of identical sentences shared by the two article contents it connects, or on the sentence quality weight values of those sentences. If any two article contents contain the same sentence, an association relationship is established between them, the two article contents are joined by a connecting line, and the association relationship weight value between them is determined from, for example, the number or the sentence quality weight values of the sentences they share. The result is the article association graph, which reflects the association relationships among the article contents intuitively and conveniently. In this embodiment, the article association graph is an undirected graph. FIG. 1b shows a schematic diagram of an article association graph. In FIG. 1b, any two article contents joined by a connecting line contain the same sentence, and w1 to w16 denote the association relationship weight values between connected article contents; for example, w1 is the association relationship weight value between article content 1 and article content 2, and w2 is the association relationship weight value between article content 1 and article content 3.
Step S104, calculate the quality data of each article content according to the article association graph.
The association relationships and association relationship weight values between the article contents in the article association graph can be fed into a preset algorithm model to calculate the quality data of each article content. Specifically, the quality data can be calculated by loop iteration, and may take the form of, for example, a quality score. Because the method of this embodiment determines the association relationships between article contents from the sentences they have in common, the quality data calculated from this sentence co-occurrence relationship accurately reflects the quality of each article content at the sentence level. Taking a quality score as the quality data, a higher score indicates that the article content is more closely associated with other article contents and has a higher degree of sentence co-occurrence.
With the article content quality data calculation method provided by this embodiment, the association relationships between article contents, and their weight values, can be determined conveniently at the sentence level from the sentences that the article contents have in common, and an article association graph is constructed from them. Because sentence co-occurrence is fully taken into account when calculating the quality data of each article content, the resulting quality data accurately reflects the quality of the article content at the sentence level, which improves the accuracy of the quality data and optimizes the way it is calculated.
Embodiment Two
FIG. 2 is a schematic flowchart of the article content quality data calculation method according to the second embodiment of the present invention. As shown in FIG. 2, the method includes the following steps:
Step S201, extract a plurality of sentences from the article contents in an article set.
The article content may be news collected from an internet information platform or the like, and/or chapter content of e-books obtained from an e-book library. Each article content includes a plurality of sentences, so each article content in the article set can be split into its sentences. To ensure that the extracted sentences carry a meaningful amount of information, sentences whose word count is smaller than a preset word count threshold are then screened out (discarded), and the sentences that remain are taken as the sentences extracted from the article set. For example, with a preset word count threshold of 8, suppose that splitting every article content in the article set yields 10 million sentences in total, of which 1 million have fewer than 8 words. Those 1 million sentences are screened out, and the remaining 9 million sentences are taken as the sentences extracted from the article set.
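The splitting and screening in step S201 might look like the sketch below. The patent does not say how sentences are delimited or how words are counted, so two assumptions are made here: sentences end at Western or Chinese full stops, question marks, or exclamation marks, and the word count is the number of whitespace-separated tokens. The names `split_sentences`, `extract_sentences`, and `count_words` are illustrative.

```python
import re

# Assumed sentence delimiters: Western and Chinese sentence-ending punctuation.
_SENTENCE_END = re.compile(r"[.!?。！？]+")


def split_sentences(text):
    """Split article text into trimmed, non-empty sentences."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def extract_sentences(text, min_words=8, count_words=lambda s: len(s.split())):
    """Keep only the sentences whose word count is at least `min_words`."""
    return [s for s in split_sentences(text) if count_words(s) >= min_words]


article = (
    "It rained. The old bridge over the river had been closed for many years. "
    "Why? Nobody in the village could remember who had built it or when."
)

# The two short sentences ("It rained", "Why") fall below the threshold of 8.
for sentence in extract_sentences(article):
    print(sentence)
```

For unsegmented text such as Chinese, passing `count_words=len` would count characters instead of tokens. Splitting on a bare period is deliberately simple and would mis-split abbreviations and decimal numbers.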
Step S202, for each sentence, search the article set for article contents containing the sentence.
To determine the sentence co-occurrence relationships between article contents, the article set must be searched, for each of the sentences extracted in step S201, for the article contents that contain it. Specifically, for each sentence, all article contents in the article set are traversed and each is checked for whether it contains the sentence, so that all article contents containing the sentence are found.
Step S203, count the number of article contents containing the sentence, and obtain the usage frequency of the sentence.
Although step S201 constrains the word count of the extracted sentences, the number of sentences in the article set that satisfy the constraint is still very large, and some of them appear in only a few article contents. The extracted sentences may therefore be filtered further according to their usage frequency in the article set. Specifically, after the article contents containing a sentence have been found, they are counted, and the count is taken as the usage frequency of the sentence. Take the sentence "Everyone bears responsibility for the rise and fall of the nation" as an example: the search finds that article content 1, article content 2, and article content 3 contain it, so its usage frequency is 3.
Step S204, screen out, from the plurality of sentences, the sentences whose usage frequency is less than a preset frequency threshold.
The preset frequency threshold can be set by those skilled in the art according to actual needs. Taking a preset frequency threshold of 2 as an example, step S204 screens out the sentences whose usage frequency is less than 2. A usage frequency of 1 means that only one article content in the article set contains the sentence and no other article content does; that is, the sentence does not co-occur in two or more article contents, so it is screened out.
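Steps S203 and S204 can be sketched together as below, under the assumption that article contents are strings keyed by an identifier and that a sentence is "contained" when it occurs as a substring. The function names and the toy data are hypothetical.

```python
def usage_frequency(sentences, articles):
    """Map each sentence to the number of article contents that contain it."""
    return {
        sentence: sum(1 for text in articles.values() if sentence in text)
        for sentence in sentences
    }


def filter_by_frequency(frequency, min_frequency=2):
    """Keep the sentences whose usage frequency is at least `min_frequency`."""
    return [sentence for sentence, count in frequency.items() if count >= min_frequency]


# Toy article set: only "gamma delta" appears in more than one article.
articles = {
    1: "alpha beta. gamma delta.",
    2: "gamma delta. epsilon zeta.",
    3: "gamma delta.",
}
sentences = ["alpha beta", "gamma delta", "epsilon zeta"]

frequency = usage_frequency(sentences, articles)
print(frequency)                       # prints {'alpha beta': 1, 'gamma delta': 3, 'epsilon zeta': 1}
print(filter_by_frequency(frequency))  # prints ['gamma delta']
```

With the threshold of 2 used in the text, a sentence survives only if at least two article contents share it, which is exactly the condition for it to create an association.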
Step S205, determine the article contents containing the same sentence as article contents having an association relationship, and establish a connecting line between the article contents having an association relationship.
The sentences remaining after the filtering in step S204 are those whose usage frequency is greater than or equal to the preset frequency threshold. For these sentences, the article contents containing the same sentence are determined to have an association relationship; each article content serves as a node in the article association graph, and a connecting line is established between every two article contents that have an association relationship.
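One way to derive the connecting lines of step S205 is sketched below. It assumes the earlier steps have produced a mapping from each retained sentence to the ids of the article contents containing it; the name `build_edges` and the sample mapping are hypothetical. Each undirected edge is stored as a sorted pair so that (1, 2) and (2, 1) are the same connecting line.

```python
from itertools import combinations


def build_edges(sentence_to_articles):
    """Return the set of undirected edges implied by shared sentences.

    Every pair of article contents that contain the same sentence is connected.
    """
    edges = set()
    for article_ids in sentence_to_articles.values():
        for a, b in combinations(sorted(set(article_ids)), 2):
            edges.add((a, b))
    return edges


# Hypothetical result of steps S202 to S204.
sentence_to_articles = {
    "sentence A": [1, 2, 3],  # connects 1-2, 1-3 and 2-3
    "sentence B": [2, 5],     # connects 2-5
    "sentence C": [3, 1],     # 1-3 again, no new edge
}

print(sorted(build_edges(sentence_to_articles)))  # prints [(1, 2), (1, 3), (2, 3), (2, 5)]
```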
Step S206, calculate the association relationship weight values between the article contents according to the identical sentences they contain, and construct the article association graph.
In an alternative embodiment, the association relationship weight value between two article contents is the total number of identical sentences that the two article contents share.
In another alternative embodiment, sentence quality weight values are obtained for the plurality of sentences, the sentence quality weight values of all identical sentences shared by two article contents are summed, and the accumulated value is used as the association relationship weight value between them. The larger the accumulated value for any two article contents, the larger the association relationship weight value between them. This way of determining the weight value takes full account of the quality of the sentences the article contents share, which helps calculate the quality data of each article content in the article association graph more accurately.
The sentence quality weight values can be obtained by inputting the plurality of sentences into a trained sentence quality analysis model. To train the model, a number of sentence samples are collected, each sample is labeled as a positive or negative sample, and the model is trained on the labeled samples. The trained sentence quality analysis model can determine the sentence quality weight value of a sentence quickly and accurately.
The constructed article association graph may be as shown in FIG. 1b. Any two article contents joined by a connecting line in FIG. 1b contain the same sentence, the usage frequency of that sentence is greater than or equal to the preset frequency threshold, and w1 to w16 denote the association relationship weight values between connected article contents. Suppose that article content 1 and article content 2 share 3 identical sentences, namely sentence 1, sentence 2, and sentence 3, whose sentence quality weight values are 0.8, 0.6, and 0.3 respectively. If the total number of shared sentences is used as the association relationship weight value, then w1 = 3; if the accumulated sentence quality weight value is used, then w1 = 0.8 + 0.6 + 0.3 = 1.7.
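The two weighting options can be reproduced for the worked example with a short sketch. It assumes each article content is represented by the set of its retained sentences and that the sentence quality weight values are already available as a dictionary (in the patent they come from a trained sentence quality analysis model, which is not modeled here). The function names are hypothetical.

```python
def weight_by_count(sentences_a, sentences_b):
    """Association weight = number of sentences shared by the two article contents."""
    return len(set(sentences_a) & set(sentences_b))


def weight_by_quality(sentences_a, sentences_b, sentence_quality):
    """Association weight = sum of the quality weights of the shared sentences."""
    shared = set(sentences_a) & set(sentences_b)
    return sum(sentence_quality[s] for s in sorted(shared))


# Values from the worked example: three shared sentences weighted 0.8, 0.6 and 0.3.
article_1 = {"sentence 1", "sentence 2", "sentence 3", "only in article 1"}
article_2 = {"sentence 1", "sentence 2", "sentence 3", "only in article 2"}
sentence_quality = {"sentence 1": 0.8, "sentence 2": 0.6, "sentence 3": 0.3}

print(weight_by_count(article_1, article_2))                                # prints 3
print(round(weight_by_quality(article_1, article_2, sentence_quality), 2))  # prints 1.7
```

The result is rounded only for display, since summing floating-point values may leave a tiny representation error.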
Step S207, for any article content in the article association graph, calculating quality data of the article content according to the quality data of at least one associated article content having an association relationship with the article content, an association relationship weight value between the article content and the at least one associated article content, and an association relationship weight value between each associated article content and at least one other article content having an association relationship with each associated article content.
Each associated article content that has an association relationship with the article content may, besides the article content itself, also have association relationships with at least one other article content. For convenience of description, in this embodiment an article content that has an association relationship with the associated article content, other than the article content itself, is called an adjacent article content. First, the sum of the association relationship weight value between the article content and the associated article content and the association relationship weight values between the associated article content and all of its adjacent article contents is calculated. The ratio of the association relationship weight value between the article content and the associated article content to this sum is the contribution ratio of the associated article content to the article content. The contribution ratio and the quality data of the associated article content are then combined arithmetically to obtain the part that the associated article content contributes to the article content, and the quality data of the article content is calculated by accumulating the parts contributed by all of its associated article contents.
Take article content 1 in the article association graph shown in FIG. 1b. The article contents having an association relationship with article content 1 are article content 2, article content 3, article content 21, and article content 23; these are the associated article contents of article content 1, and the quality data of article content 1 equals the accumulation of the parts that each of them contributes to article content 1. As FIG. 1b shows, besides article content 1, article content 2 also has association relationships with article content 3 and article content 5, which are therefore its adjacent article contents. FIG. 1b also shows that the association relationship weight value between article content 1 and article content 2 is w1, that between article content 2 and article content 3 is w8, and that between article content 2 and article content 5 is w9. The contribution ratio of article content 2 to article content 1 is therefore w1 / (w1 + w8 + w9), where (w1 + w8 + w9) is the sum of these three association relationship weight values. Combining this contribution ratio arithmetically with the quality data of article content 2 gives the part that article content 2 contributes to article content 1.
In the same way, the parts that article content 3, article content 21, and article content 23 contribute to article content 1 are calculated, and the quality data of article content 1 is obtained by accumulating the parts contributed by article content 2, article content 3, article content 21, and article content 23.
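The contribution ratio and a single quality update can be sketched as follows. Two things here are assumptions rather than statements from the patent: the "arithmetic processing" of the contribution ratio and the quality data is taken to be multiplication, and the graph is a reduced, hypothetical version of FIG. 1b containing only article contents 1, 2, 3, and 5 with made-up weights (w1 = 3, w2 = 1, w8 = 1, w9 = 2). The graph is stored as a symmetric dictionary of dictionaries.

```python
def contribution_ratio(weights, source, target):
    """Share of `source`'s total association weight that lies on its edge to `target`."""
    return weights[source][target] / sum(weights[source].values())


def contributed_part(weights, quality, source, target):
    # Assumption: the ratio and the quality data are combined by multiplication.
    return contribution_ratio(weights, source, target) * quality[source]


def update_quality(weights, quality, target):
    """Accumulate the parts contributed by every associated article content of `target`."""
    return sum(contributed_part(weights, quality, source, target) for source in weights[target])


# Hypothetical, reduced graph: weights[a][b] == weights[b][a].
weights = {
    1: {2: 3.0, 3: 1.0},          # w1 = 3 (1-2), w2 = 1 (1-3)
    2: {1: 3.0, 3: 1.0, 5: 2.0},  # w8 = 1 (2-3), w9 = 2 (2-5)
    3: {1: 1.0, 2: 1.0},
    5: {2: 2.0},
}
quality = {1: 1.0, 2: 1.0, 3: 1.0, 5: 1.0}  # initial quality data

print(contribution_ratio(weights, 2, 1))    # w1 / (w1 + w8 + w9) = 3 / 6, prints 0.5
print(update_quality(weights, quality, 1))  # 0.5 * 1.0 + 0.5 * 1.0, prints 1.0
```

Under the multiplication assumption this update has the same shape as a weighted link-analysis step without a damping factor; the patent itself does not name the algorithm.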
Specifically, the quality data of each article content is calculated by loop iteration. First, the same initial quality data value is set for every article content, for example 1. The quality data of each article content is then recalculated iteratively, so that it is updated continuously until an end condition is met and stable quality data is obtained for each article content. Those skilled in the art can set the end condition according to actual needs, which is not limited here. For example, the end condition may be that the number of iterations reaches a preset number of iterations, and/or that the difference between the quality data of the article contents obtained in two adjacent iterations is smaller than a preset difference. Accordingly, whether the end condition is met can be judged by checking whether the iteration count has reached the preset number, or by calculating the difference between the quality data obtained in the current iteration and that obtained in the previous iteration and checking whether it is smaller than the preset difference.
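The loop iteration with both end conditions might be sketched as below. As assumptions for illustration only, the contribution ratio and the quality data are combined by multiplication, the "difference" between two iterations is measured as the largest absolute change of any article content's quality data, and the graph and its weights are hypothetical. The parameter names `max_iterations` and `tolerance` stand in for the preset iteration count and the preset difference.

```python
def iterate_quality(weights, initial=1.0, max_iterations=200, tolerance=1e-6):
    """Iteratively update the quality data of every article content.

    `weights[a][b]` is the association weight between a and b (symmetric).
    Returns the final quality data and the number of iterations performed.
    """
    quality = {node: initial for node in weights}
    iterations = 0
    while iterations < max_iterations:          # end condition 1: iteration cap
        iterations += 1
        updated = {}
        for node, neighbours in weights.items():
            # Each neighbour contributes its quality scaled by its contribution ratio.
            updated[node] = sum(
                weights[nb][node] / sum(weights[nb].values()) * quality[nb]
                for nb in neighbours
            )
        delta = max((abs(updated[n] - quality[n]) for n in weights), default=0.0)
        quality = updated
        if delta < tolerance:                   # end condition 2: change is small enough
            break
    return quality, iterations


# Hypothetical graph: article contents 1, 2, 3 and 5 with made-up weights.
weights = {
    1: {2: 3.0, 3: 1.0},
    2: {1: 3.0, 3: 1.0, 5: 2.0},
    3: {1: 1.0, 2: 1.0},
    5: {2: 2.0},
}

quality, iterations = iterate_quality(weights)
for node in sorted(quality):
    print(node, round(quality[node], 3))
```

Keeping the iteration cap alongside the difference check is a cautious choice: with this kind of update, some graph shapes (for instance a simple chain of three article contents) can make the values alternate rather than settle, and the cap guarantees termination in that case.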
Quality data calculated in this way accurately reflects the quality of each article content and its degree of sentence co-occurrence. Once the quality data of the article contents has been obtained, the article quality feature in the e-book profile data of each e-book can be updated accordingly, and the updated e-book profile data can then be used for e-book analysis and recommendation, so that e-books that match the user's preferences and are of higher quality are recommended first. Users are more likely to read or download e-books recommended in this way, which effectively raises the adoption rate of recommended e-books and improves the recommendation effect.
Optionally, the article set may include isolated article contents. An isolated article content is one that shares no sentence with any other article content in the article set and therefore has no association relationship with other article contents. The quality data of an isolated article content may be set directly to preset quality data, which those skilled in the art can choose according to actual needs; for example, it may be set to 0.1.
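Handling isolated article contents can be sketched as a small post-processing step. The sketch assumes the graph is a dictionary mapping each connected article id to its neighbours, so an article content is isolated when it has no entry or an empty one; the function name `assign_isolated_quality` and the sample values are hypothetical, while 0.1 is the example preset value given in the text.

```python
PRESET_QUALITY = 0.1  # example preset value from the text


def assign_isolated_quality(article_ids, weights, quality, preset=PRESET_QUALITY):
    """Return a copy of `quality` in which isolated article contents get `preset`."""
    result = dict(quality)
    for article_id in article_ids:
        if not weights.get(article_id):  # no neighbours: isolated article content
            result[article_id] = preset
    return result


# Hypothetical data: article 3 shares no sentence with any other article.
article_ids = [1, 2, 3]
weights = {1: {2: 1.0}, 2: {1: 1.0}}
quality = {1: 1.0, 2: 1.0}

print(assign_isolated_quality(article_ids, weights, quality))  # prints {1: 1.0, 2: 1.0, 3: 0.1}
```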
With the article content quality data calculation method provided by this embodiment, sentences can be screened effectively by word count and usage frequency; the association relationships between article contents can be determined conveniently at the sentence level from the sentences the article contents have in common; the association relationship weight values can be calculated accurately from the total number, or the sentence quality weight values, of the sentences shared by the article contents; and the quality data of each article content can be calculated accurately from the determined association relationships and weight values, so that it accurately reflects the quality of the article content at the sentence level and the way article content quality data is calculated is optimized. In addition, when the resulting quality data is used for e-book analysis and recommendation, e-books that match the user's preferences and are of high quality can be recommended first, which raises the adoption rate of recommended e-books and improves the recommendation effect.
Example three
The third embodiment of the present invention provides a nonvolatile storage medium, where the storage medium stores at least one executable instruction, and the executable instruction may execute the article content quality data calculation method in any of the above method embodiments.
The executable instructions may be specifically configured to cause the processor to: extracting a plurality of sentences from article contents in an article set; for each sentence, searching article content containing the sentence in the article set; establishing an association relationship among article contents containing the same sentence, determining an association relationship weight value, and constructing to obtain an article association diagram; and calculating the quality data of each article content according to the article association diagram.
In an alternative embodiment, the executable instructions further cause the processor to: for each article content in the article set, splitting the article content by sentence to obtain the sentences of the article content; and screening out, from the sentences of the article content, sentences whose word count is smaller than a preset word count threshold.
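The splitting and word-count screening step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `extract_sentences`, the set of sentence-ending punctuation marks, the default threshold of 4 and the use of whitespace-separated tokens as the word count are all hypothetical choices. For Chinese text, counting characters would be a more natural measure.

```python
import re
from typing import List

# Assumed delimiters: Western and Chinese sentence-ending punctuation.
_SENTENCE_END = re.compile(r"[.!?。！？]+")


def extract_sentences(content: str, min_words: int = 4) -> List[str]:
    """Split content into sentences and drop those shorter than min_words."""
    candidates = [s.strip() for s in _SENTENCE_END.split(content)]
    kept = []
    for sentence in candidates:
        # Word count is approximated by whitespace-separated tokens.
        if sentence and len(sentence.split()) >= min_words:
            kept.append(sentence)
    return kept


chapter = "He said yes. The rain fell softly on the empty street. Why?"
sentences = extract_sentences(chapter, min_words=4)
```

Short sentences such as "He said yes" and "Why" are removed because they are likely to recur across unrelated article contents and would say little about their association.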
In an alternative embodiment, the executable instructions further cause the processor to: counting the number of article contents containing the sentence to obtain the use frequency of the sentence; and screening out sentences of which the use frequency is less than a preset frequency threshold from the sentences.
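A sketch of the usage-frequency step, assuming the sentences of each article content are already available as sets. The name `build_sentence_index` and the default threshold of 2 are hypothetical; the text only says that the threshold is preset.

```python
from collections import defaultdict
from typing import Dict, Set


def build_sentence_index(
    article_sentences: Dict[str, Set[str]],
    min_frequency: int = 2,
) -> Dict[str, Set[str]]:
    """Map each sentence to the articles containing it, dropping rarely used sentences."""
    index = defaultdict(set)
    for article_id, sentences in article_sentences.items():
        for sentence in sentences:
            index[sentence].add(article_id)
    # Usage frequency = number of article contents that contain the sentence.
    return {s: ids for s, ids in index.items() if len(ids) >= min_frequency}


articles = {
    "a": {"s1", "s2"},
    "b": {"s2", "s3"},
    "c": {"s2", "s3", "s4"},
}
sentence_index = build_sentence_index(articles)
```

With a threshold of 2, sentences that occur in a single article content are dropped, since they cannot link two article contents.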
In an alternative embodiment, the executable instructions further cause the processor to: determining the article contents containing the same sentence as the article contents with the association relationship, and establishing a connecting line between the article contents with the association relationship; and calculating the association relation weight value between the article contents according to the same sentences contained in the article contents, and constructing to obtain an article association diagram.
In an alternative embodiment, the executable instructions further cause the processor to: calculating the total number of identical sentences contained in the article contents, and taking that total as the association relationship weight value between the article contents.
In an alternative embodiment, the executable instructions further cause the processor to: obtaining sentence quality weight values of the plurality of sentences; and calculating an accumulated value of the sentence quality weight values of all identical sentences contained in the article contents, and taking the accumulated value as the association relationship weight value between the article contents.
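The two ways of determining the association relationship weight value, by total count of identical sentences or by accumulated sentence quality weight values, can be sketched with one function. The name `build_association_graph`, the edge representation as a sorted pair of article identifiers, and the treatment of a sentence with no quality weight as contributing 0 are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Optional, Set, Tuple

Edge = Tuple[str, str]


def build_association_graph(
    sentence_index: Dict[str, Set[str]],
    sentence_quality: Optional[Dict[str, float]] = None,
) -> Dict[Edge, float]:
    """Return {(article_u, article_v): weight} for articles that share sentences."""
    weights: Dict[Edge, float] = defaultdict(float)
    for sentence, article_ids in sentence_index.items():
        # Without quality weights every shared sentence counts as 1, so the
        # edge weight is the total number of identical sentences.
        if sentence_quality is None:
            contribution = 1.0
        else:
            contribution = sentence_quality.get(sentence, 0.0)
        # Every pair of articles containing this sentence is associated.
        for u, v in combinations(sorted(article_ids), 2):
            weights[(u, v)] += contribution
    return dict(weights)


index = {"s2": {"a", "b", "c"}, "s3": {"b", "c"}}
count_graph = build_association_graph(index)
quality_graph = build_association_graph(index, {"s2": 0.5, "s3": 0.25})
```

Articles "b" and "c" share two sentences, so their edge carries the largest weight under both schemes.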
In an alternative embodiment, the executable instructions further cause the processor to: inputting the plurality of sentences into a trained sentence quality analysis model to obtain the sentence quality weight values of the plurality of sentences.
In an alternative embodiment, the executable instructions further cause the processor to: for any article content in the article association diagram, calculating the quality data of the article content according to the quality data of at least one associated article content having an association relationship with the article content, the association relationship weight value between the article content and the at least one associated article content, and the association relationship weight value between each associated article content and at least one other article content having an association relationship with that associated article content.
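The calculation described here names three inputs: the quality data of the associated article contents, the weight of each association, and the weights between each associated article content and its own associated article contents. One update rule that uses exactly these inputs is the weighted PageRank form sketched below. It is offered as an assumption for illustration only: the symbols Q, w and N, and the damping factor d, are not defined in this passage.

```latex
Q(A_i) = (1 - d) + d \sum_{A_j \in N(A_i)} \frac{w_{ji}}{\sum_{A_k \in N(A_j)} w_{jk}} \, Q(A_j)
```

Here N(A_i) is the set of article contents associated with A_i, w_{ji} is the association relationship weight value between A_j and A_i, and the denominator normalizes by the total weight of all associations of A_j.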
In an alternative embodiment, the step of calculating the quality data of each article content is performed by loop iteration; that is, the quality data of each article content is calculated iteratively until an end condition is met.
In an alternative embodiment, the end condition comprises: the number of iterations reaches a preset number of iterations; and/or the difference between the quality data of each article content obtained by two adjacent iterative computations is smaller than a preset difference value.
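The loop iteration and its two end conditions can be sketched as follows. The update rule used here is a weighted PageRank-style rule, which is an assumption and not a formula given in this passage. The function name `iterate_quality`, the damping factor 0.85, the initial value 1.0 and the default thresholds are likewise hypothetical. Edge weights are assumed to be positive.

```python
from typing import Dict, Tuple


def iterate_quality(
    edge_weights: Dict[Tuple[str, str], float],
    damping: float = 0.85,      # assumed damping factor, not stated in the text
    max_iterations: int = 100,  # preset number of iterations
    tolerance: float = 1e-6,    # preset difference value
) -> Dict[str, float]:
    """Iteratively compute quality data on an undirected weighted article graph."""
    # Build a symmetric adjacency map from the undirected edge list.
    neighbours: Dict[str, Dict[str, float]] = {}
    for (u, v), w in edge_weights.items():
        neighbours.setdefault(u, {})[v] = w
        neighbours.setdefault(v, {})[u] = w
    if not neighbours:
        return {}
    # Total weight of the associations of each article.
    strength = {n: sum(adj.values()) for n, adj in neighbours.items()}
    quality = {n: 1.0 for n in neighbours}
    for _ in range(max_iterations):  # end condition 1: iteration count
        updated = {}
        for node, adj in neighbours.items():
            inflow = sum(w / strength[other] * quality[other] for other, w in adj.items())
            updated[node] = (1.0 - damping) + damping * inflow
        largest_change = max(abs(updated[n] - quality[n]) for n in quality)
        quality = updated
        if largest_change < tolerance:  # end condition 2: small difference
            break
    return quality


scores = iterate_quality({("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 2.0})
```

In this sample, "b" and "c" are linked by the heaviest edge and end up with equal scores that are higher than the score of "a".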
In an alternative embodiment, the article content is chapter content of an electronic book.
Example four
Fig. 3 is a schematic structural diagram of a computing device according to a fourth embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computing device.
As shown in fig. 3, the computing device may include: a processor 302, a communication interface 304, a memory 306, and a communication bus 308.
Wherein:
the processor 302, communication interface 304, and memory 306 communicate with each other via a communication bus 308.
The communication interface 304 is used for communicating with network elements of other devices, such as clients or other servers.
The processor 302 is configured to execute the program 310, and may specifically execute the relevant steps in the above-mentioned article content quality data calculation method embodiment.
In particular, program 310 may include program code comprising computer operating instructions.
The processor 302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The computing device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 306 is used for storing the program 310. The memory 306 may comprise high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
The program 310 may specifically be configured to cause the processor 302 to perform the following operations: extracting a plurality of sentences from article contents in an article set; for each sentence, searching article content containing the sentence in the article set; establishing an association relationship among article contents containing the same sentence, determining an association relationship weight value, and constructing to obtain an article association diagram; and calculating the quality data of each article content according to the article association diagram.
In an alternative embodiment, program 310 further causes processor 302 to: for each article content in the article set, splitting the article content by sentence to obtain the sentences of the article content; and screening out, from the sentences of the article content, sentences whose word count is smaller than a preset word count threshold.
In an alternative embodiment, program 310 further causes processor 302 to: counting the number of article contents containing the sentence to obtain the use frequency of the sentence; and screening out sentences of which the use frequency is less than a preset frequency threshold from the sentences.
In an alternative embodiment, program 310 further causes processor 302 to: determining the article contents containing the same sentence as the article contents with the association relationship, and establishing a connecting line between the article contents with the association relationship; and calculating the weight value of the association relationship among the article contents according to the same sentences contained in the article contents, and constructing to obtain an article association diagram.
In an alternative embodiment, program 310 further causes processor 302 to: calculating the total number of identical sentences contained in the article contents, and taking that total as the association relationship weight value between the article contents.
In an alternative embodiment, program 310 further causes processor 302 to: obtaining sentence quality weight values of the plurality of sentences; and calculating an accumulated value of the sentence quality weight values of all identical sentences contained in the article contents, and taking the accumulated value as the association relationship weight value between the article contents.
In an alternative embodiment, program 310 further causes processor 302 to: inputting the plurality of sentences into a trained sentence quality analysis model to obtain the sentence quality weight values of the plurality of sentences.
In an alternative embodiment, program 310 further causes processor 302 to: for any article content in the article association diagram, calculating the quality data of the article content according to the quality data of at least one associated article content having an association relationship with the article content, the association relationship weight value between the article content and the at least one associated article content, and the association relationship weight value between each associated article content and at least one other article content having an association relationship with that associated article content.
In an alternative embodiment, the step of calculating the quality data of each article content is performed by loop iteration; that is, the quality data of each article content is calculated iteratively until an end condition is met.
In an alternative embodiment, the end condition comprises: the number of iterations reaches a preset number of iterations; and/or the difference between the quality data of each article content obtained by two adjacent iterative computations is smaller than a preset difference value.
In an alternative embodiment, the article content is chapter content of an electronic book.
For specific implementation of each step in the program 310, reference may be made to the description corresponding to the corresponding step in the above-mentioned article content quality data calculation embodiment, which is not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the above-described device may refer to the corresponding process description in the foregoing method embodiment, and is not described herein again.
By the scheme provided by the embodiment, sentence co-occurrence relation is fully considered for calculating the quality data of each article content, the obtained quality data of each article content can accurately reflect the quality of the article content from the aspect of the article sentence, and the calculation mode of the quality data of the article content is optimized.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system is apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components in the embodiments may be combined into one module or unit or component, and furthermore, may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. The use of the words first, second, third, and so on does not indicate any ordering. These words may be interpreted as names.

Claims (24)

1. An article content quality data calculation method, comprising:
extracting a plurality of sentences from article contents in an article set;
for each sentence, searching article content containing the sentence in the article set;
establishing an association relationship among article contents containing the same sentence, determining an association relationship weight value, and constructing to obtain an article association diagram; calculating the weight value of the incidence relation among the article contents according to the same sentences contained in the article contents, and constructing to obtain an article incidence graph;
calculating quality data of each article content according to the article association diagram;
before the calculating the association relationship weight value between the article contents according to the same sentences contained in the article contents, the method further includes: obtaining statement quality weight values of a plurality of statements;
the calculating of the association relationship weight value between the article contents according to the same sentence contained between the article contents specifically comprises: calculating an accumulated value of statement quality weight values of all the same statements contained in each article content, and taking the accumulated value as an association relation weight value between each article content;
the step of calculating quality data of each article content according to the article association graph further comprises:
aiming at any article content in the article association diagram, calculating the quality data of the article content according to the quality data of at least one associated article content having an association relationship with the article content, an association relationship weight value between the article content and the at least one associated article content and an association relationship weight value between each associated article content and at least one other article content having an association relationship with each associated article content; wherein, the step of calculating the quality data of each article content is completed by loop iteration.
2. The method of claim 1, wherein the extracting a plurality of sentences from the article content in the collection of articles further comprises:
for each article content in the article set, splitting the article content according to a sentence to obtain the sentence of the article content;
and screening out sentences of which the word number is smaller than a preset word number threshold value from the sentences of the article contents.
3. The method of claim 1, wherein after said for each sentence, finding article content in said collection of articles that contains the sentence, the method further comprises:
counting the number of article contents containing the sentence to obtain the use frequency of the sentence;
and screening out sentences of which the use frequency is less than a preset frequency threshold from the sentences.
4. The method of claim 1, wherein the establishing an association relationship between the article contents containing the same sentence and determining an association relationship weight value to construct an article association graph further comprises:
determining the article contents containing the same sentence as the article contents with the association relationship, and establishing a connecting line between the article contents with the association relationship.
5. The method of claim 1, wherein said obtaining statement quality weight values for a plurality of statements further comprises:
and inputting the plurality of sentences into the trained sentence quality analysis model to obtain the sentence quality weight values of the plurality of sentences.
6. The method of claim 1, wherein the quality data of each article content is calculated by loop iteration until an end condition is satisfied.
7. The method of claim 6, wherein the end condition comprises: the number of iterations reaches a preset number of iterations; and/or the difference between the quality data of each article content obtained by two adjacent iterative computations is smaller than a preset difference value.
8. The method of any of claims 1-7, wherein the article content is chapter content of an electronic book.
9. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to:
extracting a plurality of sentences from article contents in an article set;
for each sentence, searching article content containing the sentence in the article set;
establishing an association relationship among article contents containing the same sentence, determining an association relationship weight value, and constructing to obtain an article association diagram; calculating the weight value of the incidence relation among the article contents according to the same sentences contained in the article contents, and constructing to obtain an article incidence graph;
calculating quality data of each article content according to the article association diagram;
wherein the executable instructions further cause the processor to:
obtaining statement quality weight values of a plurality of statements;
calculating an accumulated value of statement quality weight values of all the same statements contained in each article content, and taking the accumulated value as an association relation weight value between each article content;
aiming at any article content in the article association graph, calculating the quality data of the article content according to the quality data of at least one associated article content having an association relationship with the article content, an association relationship weight value between the article content and the at least one associated article content and an association relationship weight value between each associated article content and at least one other article content having an association relationship with each associated article content; wherein, the step of calculating the quality data of each article content is completed by loop iteration.
10. The computing device of claim 9, wherein the executable instructions further cause the processor to:
for each article content in the article set, splitting the article content according to a sentence to obtain the sentence of the article content;
and screening out sentences of which the word number is smaller than a preset word number threshold value from the sentences of the article contents.
11. The computing device of claim 9, wherein the executable instructions further cause the processor to:
counting the number of article contents containing the sentence to obtain the use frequency of the sentence;
and screening out sentences the use frequency of which is less than a preset frequency threshold from the plurality of sentences.
12. The computing device of claim 9, wherein the executable instructions further cause the processor to:
determining the article contents containing the same sentence as the article contents with the association relationship, and establishing a connecting line between the article contents with the association relationship.
13. The computing device of claim 9, wherein the executable instructions further cause the processor to:
and inputting the plurality of sentences into the trained sentence quality analysis model to obtain the sentence quality weight values of the plurality of sentences.
14. The computing device of claim 9, wherein the quality data of each article content is calculated by loop iteration until an end condition is satisfied.
15. The computing device of claim 14, wherein the end condition comprises: the number of iterations reaches a preset number of iterations; and/or the difference between the quality data of each article content obtained by two adjacent iterative computations is smaller than a preset difference value.
16. The computing device of any of claims 9-15, wherein the article content is chapter content of an electronic book.
17. A storage medium having stored therein at least one executable instruction that causes a processor to:
extracting a plurality of sentences from article contents in an article set;
for each sentence, searching article content containing the sentence in the article set;
establishing an association relationship among article contents containing the same sentence, determining an association relationship weight value, and constructing to obtain an article association diagram; calculating the weight value of the incidence relation among the article contents according to the same sentences contained in the article contents, and constructing to obtain an article incidence graph;
calculating quality data of each article content according to the article association diagram;
wherein the executable instructions further cause the processor to:
obtaining statement quality weight values of a plurality of statements;
calculating an accumulated value of statement quality weight values of all the same statements contained in each article content, and taking the accumulated value as an association relation weight value between each article content;
aiming at any article content in the article association graph, calculating the quality data of the article content according to the quality data of at least one associated article content having an association relationship with the article content, an association relationship weight value between the article content and the at least one associated article content and an association relationship weight value between each associated article content and at least one other article content having an association relationship with each associated article content; wherein, the step of calculating the quality data of each article content is completed by loop iteration.
18. The storage medium of claim 17, the executable instructions further causing the processor to:
for each article content in the article set, splitting the article content according to a sentence to obtain the sentence of the article content;
and screening out sentences of which the word number is smaller than a preset word number threshold value from the sentences of the article contents.
19. The storage medium of claim 17, the executable instructions further causing the processor to:
counting the number of article contents containing the sentence to obtain the use frequency of the sentence;
and screening out sentences of which the use frequency is less than a preset frequency threshold from the sentences.
20. The storage medium of claim 17, the executable instructions further causing the processor to:
determining the article contents containing the same sentence as the article contents with the association relationship, and establishing a connecting line between the article contents with the association relationship.
21. The storage medium of claim 17, the executable instructions further causing the processor to:
and inputting the plurality of sentences into the trained sentence quality analysis model to obtain the sentence quality weight values of the plurality of sentences.
22. The storage medium of claim 17, wherein the quality data of each article content is calculated by loop iteration until an end condition is satisfied.
23. The storage medium of claim 22, wherein the end condition comprises: the number of iterations reaches a preset number of iterations; and/or the difference between the quality data of each article content obtained by two adjacent iterative computations is smaller than a preset difference value.
24. The storage medium of any one of claims 17-23, wherein the article content is chapter content of an electronic book.
CN201911394161.2A 2019-12-30 2019-12-30 Article content quality data calculation method, calculation device and storage medium Active CN111159347B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911394161.2A CN111159347B (en) 2019-12-30 2019-12-30 Article content quality data calculation method, calculation device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911394161.2A CN111159347B (en) 2019-12-30 2019-12-30 Article content quality data calculation method, calculation device and storage medium

Publications (2)

Publication Number Publication Date
CN111159347A CN111159347A (en) 2020-05-15
CN111159347B true CN111159347B (en) 2023-03-21

Family

ID=70559149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911394161.2A Active CN111159347B (en) 2019-12-30 2019-12-30 Article content quality data calculation method, calculation device and storage medium

Country Status (1)

Country Link
CN (1) CN111159347B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015033341A1 (en) * 2013-09-09 2015-03-12 Sami Shamoon College Of Engineering (R.A.) Polytope based summarization method
CN109241297A (en) * 2018-07-09 2019-01-18 广州品唯软件有限公司 A kind of classifying content polymerization, electronic equipment, storage medium and engine
CN110334356A (en) * 2019-07-15 2019-10-15 腾讯科技(深圳)有限公司 Article matter method for determination of amount, article screening technique and corresponding device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10552498B2 (en) * 2016-09-19 2020-02-04 International Business Machines Corporation Ground truth generation for machine learning based quality assessment of corpora

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
知识图谱中链接数据质量评价研究综述;顾进广等;《武汉大学学报(理学版)》;第63卷(第01期);第22-38页 *

Also Published As

Publication number Publication date
CN111159347A (en) 2020-05-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant