CN116681056B - Text value calculation method and device based on value scale - Google Patents


Info

Publication number
CN116681056B
CN116681056B (application number CN202310596067.5A)
Authority
CN
China
Prior art keywords
node
core
preset
word
keyword
Prior art date
Legal status
Active
Application number
CN202310596067.5A
Other languages
Chinese (zh)
Other versions
CN116681056A (en)
Inventor
张勇东
毛震东
刘毅
郭俊波
陈伟东
Current Assignee
University of Science and Technology of China USTC
Konami Sports Club Co Ltd
Original Assignee
University of Science and Technology of China USTC
People Co Ltd
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC, People Co Ltd filed Critical University of Science and Technology of China USTC
Priority to CN202310596067.5A priority Critical patent/CN116681056B/en
Publication of CN116681056A publication Critical patent/CN116681056A/en
Application granted granted Critical
Publication of CN116681056B publication Critical patent/CN116681056B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31: Indexing; Data structures therefor; Storage structures
    • G06F16/313: Selection or weighting of terms for indexing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/335: Filtering based on additional data, e.g. user or group profiles
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a text value calculation method and device based on a value table. The method comprises the following steps: performing word segmentation on a text to obtain a keyword set containing a plurality of keywords; traversing the keyword set based on a preset value table and querying the node keywords that match the keywords, so as to obtain matched node sets of different levels, wherein the preset value table comprises nodes of a plurality of preset levels and each node includes a node keyword; and calculating the value data of the text according to the number and weight of the matched node sets of each level. By segmenting the text, matching the keywords in the text against the node keywords of the preset value table to determine the matched node sets of different levels contained in the text, and then computing the value data of the text from the number and weight of those sets, the value of the text is determined on the basis of the preset value table.

Description

Text value calculation method and device based on value scale
Technical Field
The embodiment of the invention relates to the technical field of artificial intelligence, in particular to a text value calculation method and device based on a value table.
Background
With the development of technology, self-media differ from the traditional media ecology. In traditional media, content is mainly produced and released by professional entities, and the information is characterized by high public credibility and strict content management. In the self-media age, anyone can create and publish content through the Internet, so the quality of information spread on the network is poorly guaranteed. Content on the various media platforms is of mixed quality, and a large amount of content with low value orientation exists. Because production costs and publication thresholds are low, large quantities of low-value content exist on the network and spread easily, challenging the propagation of mainstream-value content. If low-value content is allowed to grow freely without guidance, useless and harmful information will flood the network, pollute cyberspace, negatively affect social mores, and subtly erode the public's values.
Existing methods for guiding network information mainly include rumor detection, public-opinion monitoring, standard formulation and popularity prediction. Their main purpose is to identify fabricated information, monitor the development of hot events, and so on. Standard formulation manages and guides information release by explicitly specifying the permitted content and form of network information through relevant standards and specifications, but this approach is rigid and lacks flexibility. Information popularity prediction generally assumes that more popular information tends to be more valuable, but this deviates from reality: vulgar, cheap, low-value information that panders to the crowd is sometimes easier to spread. It is therefore necessary to calculate the value of web-content text from the perspective of values, rather than focusing only on one-sided aspects such as forgery or popularity.
Disclosure of Invention
In view of the foregoing, embodiments of the present invention provide a value table-based text value calculation method and apparatus that overcome, or at least partially solve, the foregoing problems.
According to an aspect of the embodiment of the present invention, there is provided a text value calculation method based on a value table, including:
word segmentation is carried out on the text to obtain a keyword set containing a plurality of keywords;
traversing the keyword set based on a preset value table, and inquiring node keywords matched with the keywords to obtain matched node sets with different levels; the preset value table comprises a plurality of preset level nodes; each node includes a node key;
and calculating the value data of the text according to the number and the weight of the matched node sets of different levels.
According to another aspect of the embodiment of the present invention, there is provided a text value calculating apparatus based on a value table, the apparatus including:
the word segmentation module is suitable for carrying out word segmentation processing on the text to obtain a keyword set containing a plurality of keywords;
the matching module is suitable for traversing the keyword set based on a preset value table, inquiring node keywords matched with the keywords, and obtaining matching node sets with different levels; the preset value table comprises a plurality of preset level nodes; each node includes a node key;
And the value calculation module is suitable for calculating and obtaining the value data of the text according to the number and the weight of the matched node sets of different levels.
According to yet another aspect of an embodiment of the present invention, there is provided a computing device including: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the text value calculation method based on the value table.
According to still another aspect of the embodiments of the present invention, there is provided a computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the above-described value table-based text value calculation method.
According to the text value calculation method and device based on the value table, the text is segmented; the matched node sets of different levels contained in the text are determined by matching the keywords in the text with the node keywords in the preset value table; and the value data of the text is then calculated according to the number and weight of the matched node sets of each level, so that the value of the text is determined on the basis of the preset value table.
The foregoing description is only an overview of the technical solutions of the embodiments of the present invention. So that the technical means of the embodiments can be understood more clearly and implemented according to the content of the specification, specific implementations of the embodiments of the present invention are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 illustrates a flow chart of a value scale based text value calculation method according to one embodiment of the invention;
FIG. 2 shows a flow chart for updating a preset value table;
FIG. 3 illustrates a schematic diagram of a value table-based text value computing device, according to one embodiment of the invention;
FIG. 4 illustrates a schematic diagram of a computing device, according to one embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
FIG. 1 shows a flow chart of a value scale based text value calculation method according to one embodiment of the invention, as shown in FIG. 1, comprising the steps of:
step S101, word segmentation processing is carried out on the text, and a keyword set containing a plurality of keywords is obtained.
According to the method, the value of the various texts published by users on the network is calculated by measuring how well a text matches mainstream values, taking the maintenance of correct public-opinion guidance as the fundamental basis, so as to ensure the correct understanding and accurate propagation of mainstream-value content.
Specifically, after the text is obtained, it is preprocessed; the preprocessing includes, for example, format filtering and stop-word filtering. Preprocessing removes formatted information and meaningless words that are irrelevant to text value calculation, reducing noise and ensuring the accuracy of subsequent word segmentation. Examples include dates in the text, datelines in news reports, URLs, and the like. For stop words, a stop-word list can be preset and filtering performed against it; the preset list contains words or symbols with no value-relevant meaning, such as "@" or "emmmm". The formatted information and the preset stop-word list above are examples and may be set according to the implementation, which is not limited herein.
The preprocessed text is then processed according to punctuation: the text is first split into sentences at punctuation marks, and each sentence undergoes word segmentation to obtain the phrases it contains; the segmentation can be performed with natural-language-processing tools such as NER (Named Entity Recognition), yielding phrases such as "human", "fortune" and "community". Furthermore, because the phrases obtained by sentence-level segmentation do not capture the associations between phrases, this embodiment also combines phrases based on a preset expansion word list to obtain the corresponding keywords, which form the keyword set. The preset expansion word list is set according to the implementation.
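As a minimal sketch of the preprocessing and sentence-splitting described above (the stop words, format patterns and punctuation set here are illustrative assumptions, not the patent's actual lists):

```python
import re

# Illustrative stop words and format patterns; the patent leaves both
# implementation-defined.
STOP_WORDS = {"@", "emmmm"}
FORMAT_PATTERNS = [r"https?://\S+", r"\d{4}-\d{2}-\d{2}"]  # URLs, dates

def preprocess(text: str) -> str:
    """Format filtering then stop-word filtering, per the preprocessing step."""
    for pat in FORMAT_PATTERNS:
        text = re.sub(pat, "", text)
    for sw in STOP_WORDS:
        text = text.replace(sw, "")
    return text

def split_sentences(text: str) -> list:
    """Split on sentence-final punctuation before per-sentence segmentation."""
    return [s for s in re.split(r"[。！？.!?]", text) if s.strip()]

cleaned = preprocess("emmmm see https://example.com on 2023-05-23. fine!")
assert "https" not in cleaned and "emmmm" not in cleaned
```

In practice a real segmenter (e.g. an NER-capable NLP toolkit) would replace the whitespace-free splitting here; this sketch only shows where format and stop-word filtering sit in the pipeline.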
The keyword set contains a plurality of keywords obtained from the text, and subsequent text value calculation is performed based on the keywords.
Step S102, traversing the keyword set based on a preset value table, and inquiring node keywords matched with the keywords to obtain matched node sets with different levels.
The preset value table can be set in advance in the form of a hierarchical-label semantic knowledge graph comprising nodes of several preset levels: core nodes, secondary-core nodes and peripheral nodes, in decreasing order of value (a core node is worth more than a secondary-core node, which in turn is worth more than a peripheral node). The division into core, secondary-core and peripheral nodes is set according to the implementation in combination with current mainstream values and is not limited herein. Each node includes a node keyword and may also include, for example, node frequency, related and similar nodes, a node number, an entity type, and so on. The node number allows the node to which a node keyword belongs to be located quickly: a number starting with A denotes a core node, B a secondary-core node, and C a peripheral node. These are examples; the node information returned for a query (node frequency, related nodes, similar nodes, etc.) can be set according to the implementation. The degree of a node (the total number of its related and similar nodes) can be accumulated from the numbers of related and similar nodes returned. The related and similar nodes returned for a node keyword can in turn be used as query words against the preset value table to find the original node's second-order related and similar nodes; whether to query once or repeatedly on the query results can be chosen according to the implementation and is not limited herein.
After the keyword set is obtained, it can be traversed. For each keyword, the preset value table is queried for a matching node keyword, i.e. whether a corresponding node keyword exists in the table. If so, the keyword is classified according to the level of the node it belongs to, producing matched node sets of different levels: a core node set, a secondary-core node set and a peripheral node set. For example, if the matched node keyword's number is AXXXX, its level can be determined from the number and the keyword is placed in the core node set. If querying the preset value table yields no matching node keyword, the keyword is placed in a non-value matched node set, which is not used in the text value calculation. The keywords contained in the core, secondary-core, peripheral and non-value matched node sets do not overlap.
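The level lookup via node-number prefixes (A for core, B for secondary-core, C for peripheral) can be sketched as follows; the table entries are illustrative placeholders, not the patent's actual value table:

```python
# Hypothetical value table: node keyword -> node number.
VALUE_TABLE = {
    "community": "A0001",   # hypothetical core node
    "fortune":   "B0012",   # hypothetical secondary-core node
    "human":     "C0103",   # hypothetical peripheral node
}

def match_keywords(keywords):
    """Partition keywords into core/secondary/peripheral/non-value sets
    by the first character of the matched node number."""
    sets = {"A": set(), "B": set(), "C": set(), "none": set()}
    for kw in keywords:
        number = VALUE_TABLE.get(kw)
        sets[number[0] if number else "none"].add(kw)
    return sets

matched = match_keywords(["community", "human", "unmatched-word"])
assert matched["A"] == {"community"} and matched["C"] == {"human"}
```

Keeping the sets as Python sets also gives the non-duplication of keywords within each matched node set for free.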
Further, the preset value table may be preset, and may also be updated according to a new text, as shown in fig. 2:
Step S201, splitting the first text into a plurality of sentences, performing first word segmentation on the sentences, and acquiring part-of-speech information, grammar dependency relationship and semantic dependency relationship information of each first word segmentation.
For any new text (hereinafter referred to as the first text), the text is split into a plurality of sentences and each sentence undergoes first word-segmentation processing, for example with HanLP (Han Language Processing, a Chinese NLP package), which performs word segmentation, part-of-speech tagging, entity recognition and the like, yielding each first word of the sentence together with its part-of-speech, grammatical-dependency and semantic-dependency information. The split first text is denoted D = {s_i, i = 1, 2, ..., N}, where s_i is the i-th sentence of the first text and N is the total number of sentences; s_i = {w_j, j = 1, 2, ..., V}, where w_j is the j-th first word of sentence s_i and V is the total number of first words.
Step S202, extracting to-be-processed segmented words according to part-of-speech information, grammar dependency relation and semantic dependency relation information of each first segmented word, and filtering to-be-processed segmented words to obtain to-be-processed segmented word sets.
Based on the part-of-speech, grammatical-dependency and semantic-dependency information of each first word w_j, a sliding window of size n is run over the characters of each first word to extract candidate words to be processed; that is, the candidates are generated as n-grams. After the candidate words are obtained, they are filtered; the filtering includes stop-word filtering, number filtering, low-frequency person-name filtering, numeral-word filtering, part-of-speech filtering and keyword filtering. A filter list can be configured, and everyday common words are removed according to it, so that new words can be discovered more quickly for updating the preset value table. The filtered result is the word set to be processed.
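The n-gram sliding window and the filter-list step can be sketched as follows (the window size and filter list are assumptions; the patent leaves both implementation-defined):

```python
def char_ngrams(word, n=2):
    """Slide a window of size n over a word's characters (the n-gram step)."""
    if len(word) < n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def filter_candidates(cands, filter_list):
    """Drop everyday common words per the filter list so that genuinely
    new words surface faster for updating the preset value table."""
    return [c for c in cands if c not in filter_list]

assert char_ngrams("abcd") == ["ab", "bc", "cd"]
assert filter_candidates(["ab", "the"], {"the"}) == ["ab"]
```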
Step S203, extracting, based on a preset model, the word-segmentation feature set of the word set to be processed, the core feature set of the core-node keywords of the preset value table, the secondary-core feature set of the secondary-core-node keywords, and the peripheral feature set of the peripheral-node keywords; calculating the core similarity of each word in the set from the word-segmentation feature set, the core feature set and the number of core-node keywords; calculating the secondary-core similarity from the word-segmentation feature set, the secondary-core feature set and the number of secondary-core-node keywords; and calculating the peripheral similarity from the word-segmentation feature set, the peripheral feature set and the number of peripheral-node keywords.
For the word set to be processed O = {n_1, n_2, ..., n_i, ..., n_m}, where m is the total number of n-gram candidates obtained, a preset model such as a pre-trained self-encoding language model (e.g. BERT) can be used to extract the word-segmentation feature set f_O = {f_Oi, i = 1, 2, ..., m}, with f_O ∈ R^(m×d), where R^(m×d) is the real space of dimension m×d and d is the feature dimension. Each f_Oi is obtained by the following formula:

f_Oi = LM(n_i)

where LM denotes the preset model, n_i is the i-th word in the word set to be processed, and f_Oi is the word-segmentation feature of n_i. Correspondingly, by the same formula, the preset model yields the core feature set f_A of the core-node keywords of the preset value table, the secondary-core feature set f_B of the secondary-core-node keywords, and the peripheral feature set f_C of the peripheral-node keywords.
After the word-segmentation feature set f_O, the core feature set f_A, the secondary-core feature set f_B and the peripheral feature set f_C are obtained, each similarity can be computed from the corresponding feature sets and keyword counts. Taking the core similarity as an example:

sim_A = (1/A) Σ_{j=1..A} f_Oi · f_Aj^T

where A is the number of core-node keywords, f_Aj is the word-segmentation feature of the j-th core-node keyword, T denotes transposition, and sim_A is the core similarity. Correspondingly, the secondary-core similarity sim_B is computed from f_O, f_B and the number of secondary-core-node keywords, and the peripheral similarity sim_C from f_O, f_C and the number of peripheral-node keywords. sim_A, sim_B and sim_C take values in the range 0 to 1.
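The similarity can be sketched as follows, assuming it averages dot products of unit-normalized features (one plausible reading of the definitions above, consistent with a bounded range); real model features of dimension d would replace the toy 2-d vectors:

```python
import numpy as np

def similarity(f_word, f_level):
    """Average dot product between one candidate's feature vector and a
    level's feature set; with unit-normalized rows the result is in [-1, 1]."""
    f_word = f_word / np.linalg.norm(f_word)
    f_level = f_level / np.linalg.norm(f_level, axis=1, keepdims=True)
    return float(np.mean(f_level @ f_word))

# One 2-d candidate feature against a toy 2-keyword core feature set.
assert abs(similarity(np.array([1.0, 0.0]),
                      np.array([[1.0, 0.0], [0.0, 1.0]])) - 0.5) < 1e-9
```

The normalization step is an assumption introduced here to keep the score bounded; the patent only states the 0-to-1 range of the result.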
Step S204, traversing the word segmentation set to be processed, comparing the core similarity of the word segmentation with a preset core threshold value for any word segmentation, and judging whether the core similarity is larger than or equal to the preset core threshold value.
After the core, secondary-core and peripheral similarities of each word in the word set to be processed are computed, the set is traversed. For any word, its core similarity sim_A is first compared with the preset core threshold. If sim_A is greater than or equal to the preset core threshold, step S207 is executed to add the word to the preset value table; if sim_A = 1, the word is already in the preset value table and need not be added. If sim_A is less than the preset core threshold, step S205 is performed.
Step S205, comparing the sub-core similarity of the word segmentation with a preset sub-core threshold value, and judging whether the sub-core similarity is larger than or equal to the preset sub-core threshold value.
If the core similarity sim_A is less than the preset core threshold, the word's secondary-core similarity sim_B is then compared with the preset secondary-core threshold. If sim_B is greater than or equal to the preset secondary-core threshold, step S207 is executed to add the word to the preset value table; if sim_B = 1, the word is already in the preset value table and need not be added. If sim_B is less than the preset secondary-core threshold, step S206 is performed.
Step S206, comparing the peripheral similarity of the word segmentation with a preset peripheral threshold value, and judging whether the peripheral similarity is larger than or equal to the preset peripheral threshold value.
If the secondary-core similarity sim_B is less than the preset secondary-core threshold, the word's peripheral similarity sim_C is then compared with the preset peripheral threshold. If sim_C is greater than or equal to the preset peripheral threshold, step S207 is executed to add the word to the preset value table; if sim_C = 1, the word is already in the preset value table and need not be added. If sim_C is less than the preset peripheral threshold, the word does not meet the requirements of the preset value table, does not belong to the mainstream values, and is discarded. After discarding it, traversal continues with the next word in the set, judging its core, secondary-core and peripheral similarities, until all words in the word set to be processed have been traversed and the update of the preset value table is complete.
Step S207, adding the word to the preset value table.
When the core similarity is greater than or equal to the preset core threshold, or the secondary-core similarity is greater than or equal to the preset secondary-core threshold, or the peripheral similarity is greater than or equal to the preset peripheral threshold, the word can be added to the preset value table as a core-node, secondary-core-node or peripheral-node keyword respectively, according to which condition was met. After the word is added, traversal continues with the next word in the set, judging its core, secondary-core and peripheral similarities, until all words in the word set to be processed have been traversed and the update of the preset value table is complete.
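The threshold cascade of steps S204 to S207 can be sketched as follows; the threshold values are hypothetical, since the patent leaves them implementation-defined:

```python
# Hypothetical thresholds for core, secondary-core and peripheral levels.
T_CORE, T_SUB, T_PERI = 0.9, 0.8, 0.7

def decide(sim_a, sim_b, sim_c):
    """Cascade of steps S204-S207: try core, then secondary-core, then
    peripheral; a similarity of exactly 1 means the word is already in
    the table and need not be added; below all thresholds, discard."""
    for sim, thresh, level in ((sim_a, T_CORE, "core"),
                               (sim_b, T_SUB, "secondary"),
                               (sim_c, T_PERI, "peripheral")):
        if sim >= thresh:
            return "already present" if sim == 1 else "add as " + level
    return "discard"

assert decide(0.95, 0.0, 0.0) == "add as core"
assert decide(0.5, 0.5, 0.5) == "discard"
```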
Step S103, calculating to obtain the value data of the text according to the number and the weight of the matched node sets of different levels.
After the matched node sets are obtained, the value data of the text can be calculated from the number of keywords in each level's matched node set and the weight corresponding to each level. Specifically, a first product of the size of the core node set and the core-node weight, a second product of the size of the secondary-core node set and the secondary-core-node weight, a third product of the size of the peripheral node set and the peripheral-node weight, and a fourth product of the size of the keyword set and the core-node weight are computed; the first, second and third products are summed, and the ratio of this sum to the fourth product is taken:

v = (|A|·α′_A + |B|·α′_B + |C|·α′_C) / (|S|·α′_A)    (1)

where |A| is the size of the core node set, |B| the size of the secondary-core node set, |C| the size of the peripheral node set, |S| the size of the keyword set, α′_A the core-node weight, α′_B the secondary-core-node weight, α′_C the peripheral-node weight, and v the intermediate value data of the text.
Considering that the keyword set obtained by word segmentation may contain some non-value keywords, so that the non-value matched node set contains many keywords and the computed intermediate value v of the text is small, this embodiment corrects v using a preset exponent to obtain the value data of the text:

v′ = v^0.3    (2)

where v′ is the value data of the text and the preset exponent is 0.3; v is stretched with a power function to obtain the corrected value data v′. On this basis, if all matched node sets obtained for the keyword set are core node sets, the value data of the text is v′ = 1; if only the non-value matched node set is obtained, the value data of the text is v′ = 0.
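Formulas (1) and (2) can be sketched directly; the weight values passed in below are illustrative, not derived from a real value table:

```python
def text_value(n_core, n_sub, n_peri, n_total, w_core, w_sub, w_peri):
    """Intermediate value v per formula (1), corrected per formula (2)."""
    v = (n_core * w_core + n_sub * w_sub + n_peri * w_peri) / (n_total * w_core)
    return v ** 0.3  # power-function stretch with preset exponent 0.3

# If every keyword matches a core node, v = 1 and hence v' = 1.
assert text_value(10, 0, 0, 10, 0.5, 0.3, 0.2) == 1.0
```

Note how the exponent lifts intermediate scores: with half the keywords matching core nodes, v = 0.5 but v′ ≈ 0.81, compensating for non-value keywords diluting the keyword set.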
Further, each weight is calculated as follows. The core-node weight is obtained by normalizing the first sum over the node keywords in the core node set, where the first sum accumulates, for each node keyword, the product of a preset weight and the number of its related and similar nodes, plus its node frequency. The secondary-core-node weight is obtained analogously by normalizing the second sum over the secondary-core node set, and the peripheral-node weight by normalizing the third sum over the peripheral node set. See the following formulas:
α′_A = softmax(∑_{x∈A} [f_x + λd_x])   (3)

where α′_A in formula (3) is the core node weight; A is the core node set, over which x ranges; d_x denotes the number of related nodes and similar nodes of node keyword x in the core node set; f_x denotes the node frequency of node keyword x; λ is a preset weight; and softmax is a normalization function.
α′_B = softmax(∑_{x∈B} [f_x + λd_x])   (4)

where α′_B in formula (4) is the secondary core node weight; B is the secondary core node set, over which x ranges; d_x denotes the number of related nodes and similar nodes of node keyword x in the secondary core node set; f_x denotes the node frequency of node keyword x; λ is a preset weight; and softmax is a normalization function.
α′_C = softmax(∑_{x∈C} [f_x + λd_x])   (5)

where α′_C in formula (5) is the peripheral node weight; C is the peripheral node set, over which x ranges; d_x denotes the number of related nodes and similar nodes of node keyword x in the peripheral node set; f_x denotes the node frequency of node keyword x; λ is a preset weight used to balance the scale difference between the node counts and the node frequency, set according to the implementation conditions; and softmax is a normalization function.
Each weight is determined by the attribute information of the keywords in the matching node sets of each level in the preset value table, such as the number of related nodes and similar nodes and the node frequency: the higher the node frequency in the preset value table, the larger the value data, that is, the larger the weight; and the larger the number of related nodes and similar nodes, the more the keyword acts as an important hub in the preset value table, and the larger its weight.
According to the text value calculation method based on the value table provided by this embodiment, the text is segmented, the matching node sets of different levels contained in the text are determined by matching the keywords of the text against the node keywords in the preset value table, and the value data of the text is then calculated from the number and the weight of each level's matching node set, thereby determining the value of the text on the basis of the preset value table.
Fig. 3 shows a schematic structural diagram of a text value calculating device based on a value table according to an embodiment of the present invention. As shown in fig. 3, the apparatus includes:
the word segmentation module 310 is adapted to perform word segmentation on the text to obtain a keyword set containing a plurality of keywords;
The matching module 320 is adapted to traverse the keyword set based on a preset value table, query the node keywords matched with the keywords, and obtain matching node sets of different levels; the preset value table comprises a plurality of preset level nodes; each node includes a node keyword;
the value calculation module 330 is adapted to calculate the value data of the text according to the number and the weight of the matched node sets with different levels.
Optionally, the preset plurality of level nodes includes: core nodes, secondary core nodes and peripheral nodes; each node further comprises: node number, node frequency, related nodes, and similar nodes.
Optionally, the matching module 320 is further adapted to:
traversing the keyword set, and inquiring a preset value table aiming at any keyword to obtain node keywords matched with the keywords;
classifying the node keywords according to the levels of the nodes to which the node keywords belong to obtain matching node sets of different levels; the matching node set comprises a core node set, a secondary core node set and a peripheral node set.
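The traversal performed by the matching module can be sketched as follows, assuming the preset value table is represented as a mapping from node keyword to level (this data structure and the level labels are assumptions for illustration):

```python
def match_keywords(keywords, value_table):
    """Split keywords into per-level matching node sets.

    `value_table` maps node keyword -> level, one of
    'core', 'secondary', 'peripheral' (hypothetical encoding).
    Keywords absent from the table go to the non-value set.
    """
    sets = {"core": set(), "secondary": set(), "peripheral": set(), "non_value": set()}
    for kw in keywords:
        level = value_table.get(kw, "non_value")
        sets[level].add(kw)
    return sets

# Toy value table and keyword set.
table = {"reform": "core", "policy": "secondary", "meeting": "peripheral"}
matched = match_keywords(["reform", "meeting", "weather"], table)
```

A keyword such as "weather", which matches no node keyword, lands in the non-value matching node set, mirroring the non-matching module described below.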
Optionally, the value calculation module 330 is further adapted to:
calculating a first product of the number of the core node set and the core node weight, a second product of the number of the secondary core node set and the secondary core node weight, a third product of the number of the peripheral node set and the peripheral node weight, and a fourth product of the number of the keyword set and the core node weight; the core node weight is obtained by normalizing the first sum value of the node keywords in the core node set; the first sum value is obtained by accumulating, over each node keyword in the core node set, the sum of the node frequency of that keyword and the product of a preset weight and the number of its related nodes and similar nodes; the secondary core node weight is obtained by normalizing the second sum value of the node keywords in the secondary core node set; the second sum value is obtained by accumulating, over each node keyword in the secondary core node set, the sum of the node frequency of that keyword and the product of the preset weight and the number of its related nodes and similar nodes; the peripheral node weight is obtained by normalizing the third sum value of the node keywords in the peripheral node set; the third sum value is obtained by accumulating, over each node keyword in the peripheral node set, the sum of the node frequency of that keyword and the product of the preset weight and the number of its related nodes and similar nodes;
And accumulating the first product, the second product and the third product, calculating the ratio of the accumulated result to the fourth product, and correcting the ratio according to a preset index to obtain the value data of the text.
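The ratio-and-correction step of the value calculation module might look like the sketch below. It follows the text: numerator (n_A·α_A + n_B·α_B + n_C·α_C), denominator n_total·α_A, then the preset exponent; the function signature and the zero-keyword guard are illustrative assumptions:

```python
def text_value(n_core, n_sub, n_peri, n_keywords,
               w_core, w_sub, w_peri, exponent=0.3):
    """Value data of a text per the description (a sketch, not the exact claim).

    Numerator: count of each matching node set times its level weight.
    Denominator: total keyword count times the core node weight, so a
    text whose every keyword is a core match scores v = 1 before (and
    after) the power-function correction.
    """
    if n_keywords == 0:
        return 0.0  # no keywords: treat as valueless (assumption)
    v = (n_core * w_core + n_sub * w_sub + n_peri * w_peri) / (n_keywords * w_core)
    return v ** exponent
```

With all five keywords matching core nodes the ratio is exactly 1; with no matches at all it is 0, reproducing the two boundary cases stated for formula (2).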
Optionally, the apparatus further comprises: the non-matching module 340 is adapted to classify the keywords into a non-value matching node set if the preset value table is queried and the node keywords matching the keywords are not obtained.
Optionally, the apparatus further comprises: the non-matching value module 350 is adapted to determine that the value data of the text is 0 if the set of matching nodes is a set of non-value matching nodes.
Optionally, the word segmentation module 310 is further adapted to:
preprocessing the text; the preprocessing comprises format filtering processing and stop word filtering processing;
processing the text according to punctuation marks, and splitting the text into a plurality of sentences;
word segmentation processing is carried out on each sentence, and each phrase contained in each sentence is obtained;
and combining the phrases based on a preset expansion word list to obtain corresponding keywords to form a keyword set.
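The four steps of the word segmentation module can be sketched as below. The whitespace tokenizer, stop-word list, and expansion word list are placeholders; a real implementation would use a Chinese segmenter (e.g. jieba) and the patent's preset expansion word list:

```python
import re

STOP_WORDS = {"the", "a", "of"}                      # illustrative stop-word list
EXPANSION = {("value", "table"): "value table"}      # hypothetical expansion word list

def extract_keywords(text):
    """Preprocess, sentence-split, segment, and merge phrases into keywords."""
    # 1) preprocessing: format filtering (collapse whitespace) plus lowercasing
    text = re.sub(r"\s+", " ", text).strip().lower()
    # 2) split the text into sentences on punctuation marks
    sentences = [s for s in re.split(r"[.!?;]", text) if s.strip()]
    keywords = []
    for sent in sentences:
        # 3) word segmentation with stop-word filtering (whitespace stand-in)
        words = [w for w in sent.split() if w not in STOP_WORDS]
        # 4) merge adjacent phrases found in the preset expansion word list
        i = 0
        while i < len(words):
            if i + 1 < len(words) and (words[i], words[i + 1]) in EXPANSION:
                keywords.append(EXPANSION[(words[i], words[i + 1])])
                i += 2
            else:
                keywords.append(words[i])
                i += 1
    return keywords

kws = extract_keywords("The value  table works. A test!")
```

The expansion step is what turns two adjacent phrases into one keyword, so multi-word node keywords in the value table can still be matched.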
Optionally, the apparatus further comprises: the updating module 360, adapted to split a first text into a plurality of sentences, perform first word segmentation on the sentences, and acquire the part-of-speech information, grammar dependency relationship and semantic dependency relationship information of each first word segment; extract word segments to be processed according to that information, and filter them to obtain a word segment set to be processed, the filtering comprising stop-word filtering, digit filtering, low-frequency person-name filtering, digital-word filtering, part-of-speech filtering and keyword filtering; extract, based on a preset model, the word segmentation feature set of the word segment set to be processed, the core feature set of the core node keywords of the preset value table, the secondary core feature set of the secondary core node keywords, and the peripheral feature set of the peripheral node keywords; calculate the core similarity of each word segment from the word segmentation feature set, the core feature set and the number of core node keywords, the secondary core similarity from the word segmentation feature set, the secondary core feature set and the number of secondary core node keywords, and the peripheral similarity from the word segmentation feature set, the peripheral feature set and the number of peripheral node keywords; and traverse the word segment set to be processed: for any word segment, compare its core similarity with a preset core threshold, and add the word segment into the preset value table if the core similarity is greater than or equal to the preset core threshold; if the core similarity is smaller than the preset core threshold, compare its secondary core similarity with a preset secondary core threshold, and add the word segment into the preset value table if the secondary core similarity is greater than or equal to the preset secondary core threshold; if the secondary core similarity is smaller than the preset secondary core threshold, compare its peripheral similarity with a preset peripheral threshold, and add the word segment into the preset value table if the peripheral similarity is greater than or equal to the preset peripheral threshold.
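The cascaded threshold comparison of the updating module reduces to a simple if/elif chain; the threshold values and similarity inputs below are illustrative, not taken from the patent:

```python
def classify_new_word(core_sim, sub_sim, peri_sim,
                      core_th=0.8, sub_th=0.7, peri_th=0.6):
    """Decide which level, if any, a candidate word joins in the value table.

    Matches the cascade in the text: core is tried first, then secondary
    core, then peripheral; a word failing all three thresholds is dropped.
    """
    if core_sim >= core_th:
        return "core"
    if sub_sim >= sub_th:
        return "secondary"
    if peri_sim >= peri_th:
        return "peripheral"
    return None
```

Note the ordering matters: a word with both a high core similarity and a high peripheral similarity is added as a core node, since the core comparison is performed first.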
The above descriptions of the modules refer to the corresponding descriptions in the method embodiments, and are not repeated herein.
The embodiment of the invention further provides a non-volatile computer storage medium storing at least one executable instruction that can perform the value table-based text value calculation method in any of the above method embodiments.
FIG. 4 illustrates a schematic diagram of a computing device according to an embodiment of the invention; the specific embodiment does not limit the specific implementation of the computing device.
As shown in fig. 4, the computing device may include: a processor 402, a communication interface (Communications Interface) 404, a memory 406, and a communication bus 408.
Wherein:
processor 402, communication interface 404, and memory 406 communicate with each other via communication bus 408.
A communication interface 404 for communicating with network elements of other devices, such as clients or other servers.
Processor 402 is configured to execute program 410, and may specifically perform relevant steps in the embodiments of the value-table-based text value calculation method described above.
In particular, program 410 may include program code including computer-operating instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the computing device may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
Memory 406 for storing programs 410. Memory 406 may comprise high-speed RAM memory or may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
Program 410 may be specifically operable to cause processor 402 to perform a value table-based text value calculation method in any of the method embodiments described above. The specific implementation of each step in the procedure 410 may refer to the corresponding descriptions in the corresponding steps and units in the text value calculation embodiment based on the value table, which are not described herein. It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and modules described above may refer to corresponding procedure descriptions in the foregoing method embodiments, which are not repeated herein.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The required structure for a construction of such a system is apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It should be appreciated that the teachings of embodiments of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of preferred embodiments of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., an embodiment of the invention that is claimed, requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the components according to embodiments of the present invention may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). Embodiments of the present invention may also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the embodiments of the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. Embodiments of the invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless specifically stated.

Claims (7)

1. A text value calculation method based on a value table is characterized by comprising the following steps:
word segmentation is carried out on the text to obtain a keyword set containing a plurality of keywords;
traversing the keyword set based on a preset value table, and inquiring the preset value table aiming at any keyword to obtain a node keyword matched with the keyword; classifying the node keywords according to the levels of the nodes to which the node keywords belong to obtain matching node sets of different levels; wherein the preset value table comprises a plurality of preset level nodes; each node includes a node keyword; the preset plurality of level nodes comprise: core nodes, secondary core nodes and peripheral nodes; each node further comprises: node number, node frequency, related nodes and similar nodes; the matching node set comprises a core node set, a secondary core node set and a peripheral node set; the preset value table is obtained by: splitting a first text into a plurality of sentences, performing first word segmentation on the sentences, and acquiring part-of-speech information, grammar dependency relationship and semantic dependency relationship information of each first word segment; the first text is any new text; extracting word segments to be processed according to the part-of-speech information, the grammar dependency relationship and the semantic dependency relationship information of each first word segment, and filtering the word segments to be processed to obtain a word segment set to be processed; the filtering processing comprises stop word filtering, digit filtering, low-frequency person-name filtering, digital word filtering, part-of-speech filtering and keyword filtering; extracting, based on a preset model, a word segmentation feature set of the word segment set to be processed, a core feature set of the core node keywords of the preset value table, a secondary core feature set of the secondary core node keywords and a peripheral feature set of the peripheral node keywords; calculating, according to the word segmentation feature set, the core feature set and the number of the core node keywords, the core similarity of each word segment in the word segment set to be processed, calculating, according to the word segmentation feature set, the secondary core feature set and the number of the secondary core node keywords, the secondary core similarity of each word segment in the word segment set to be processed, and calculating, according to the word segmentation feature set, the peripheral feature set and the number of the peripheral node keywords, the peripheral similarity of each word segment in the word segment set to be processed; traversing the word segment set to be processed, comparing, for any word segment, the core similarity of the word segment with a preset core threshold, and adding the word segment into the preset value table if the core similarity is greater than or equal to the preset core threshold; if the core similarity is smaller than the preset core threshold, comparing the secondary core similarity of the word segment with a preset secondary core threshold, and adding the word segment into the preset value table if the secondary core similarity is greater than or equal to the preset secondary core threshold; if the secondary core similarity is smaller than the preset secondary core threshold, comparing the peripheral similarity of the word segment with a preset peripheral threshold, and adding the word segment into the preset value table if the peripheral similarity is greater than or equal to the preset peripheral threshold;
According to the number and the weight of the matched node sets of different levels, calculating to obtain a first product of the number of the core node sets and the weight of the core node, a second product of the number of the secondary core node sets and the weight of the secondary core node, a third product of the number of the peripheral node sets and the weight of the peripheral node, and a fourth product of the number of the keyword sets and the weight of the core node; the core node weight is obtained by carrying out normalization processing on the first sum value of each node keyword in the core node set; the first sum value is obtained by accumulating the sum of the product of the number of related nodes and similar nodes of each node keyword in the core node set and preset weights and the node frequency of the node keywords; the secondary core node weight is obtained by carrying out normalization processing on the second sum value of each node keyword in the secondary core node set; the second sum value is obtained according to the sum of the product of the number of related nodes and similar nodes of each node keyword in the secondary core node set and preset weights and the node frequency of the node keywords; the peripheral node weight is obtained by carrying out normalization processing on the third sum value of each node keyword in the peripheral node set; the third sum value is obtained according to the sum of the product of the number of related nodes and similar nodes of each node keyword in the peripheral node set and the preset weight and the node frequency of the node keywords; and accumulating the first product, the second product and the third product, calculating the ratio of the accumulated result to the fourth product, and correcting the ratio according to a preset index to obtain the value data of the text.
2. The method according to claim 1, wherein the method further comprises:
and if the preset value list is queried, node keywords matched with the keywords are not obtained, and the keywords are classified into a non-value matching node set.
3. The method according to claim 2, wherein the method further comprises:
and if the matching node set is a non-value matching node set, determining that the value data of the text is 0.
4. The method of claim 1, wherein the word segmentation of the text to obtain a keyword set comprising a plurality of keywords further comprises:
preprocessing the text; the preprocessing comprises format filtering processing and stop word filtering processing;
processing the text according to punctuation marks, and splitting the text into a plurality of sentences;
word segmentation processing is carried out on each sentence, and each phrase contained in each sentence is obtained;
and combining the phrases based on a preset expansion word list to obtain corresponding keywords to form a keyword set.
5. A value meter-based text value computing device, the device comprising:
The word segmentation module is suitable for carrying out word segmentation processing on the text to obtain a keyword set containing a plurality of keywords;
the matching module is suitable for traversing the keyword set based on a preset value table, and inquiring the preset value table aiming at any keyword to obtain a node keyword matched with the keyword; classifying the node keywords according to the levels of the nodes to which the node keywords belong to obtain matching node sets of different levels; wherein the preset value table comprises a plurality of preset level nodes; each node includes a node keyword; the preset plurality of level nodes comprise: core nodes, secondary core nodes and peripheral nodes; each node further comprises: node number, node frequency, related nodes and similar nodes; the matching node set comprises a core node set, a secondary core node set and a peripheral node set; the preset value table is obtained by: splitting a first text into a plurality of sentences, performing first word segmentation on the sentences, and acquiring part-of-speech information, grammar dependency relationship and semantic dependency relationship information of each first word segment; the first text is any new text; extracting word segments to be processed according to the part-of-speech information, the grammar dependency relationship and the semantic dependency relationship information of each first word segment, and filtering the word segments to be processed to obtain a word segment set to be processed; the filtering processing comprises stop word filtering, digit filtering, low-frequency person-name filtering, digital word filtering, part-of-speech filtering and keyword filtering; extracting, based on a preset model, a word segmentation feature set of the word segment set to be processed, a core feature set of the core node keywords of the preset value table, a secondary core feature set of the secondary core node keywords and a peripheral feature set of the peripheral node keywords; calculating, according to the word segmentation feature set, the core feature set and the number of the core node keywords, the core similarity of each word segment in the word segment set to be processed, calculating, according to the word segmentation feature set, the secondary core feature set and the number of the secondary core node keywords, the secondary core similarity of each word segment in the word segment set to be processed, and calculating, according to the word segmentation feature set, the peripheral feature set and the number of the peripheral node keywords, the peripheral similarity of each word segment in the word segment set to be processed; traversing the word segment set to be processed, comparing, for any word segment, the core similarity of the word segment with a preset core threshold, and adding the word segment into the preset value table if the core similarity is greater than or equal to the preset core threshold; if the core similarity is smaller than the preset core threshold, comparing the secondary core similarity of the word segment with a preset secondary core threshold, and adding the word segment into the preset value table if the secondary core similarity is greater than or equal to the preset secondary core threshold; if the secondary core similarity is smaller than the preset secondary core threshold, comparing the peripheral similarity of the word segment with a preset peripheral threshold, and adding the word segment into the preset value table if the peripheral similarity is greater than or equal to the preset peripheral threshold;
The value calculation module is suitable for calculating to obtain a first product of the number of the core node sets and the weight of the core node, a second product of the number of the secondary core node sets and the weight of the secondary core node, a third product of the number of the peripheral node sets and the weight of the peripheral node, and a fourth product of the number of the keyword sets and the weight of the core node according to the number and the weight of the matching node sets of different levels; the core node weight is obtained by carrying out normalization processing on the first sum value of each node keyword in the core node set; the first sum value is obtained by accumulating the sum of the product of the number of related nodes and similar nodes of each node keyword in the core node set and preset weights and the node frequency of the node keywords; the secondary core node weight is obtained by carrying out normalization processing on the second sum value of each node keyword in the secondary core node set; the second sum value is obtained according to the sum of the product of the number of related nodes and similar nodes of each node keyword in the secondary core node set and preset weights and the node frequency of the node keywords; the peripheral node weight is obtained by carrying out normalization processing on the third sum value of each node keyword in the peripheral node set; the third sum value is obtained according to the sum of the product of the number of related nodes and similar nodes of each node keyword in the peripheral node set and the preset weight and the node frequency of the node keywords; and accumulating the first product, the second product and the third product, calculating the ratio of the accumulated result to the fourth product, and correcting the ratio according to a preset index to obtain the value data of the text.
6. A computing device, comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the value table-based text value calculation method according to any one of claims 1 to 4.
7. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the value table-based text value calculation method of any one of claims 1 to 4.
CN202310596067.5A 2023-05-24 2023-05-24 Text value calculation method and device based on value scale Active CN116681056B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310596067.5A CN116681056B (en) 2023-05-24 2023-05-24 Text value calculation method and device based on value scale


Publications (2)

Publication Number Publication Date
CN116681056A CN116681056A (en) 2023-09-01
CN116681056B true CN116681056B (en) 2024-01-26

Family

ID=87786455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310596067.5A Active CN116681056B (en) 2023-05-24 2023-05-24 Text value calculation method and device based on value scale

Country Status (1)

Country Link
CN (1) CN116681056B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117669550B (en) * 2023-11-13 2024-04-30 东风日产数据服务有限公司 Topic mining method, system, equipment and medium based on text center

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007183796A (en) * 2006-01-06 2007-07-19 Pma:Kk Business evaluation value calculation system
CN108319587A (en) * 2018-02-05 2018-07-24 中译语通科技股份有限公司 A kind of public sentiment value calculation method and system of more weights, computer
CN109062905A (en) * 2018-09-04 2018-12-21 武汉斗鱼网络科技有限公司 A kind of barrage value of edition evaluation method, device, equipment and medium
CN109753562A (en) * 2019-02-11 2019-05-14 杭州乾博科技有限公司 A kind of instant messaging news value appraisal procedure and system
CN109885681A (en) * 2019-01-25 2019-06-14 中译语通科技股份有限公司 A kind of patent value degree calculation method based on computer technology bibliographic data base
KR20190104745A (en) * 2018-03-02 2019-09-11 국민대학교산학협력단 Issue interest based news value evaluation apparatus and method, storage media storing the same
CN110347800A (en) * 2019-07-15 2019-10-18 中国工商银行股份有限公司 Text handling method and device and electronic equipment and readable storage medium storing program for executing
CN110866389A (en) * 2018-08-17 2020-03-06 北大方正集团有限公司 Information value evaluation method, device, equipment and computer readable storage medium
CN111930962A (en) * 2020-09-02 2020-11-13 平安国际智慧城市科技股份有限公司 Document data value evaluation method and device, electronic equipment and storage medium
KR20200137924A (en) * 2019-05-29 2020-12-09 경희대학교 산학협력단 Real-time keyword extraction method and device in text streaming environment
CN112417088A (en) * 2019-08-19 2021-02-26 武汉渔见晚科技有限责任公司 Evaluation method and device for text value in community

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050071174A1 (en) * 2001-07-31 2005-03-31 Leibowitz Mark Harold Method and system for valuing intellectual property

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on intelligence value evaluation methods for short texts; Zhang Ning; Ship Electronic Engineering (舰船电子工程) (Issue 01); full text *

Also Published As

Publication number Publication date
CN116681056A (en) 2023-09-01

Similar Documents

Publication Publication Date Title
CN108197117B (en) Chinese text keyword extraction method based on document theme structure and semantics
JP6335898B2 (en) Information classification based on product recognition
CN112069298A (en) Human-computer interaction method, device and medium based on semantic web and intention recognition
CN110413787B (en) Text clustering method, device, terminal and storage medium
CN107357777B (en) Method and device for extracting label information
CN110232112A (en) Keyword extracting method and device in article
CN116681056B (en) Text value calculation method and device based on value scale
CN109446393B (en) Network community topic classification method and device
CN109271524A (en) Entity link method in knowledge base question answering system
CN111428031A (en) Graph model filtering method fusing shallow semantic information
CN111475608A (en) Mashup service characteristic representation method based on functional semantic correlation calculation
CN109753646B (en) Article attribute identification method and electronic equipment
CN113127607A (en) Text data labeling method and device, electronic equipment and readable storage medium
CN116127079B (en) Text classification method
CN109344397B (en) Text feature word extraction method and device, storage medium and program product
CN116561320A (en) Method, device, equipment and medium for classifying automobile comments
CN114943285B (en) Intelligent auditing system for internet news content data
CN114969324A (en) Chinese news title classification method based on subject word feature expansion
CN114595684A (en) Abstract generation method and device, electronic equipment and storage medium
CN114239539A (en) English composition off-topic detection method and device
CN114444491A (en) New word recognition method and device
CN113254586A (en) Unsupervised text retrieval method based on deep learning
CN112667779A (en) Information query method and device, electronic equipment and storage medium
CN113378562B (en) Word segmentation processing method, device, computing equipment and storage medium
CN110909533B (en) Resource theme judging method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant