CN108846023A - Method and device for mining unconventional characteristics of text - Google Patents

Method and device for mining unconventional characteristics of text

Info

Publication number
CN108846023A
Authority
CN
China
Prior art keywords
keyword
word
degree
text
association
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810507576.5A
Other languages
Chinese (zh)
Inventor
田兴邦
杨喆
何国涛
李全忠
蒲瑶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Puqiang times (Zhuhai Hengqin) Information Technology Co., Ltd
Original Assignee
Universal Information Technology (beijing) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Universal Information Technology (beijing) Co Ltd
Priority to CN201810507576.5A
Publication of CN108846023A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00: Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03: Data mining

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method and device for mining unconventional characteristics of text. The method includes: obtaining the information content of each keyword in the text to be mined and the inter-word association degree between keywords; obtaining a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge represents the inter-word association degree between two keywords; and performing a recursive operation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and taking the keyword with the greatest importance as the unconventional characteristic of the text to be mined. By calculating the information content of keywords and the inter-word association degrees, generating a word graph, and then taking the keyword with the greatest importance in the word graph as the unconventional characteristic, the method and device can mine text in depth and obtain its unconventional characteristics.

Description

Method and device for mining unconventional characteristics of text
Technical field
The present invention relates to the field of data mining technology, and in particular to a method and device for mining unconventional characteristics of text.
Background
With the development of science and technology, a large amount of text data is generated every day. Some of these texts are conventional, and their characteristics are known in advance; others are unconventional, and their characteristics cannot be predicted. For a known characteristic, an analysis method can be designed around it, for example by building a logical expression of keywords to search for whether a text has an important characteristic. How to analyze the unconventional characteristics of text is an important topic at present.
In the prior art, a common text analysis method is the three-layer Bayesian probability model Latent Dirichlet Allocation (LDA), whose main function is to find topics in a large number of texts. There can be any number of topics, and a single text may contain several topics. Such topic analysis can find the various topics in a large collection of texts. Here a topic is defined by a group of words. For example, a sports topic would be defined by words such as exercise, race, high jump, and swimming. The words also have a probability distribution: relevant words have higher probability and irrelevant words have lower probability.
However, the prior-art method is only suitable for analyzing conventional characteristics in text from an unknown scenario, that is, determining through topic analysis which topics the text involves. For text from a known, fixed scenario, where the goal is to analyze the unconventional characteristics in the text, the prior-art text analysis method is not suitable.
Summary of the invention
The object of the present invention is to provide a method and device for mining unconventional characteristics of text, solving the technical problem that prior-art text analysis methods are not suitable for mining unconventional characteristics in text.
To solve the above technical problem, in one aspect, the present invention provides a method for mining unconventional characteristics of text, including:
obtaining the information content of each keyword in the text to be mined;
obtaining the inter-word association degrees in the text to be mined, where an inter-word association degree indicates how strongly two target keywords are associated;
obtaining a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
performing a recursive operation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and taking the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
In another aspect, the present invention provides a device for mining unconventional characteristics of text, including:
an information content obtaining module, configured to obtain the information content of each keyword in the text to be mined;
an inter-word association degree obtaining module, configured to obtain the inter-word association degrees in the text to be mined, where an inter-word association degree indicates how strongly two target keywords are associated;
a word graph obtaining module, configured to obtain a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
an unconventional characteristic obtaining module, configured to perform a recursive operation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and to take the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
In yet another aspect, the present invention provides an electronic device for mining unconventional characteristics of text, including:
a memory and a processor that communicate with each other via a bus; the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the above method.
In a further aspect, the present invention provides a computer-readable storage medium storing a computer program that implements the above method when executed by a processor.
The method and device for mining unconventional characteristics of text provided by the present invention calculate the information content of keywords and the inter-word association degrees, generate a word graph, and then take the keyword with the greatest importance in the word graph as the unconventional characteristic of the text to be mined. In this way the text can be mined in depth and its unconventional characteristics obtained.
Brief description of the drawings
Fig. 1 is a schematic diagram of a method for mining unconventional characteristics of text according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a word graph of text according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a device for mining unconventional characteristics of text according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device for mining unconventional characteristics of text provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic diagram of a method for mining unconventional characteristics of text according to an embodiment of the present invention. As shown in Fig. 1, the embodiment provides a method for mining unconventional characteristics of text, executed by a device for mining unconventional characteristics of text. The method includes:
Step S101: obtaining the information content of each keyword in the text to be mined;
Step S102: obtaining the inter-word association degrees in the text to be mined, where an inter-word association degree indicates how strongly two target keywords are associated;
Step S103: obtaining a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
Step S104: performing a recursive operation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and taking the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
Specifically, the information content of each keyword is first obtained from the text to be mined. If there are several texts to be mined, they can be stored together in one overall document set. The more heavily a word is used within a smaller number of documents, the greater its information content.
Then, the inter-word association degrees in the text to be mined are obtained. An inter-word association degree indicates how strongly two target keywords are associated; that is, the association between two words is calculated from their co-occurrence frequency (the two keywords appearing in the same sentence).
Taking telemarketing text (not listed in detail in this specification) as an example, the following two sentences are extracted from the text to explain how the inter-word association degree is obtained. The words marked in bold in the two sentences below are the keywords for which the inter-word association degree needs to be obtained.
First sentence: "Hello, my credit card was just charged fraudulently, and just now I did not do anything, and it just gave me a charge of 188".
Second sentence: "It says that my mobile banking with XX Bank is about to expire, and wants me to log in to some web address".
These two sentences are segmented into multiple words; for example, "hello" and "my" are each single segmented words. Words marked in bold, such as "credit card" and "fraudulent charge", are keywords. Because the two keywords "credit card" and "fraudulent charge" appear together in the same sentence, the inter-word association degree of these two keywords needs to be obtained.
The association degree is calculated as the number of times the two words occur together divided by the total of their individual occurrences.
Then, a word graph is obtained based on the information content of each keyword and the inter-word association degrees. Each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords.
Fig. 2 is a schematic diagram of a word graph of text according to an embodiment of the present invention. As shown in Fig. 2, taking telemarketing text (not listed in detail in this specification) as an example, the embodiment generates a word graph from the information content of each keyword and the inter-word association degrees that were obtained.
The value on each edge is the inter-word association degree. Each node contains one keyword, and each keyword is assigned a weight; a keyword's weight is the sum of the weights of the surrounding keywords and their association degrees.
For example, the information content of the keyword "balance" is 77, the information content of the keyword "lowered" is 56, and the inter-word association degree between "balance" and "lowered" is 13.2.
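The word graph described above can be sketched as two plain dictionaries. This is only an illustrative data layout, not the patent's implementation: the function name `build_word_graph` and the dictionary representation are assumptions, and the values are the ones from the example above.

```python
def build_word_graph(information, associations):
    """Build an undirected word graph.

    information:  {keyword: information content}, used as initial node values.
    associations: {(keyword_a, keyword_b): inter-word association degree}.
    Returns (nodes, edges), where edges[a][b] == edges[b][a].
    """
    nodes = dict(information)
    edges = {word: {} for word in nodes}
    for (a, b), degree in associations.items():
        # Only connect two distinct keywords that are both nodes.
        if a in nodes and b in nodes and a != b:
            edges[a][b] = degree
            edges[b][a] = degree
    return nodes, edges


# Values taken from the example in the text.
nodes, edges = build_word_graph(
    {"balance": 77, "lowered": 56},
    {("balance", "lowered"): 13.2},
)
```

Storing each edge in both directions keeps neighbor lookups simple for the later recursive step, at the cost of duplicating the weight.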
Finally, a recursive operation is performed on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and the keyword with the greatest importance is taken as the unconventional characteristic of the text to be mined.
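The text does not give the exact update rule for the recursive operation. The sketch below therefore substitutes a TextRank-style weighted update as a stand-in: each keyword keeps a share of its initial information content and receives a share of each neighbor's score in proportion to the edge weight. The function name `rank_keywords`, the damping factor, the tolerance, and the iteration cap are all assumptions made for illustration.

```python
def rank_keywords(nodes, edges, damping=0.85, tolerance=1e-6, max_iterations=200):
    """Iterate node scores until the largest change falls below tolerance.

    nodes: {keyword: initial value (information content)}.
    edges: {keyword: {neighbor: association degree}}, symmetric.
    The update rule is an assumed TextRank-style formula, not the patent's.
    """
    if not nodes:
        return {}
    scores = dict(nodes)
    # Total edge weight leaving each node, used to normalize its contribution.
    out_weight = {word: sum(edges[word].values()) for word in nodes}
    for _ in range(max_iterations):
        updated = {}
        for word in nodes:
            incoming = sum(
                scores[neighbor] * weight / out_weight[neighbor]
                for neighbor, weight in edges[word].items()
                if out_weight[neighbor] > 0
            )
            updated[word] = (1 - damping) * nodes[word] + damping * incoming
        change = max(abs(updated[word] - scores[word]) for word in nodes)
        scores = updated
        if change < tolerance:
            break
    return scores


# A three-keyword chain a - b - c with equal initial values and edge weights.
example_nodes = {"a": 1.0, "b": 1.0, "c": 1.0}
example_edges = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
example_scores = rank_keywords(example_nodes, example_edges)
most_important = max(example_scores, key=example_scores.get)
```

With a damping factor below 1 the update is a contraction, so the loop converges regardless of the initial values; in the chain example the middle keyword ends up with the highest score because it has the most neighbors.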
Taking telemarketing text (not listed in detail in this specification) as an example, the embodiment of the present invention mined six unconventional characteristics. The abnormal or significant event corresponding to each unconventional characteristic is as follows:
1. Fraudulent text message carrying a link to a web address
Related utterances: "I just received a text message from Bank of China saying that the mobile banking for the user name on the credit card application is about to expire, please log in to a web address". "Could you please, this web address is a fraudulent text message, could you please provide the web address".
2. Fraudulent card charges
Related utterance: "Hello, it's that my credit card was just charged fraudulently, and just now I did not do anything, and it gave me a charge of 188".
3. Card retained by machine
Related utterances: "Hello, my card, I was making a deposit and the machine did not give my card back to me". "It was swallowed by the machine. Are you still next to the machine now?"
4. New People's Bank of China policy
Related utterance: "An interbank transfer goes through the People's Bank settlement system, twenty-four hours; in total the funds will reach your account within forty-eight hours".
5. Problems with Bank of China's automatic deduction business for the housing provident fund center
Related utterances: "The housing provident fund should be deducted automatically for repayment (provided the balance is sufficient), but it was not, resulting in overdue repayment". "It was deducted, but the repayment is still overdue: the main problem (more than 70%)".
6. Phone channel cannot be reached
Related utterances: "Hello, it's like this: I just received a text message saying that my credit limit was lowered". "The phone at your credit card center cannot be reached at all". "I have tried for a long time".
The reason deals are closed in the telesales scenario was also found: letting the customer understand that the insurance will "really pay claims".
The method for mining unconventional characteristics of text provided by the present invention calculates the information content of keywords and the inter-word association degrees, generates a word graph, and then takes the keyword with the greatest importance in the word graph as the unconventional characteristic of the text to be mined. In this way the text can be mined in depth and its unconventional characteristics obtained.
On the basis of the above embodiment, further, the keywords are words whose information content is greater than a preset threshold, and/or preset words, and the keywords do not include stop words.
Specifically, non-stop words whose information content is greater than a preset threshold may be selected as keywords, or keywords may be words set manually by the user.
For example, words such as "you", "I", "he", "once", "a while", "well", and "this" are frequently occurring stop words, and these stop words should be removed when selecting keywords.
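The selection rule above can be sketched as a small set operation. The function name `select_keywords` and its parameters are hypothetical, and the sketch assumes that stop words are removed even if they appear among the manually preset words, since the text states that keywords do not include stop words.

```python
def select_keywords(information, threshold, stop_words, preset_words=()):
    """Pick keywords by information content and/or a manual list.

    information:  {word: information content}.
    threshold:    words strictly above this value qualify.
    stop_words:   words that must never be keywords.
    preset_words: words set manually by the user (assumed optional).
    """
    chosen = {word for word, value in information.items() if value > threshold}
    chosen |= set(preset_words)
    # Stop words are excluded last so they cannot re-enter via preset_words.
    return chosen - set(stop_words)


selected = select_keywords(
    {"credit card": 40.0, "you": 90.0, "balance": 5.0},
    threshold=10.0,
    stop_words={"you", "I", "he"},
    preset_words=["limit"],
)
```

In this example "you" is dropped as a stop word despite its high score, "balance" falls below the threshold, and "limit" is kept because it was set manually.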
On the basis of the above embodiments, further, the information content of a keyword is the TFIDF value of the keyword.
Specifically, in the embodiment of the present invention the information content of a keyword is calculated according to the TFIDF algorithm from information theory, that is, the TFIDF value of the keyword is used as its information content. The information content of each keyword in the text to be mined is obtained by the following formula:
A_1 = B_1 * log(C_1 / D_1)
where A_1 is the TFIDF value of the keyword, B_1 is the term frequency of the keyword, C_1 is the total number of texts, and D_1 is the number of texts that contain the keyword.
Term frequency is the number of times a word occurs in the overall document set. Document frequency is the number of documents in which a word is used, that is, the number of documents that contain the word. The more heavily a word is used within a smaller number of documents, the greater its information content.
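As a worked illustration of the formula A_1 = B_1 * log(C_1 / D_1), the sketch below computes the value for one keyword over a small set of already segmented documents. The function name `information_content` is hypothetical, and the natural logarithm is an assumption because the text does not state the base.

```python
import math


def information_content(documents, keyword):
    """Return B1 * log(C1 / D1) for one keyword.

    documents: list of documents, each a list of segmented words.
    B1 is the keyword's count over the whole set, C1 the number of
    documents, D1 the number of documents containing the keyword.
    The natural logarithm is an assumption; the base is not specified.
    """
    term_frequency = sum(doc.count(keyword) for doc in documents)      # B1
    total_documents = len(documents)                                   # C1
    containing = sum(1 for doc in documents if keyword in doc)         # D1
    if containing == 0:
        return 0.0
    return term_frequency * math.log(total_documents / containing)


docs = [
    ["credit", "card", "card"],
    ["balance"],
    ["balance", "card"],
    ["limit"],
]
card_value = information_content(docs, "card")  # 3 * log(4 / 2)
```

Here "card" occurs three times (B_1 = 3) in two of four documents (C_1 = 4, D_1 = 2), so its value is 3 * log(2). A word that appears in every document gets log(1) = 0, which matches the statement that heavy use concentrated in few documents carries more information.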
On the basis of the above embodiments, further, obtaining the inter-word association degrees in the text to be mined, where an inter-word association degree indicates how strongly two target keywords are associated, specifically includes:
obtaining each target keyword pair, where a target keyword pair consists of a first target keyword and a second target keyword, and the number of words between the first target keyword and the second target keyword is less than a preset value;
obtaining the co-occurrence frequency of each target keyword pair, where the co-occurrence frequency is the number of times the target keyword pair occurs;
obtaining the inter-word association degrees in the text to be mined based on the co-occurrence frequency of each target keyword pair.
The inter-word association degrees in the text to be mined are obtained by the following formula:
A_2 = B_2 / (C_1 + C_2 - B_2)
where A_2 is the inter-word association degree between the first target keyword and the second target keyword, B_2 is the number of times the first target keyword and the second target keyword occur together, C_1 is the number of occurrences of the first target keyword, and C_2 is the number of occurrences of the second target keyword. (C_1 here is unrelated to the C_1 used in the TFIDF formula.)
Specifically, when calculating the inter-word association degree, each target keyword pair must first be obtained. A target keyword pair consists of two target keywords, and the number of words between the two target keywords is less than a preset value, whose size can be adjusted according to the practical application.
Take the following sentence from telemarketing as an example: "Hello, my credit card was just charged fraudulently, and just now I did not do anything, and it just gave me a charge of 188".
Here the number of words between the target keyword "credit card" and the target keyword "fraudulent charge" is 1 (counted in the word-segmented original sentence). Assuming the preset value is 5, "credit card" and "fraudulent charge" constitute a target keyword pair.
Then the co-occurrence frequency of each target keyword pair is obtained, where the co-occurrence frequency is the number of times the target keyword pair occurs; that is, a count is made of how many sentences contain both the target keyword "credit card" and the target keyword "fraudulent charge" with fewer than the preset value of 5 words between the two keywords.
Finally, according to the co-occurrence frequency of each target keyword pair, the inter-word association degrees in the text to be mined are obtained using the preset calculation formula.
It should be noted that the formula for the inter-word association degree in the above example is not the only one possible; practical applications are not limited to it, and the specific formula can be chosen according to the circumstances.
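The window-based co-occurrence count and the formula A_2 = B_2 / (C_1 + C_2 - B_2) can be sketched as follows. The function name `association_degree` and the parameter `max_gap` are hypothetical. The sketch also makes two assumptions that the text leaves open: a sentence contributes at most one co-occurrence, and C_1 and C_2 count every occurrence of each keyword across all sentences. The two keywords are assumed to be distinct.

```python
def association_degree(sentences, first, second, max_gap=5):
    """Return B2 / (C1 + C2 - B2) for two distinct target keywords.

    sentences: list of sentences, each a list of segmented words.
    max_gap:   the preset value; a pair counts only when fewer than
               max_gap words lie between the two keywords.
    """
    co_occurrences = 0   # B2
    first_count = 0      # C1
    second_count = 0     # C2
    for words in sentences:
        first_positions = [i for i, w in enumerate(words) if w == first]
        second_positions = [i for i, w in enumerate(words) if w == second]
        first_count += len(first_positions)
        second_count += len(second_positions)
        # abs(i - j) - 1 is the number of words strictly between the two.
        if any(abs(i - j) - 1 < max_gap
               for i in first_positions for j in second_positions):
            co_occurrences += 1
    denominator = first_count + second_count - co_occurrences
    return co_occurrences / denominator if denominator else 0.0


sample = [
    ["hello", "my", "credit_card", "just", "fraud_charge", "now"],
    ["credit_card", "lost"],
    ["fraud_charge", "phone"],
]
degree = association_degree(sample, "credit_card", "fraud_charge")
```

In the sample, the pair co-occurs once with one word in between (B_2 = 1), and each keyword occurs twice (C_1 = C_2 = 2), giving 1 / (2 + 2 - 1). Under these counting assumptions the formula behaves like a Jaccard-style ratio and stays between 0 and 1.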
Fig. 3 is a schematic diagram of a device for mining unconventional characteristics of text according to an embodiment of the present invention. As shown in Fig. 3, the embodiment provides a device for mining unconventional characteristics of text, used to carry out the method described in the above embodiments. The device specifically includes an information content obtaining module 301, an inter-word association degree obtaining module 302, a word graph obtaining module 303, and an unconventional characteristic obtaining module 304, where:
the information content obtaining module 301 is configured to obtain the information content of each keyword in the text to be mined;
the inter-word association degree obtaining module 302 is configured to obtain the inter-word association degrees in the text to be mined, where an inter-word association degree indicates how strongly two target keywords are associated;
the word graph obtaining module 303 is configured to obtain a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
the unconventional characteristic obtaining module 304 is configured to perform a recursive operation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and to take the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
The specific steps by which the device of this embodiment carries out the method are the same as in the above embodiments and are not repeated here.
The device for mining unconventional characteristics of text provided by the present invention calculates the information content of keywords and the inter-word association degrees, generates a word graph, and then takes the keyword with the greatest importance in the word graph as the unconventional characteristic of the text to be mined. In this way the text can be mined in depth and its unconventional characteristics obtained.
Fig. 4 is a schematic structural diagram of an electronic device for mining unconventional characteristics of text provided by an embodiment of the present invention. As shown in Fig. 4, the device includes a processor 401, a memory 402, and a bus 403;
the processor 401 and the memory 402 communicate with each other via the bus 403;
the processor 401 is configured to call the program instructions in the memory 402 to perform the method provided by the above method embodiments, for example including:
obtaining the information content of each keyword in the text to be mined;
obtaining the inter-word association degrees in the text to be mined, where an inter-word association degree indicates how strongly two target keywords are associated;
obtaining a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
performing a recursive operation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and taking the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
An embodiment of the present invention discloses a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer is able to perform the method provided by the above method embodiments, for example including:
obtaining the information content of each keyword in the text to be mined;
obtaining the inter-word association degrees in the text to be mined, where an inter-word association degree indicates how strongly two target keywords are associated;
obtaining a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
performing a recursive operation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and taking the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
An embodiment of the present invention provides a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium stores computer instructions that cause a computer to perform the method provided by the above method embodiments, for example including:
obtaining the information content of each keyword in the text to be mined;
obtaining the inter-word association degrees in the text to be mined, where an inter-word association degree indicates how strongly two target keywords are associated;
obtaining a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
performing a recursive operation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and taking the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be carried out by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium, and when executed it performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
The device and equipment embodiments described above are merely illustrative. The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for mining unconventional characteristics of a text, characterized by comprising:
obtaining the information content of each keyword in a text to be mined;
obtaining the inter-word association degrees in the text to be mined, each inter-word association degree indicating the degree of association between two target keywords;
obtaining a word graph based on the information content of each keyword and the inter-word association degrees, wherein each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
performing a recursive operation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and taking the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
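The recursive operation of claim 1 can be illustrated with a short sketch. The claim does not state the update rule, so the code below assumes a weighted TextRank-style update with a damping factor of 0.85; the function names `rank_keywords` and `top_keyword`, the damping factor, the tolerance, and the iteration cap are all hypothetical choices made for illustration and are not taken from the patent.

```python
def rank_keywords(info, assoc, damping=0.85, tol=1e-6, max_iter=200):
    """Iterate node values on a word graph until they stop changing.

    info:  {keyword: information content}, used as the initial node values.
    assoc: {(keyword_a, keyword_b): association degree}, undirected edges.
    The update rule is an ASSUMED weighted TextRank-style rule.
    """
    if not info:
        return {}
    # Build a symmetric weighted adjacency map from the undirected edges.
    neighbors = {kw: {} for kw in info}
    for (a, b), weight in assoc.items():
        if a in neighbors and b in neighbors and a != b and weight > 0:
            neighbors[a][b] = weight
            neighbors[b][a] = weight
    # Total edge weight leaving each node, used to normalise contributions.
    out_weight = {kw: sum(nbrs.values()) for kw, nbrs in neighbors.items()}

    scores = dict(info)  # initial node value = information content
    for _ in range(max_iter):
        new_scores = {}
        for kw in scores:
            incoming = sum(
                weight / out_weight[other] * scores[other]
                for other, weight in neighbors[kw].items()
            )
            new_scores[kw] = (1 - damping) + damping * incoming
        delta = max(abs(new_scores[kw] - scores[kw]) for kw in scores)
        scores = new_scores
        if delta < tol:  # converged
            break
    return scores


def top_keyword(scores):
    """Return the keyword with the greatest importance."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    info = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
    assoc = {("a", "b"): 1.0, ("a", "c"): 1.0, ("a", "d"): 1.0}
    print(top_keyword(rank_keywords(info, assoc)))
```

Under this assumed damped update the converged scores are determined by the edge weights, and the initial node values affect only how quickly the iteration settles. An update rule that keeps the information content as a per-node bias term would behave differently.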
2. The method according to claim 1, characterized in that the keywords are words whose information content is greater than a preset threshold, and/or preset words, and the keywords do not include stop words.
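A minimal sketch of the keyword selection in claim 2 follows. It assumes that "and/or" means the union of the thresholded words and the preset words; the function name `select_keywords` and its parameters are hypothetical.

```python
def select_keywords(info, threshold, preset_words=frozenset(), stop_words=frozenset()):
    """Pick keywords from {word: information content}.

    Assumption: thresholded words and preset words are combined as a union,
    and stop words are removed from the result.
    """
    # Words whose information content exceeds the preset threshold.
    selected = {word for word, value in info.items() if value > threshold}
    # Preset words are added regardless of their information content.
    selected |= set(preset_words)
    # Stop words are never keywords.
    return selected - set(stop_words)
```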
3. The method according to claim 1, characterized in that the information content of a keyword is the TFIDF value of the keyword.
4. The method according to claim 1, characterized in that obtaining the inter-word association degrees in the text to be mined, each inter-word association degree indicating the degree of association between two target keywords, specifically comprises:
obtaining each target keyword pair, wherein a target keyword pair consists of a first target keyword and a second target keyword, and the number of words between the first target keyword and the second target keyword is less than a preset value;
obtaining the co-occurrence frequency of each target keyword pair, wherein the co-occurrence frequency is the number of times the target keyword pair occurs;
obtaining the inter-word association degrees in the text to be mined based on the co-occurrence frequency of each target keyword pair, each inter-word association degree indicating the degree of association between two target keywords.
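The pair collection and co-occurrence counting of claim 4 could look roughly like the sketch below. It assumes the text is already tokenized into a list of words, treats pairs as unordered, and skips pairs of identical keywords; the name `cooccurrence_counts` and the default `max_gap` are hypothetical.

```python
from collections import Counter


def cooccurrence_counts(tokens, keywords, max_gap=5):
    """Count how often two keywords occur with fewer than max_gap words between them.

    tokens:   list of words in reading order (assumed pre-tokenized).
    keywords: set of target keywords.
    Returns a Counter keyed by alphabetically sorted keyword pairs.
    """
    pair_counts = Counter()
    # Keep only keyword positions so the inner loop can stop early.
    positions = [(i, tok) for i, tok in enumerate(tokens) if tok in keywords]
    for idx, (i, first) in enumerate(positions):
        for j, second in positions[idx + 1:]:
            words_between = j - i - 1
            if words_between >= max_gap:
                break  # later keywords are even farther away
            if first != second:
                pair_counts[tuple(sorted((first, second)))] += 1
    return pair_counts
```

For example, with tokens `["a", "x", "b", "y", "y", "y", "a"]`, keywords `{"a", "b"}`, and `max_gap=2`, only the first `a` and `b` (one word apart) form a pair, so the pair is counted once.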
5. The method according to claim 3, characterized in that the information content of each keyword in the text to be mined is obtained by the following formula:
$A_1 = B_1 \cdot \log(C_1 / D_1)$
where $A_1$ is the TFIDF value of the keyword, $B_1$ is the term frequency of the keyword, $C_1$ is the total number of texts, and $D_1$ is the number of texts containing the keyword.
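As a hedged illustration of the formula in claim 5, the sketch below computes the TFIDF value from the three quantities the claim names. The claim does not state the base of the logarithm; the natural logarithm is assumed here, and the function name `tfidf` and its parameter names are hypothetical.

```python
import math


def tfidf(term_frequency, total_texts, texts_with_keyword):
    """TFIDF value = term frequency * log(total texts / texts containing the keyword).

    Assumption: natural logarithm; the claim leaves the base unspecified.
    """
    if total_texts <= 0 or texts_with_keyword <= 0:
        raise ValueError("text counts must be positive")
    return term_frequency * math.log(total_texts / texts_with_keyword)
```

A keyword that appears in every text gets a value of zero under this formula, since the logarithm of 1 is 0, which is why very common words carry no information content.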
6. The method according to claim 4, characterized in that the inter-word association degrees in the text to be mined are obtained by the following formula:
$A_2 = B_2 / (C_1 + C_2 - B_2)$
where $A_2$ is the inter-word association degree between the first target keyword and the second target keyword, $B_2$ is the number of times the first target keyword and the second target keyword occur together, $C_1$ is the number of occurrences of the first target keyword, and $C_2$ is the number of occurrences of the second target keyword.
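The formula in claim 6 has the same shape as a Jaccard coefficient: joint occurrences divided by the occurrences of either keyword. A minimal sketch follows; the function name `association_degree` and the zero-denominator fallback are assumptions added for illustration.

```python
def association_degree(joint_count, first_count, second_count):
    """Association degree = joint / (first + second - joint).

    joint_count:  times the two keywords occur together.
    first_count:  occurrences of the first target keyword.
    second_count: occurrences of the second target keyword.
    """
    denominator = first_count + second_count - joint_count
    if denominator <= 0:
        return 0.0  # assumed fallback when neither keyword occurs
    return joint_count / denominator
```

For instance, two keywords that occur 4 and 3 times and co-occur twice get 2 / (4 + 3 - 2) = 0.4, while two keywords that always occur together get 1.0.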
7. A device for mining unconventional characteristics of a text, characterized by comprising:
an information content obtaining module, configured to obtain the information content of each keyword in a text to be mined;
an inter-word association degree obtaining module, configured to obtain the inter-word association degrees in the text to be mined, each inter-word association degree indicating the degree of association between two target keywords;
a word graph obtaining module, configured to obtain a word graph based on the information content of each keyword and the inter-word association degrees, wherein each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
an unconventional characteristic obtaining module, configured to perform a recursive operation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and to take the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
8. An electronic device for mining unconventional characteristics of a text, characterized by comprising:
a memory and a processor, wherein the processor and the memory communicate with each other via a bus; the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the method according to any one of claims 1 to 6.
9. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810507576.5A CN108846023A (en) 2018-05-24 2018-05-24 The unconventional characteristic method for digging and device of text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810507576.5A CN108846023A (en) 2018-05-24 2018-05-24 The unconventional characteristic method for digging and device of text

Publications (1)

Publication Number Publication Date
CN108846023A true CN108846023A (en) 2018-11-20

Family

ID=64213347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810507576.5A Pending CN108846023A (en) 2018-05-24 2018-05-24 The unconventional characteristic method for digging and device of text

Country Status (1)

Country Link
CN (1) CN108846023A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104375989A (en) * 2014-12-01 2015-02-25 国家电网公司 Natural language text keyword association network construction system
CN103399901B (en) * 2013-07-25 2016-06-08 三星电子(中国)研发中心 A kind of keyword abstraction method
CN106202042A (en) * 2016-07-06 2016-12-07 中央民族大学 A kind of keyword abstraction method based on figure
CN106469187A (en) * 2016-08-29 2017-03-01 东软集团股份有限公司 The extracting method of key word and device
CN106776881A (en) * 2016-11-28 2017-05-31 中国科学院软件研究所 A kind of realm information commending system and method based on microblog
KR101769247B1 (en) * 2015-12-16 2017-08-18 건국대학교 산학협력단 Method and apparatus for comparing strings using hierarchical interval tree
CN107193803A (en) * 2017-05-26 2017-09-22 北京东方科诺科技发展有限公司 A kind of particular task text key word extracting method based on semanteme

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626044A (en) * 2020-05-14 2020-09-04 北京字节跳动网络技术有限公司 Text generation method and device, electronic equipment and computer readable storage medium
CN111626044B (en) * 2020-05-14 2023-06-30 北京字节跳动网络技术有限公司 Text generation method, text generation device, electronic equipment and computer readable storage medium
CN111831833A (en) * 2020-07-27 2020-10-27 人民卫生电子音像出版社有限公司 Knowledge graph construction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200309

Address after: 519000 room 105-58115, No. 6, Baohua Road, Hengqin New District, Zhuhai City, Guangdong Province (centralized office area)

Applicant after: Puqiang times (Zhuhai Hengqin) Information Technology Co., Ltd

Address before: 100089 Haidian District, Beijing, Yongfeng Road, North Road, South East Road, F, 2 floor.

Applicant before: Puqiang Information Technology (Beijing) Co., Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20181120