CN108846023A - Method and device for mining unconventional characteristics of text
- Publication number: CN108846023A
- Application number: CN201810507576.5A
- Authority: CN (China)
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a method and device for mining unconventional characteristics of text. The method includes: obtaining the information content of each keyword in a text to be mined and the inter-word association degrees; obtaining a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge represents the inter-word association degree between two keywords; and performing a recursive computation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and taking the keyword with the greatest importance as the unconventional characteristic of the text to be mined. By computing keyword information content and inter-word association degrees, generating a word graph, and then taking the most important keyword in the graph as the unconventional characteristic, the method and device can mine text in depth and obtain its unconventional characteristics.
Description
Technical field
The present invention relates to the field of data mining technology, and in particular to a method and device for mining unconventional characteristics of text.
Background art
With the development of science and technology, a large amount of text data is generated every day. Some of these texts are conventional, and their characteristics are known in advance. Others are unconventional, and their characteristics cannot be predicted. For known characteristics, an analysis method can be designed around them, for example by building a logical expression of keywords to search for whether a text has an important characteristic. How to analyze the unconventional characteristics of text is an important topic at present.
In the prior art, a common text analysis method is the three-layer Bayesian probability model Latent Dirichlet Allocation (LDA), whose main function is to find topics in a large number of texts. There can be any number of topics, and a single text may contain several topics. With this kind of topic analysis, the various topics in a large collection of texts can be found. A topic here is defined by a group of words. For example, a sports topic would be defined by words such as exercise, race, high jump, and swimming. Each word also has a probability distribution: related words have a higher probability and unrelated words a lower one.
However, the prior-art method is only suitable for analyzing the conventional characteristics of text from an unknown scenario, that is, for determining through topic analysis the topics involved in such text. For text from a known, fixed scenario, where the goal is to analyze the unconventional characteristics in the text, the prior-art text analysis method is not suitable.
Summary of the invention
The object of the present invention is to provide a method and device for mining unconventional characteristics of text, solving the technical problem that prior-art text analysis methods are not suitable for mining the unconventional characteristics in text.
To solve the above technical problem, in one aspect, the present invention provides a method for mining unconventional characteristics of text, including:
obtaining the information content of each keyword in a text to be mined;
obtaining the inter-word association degrees in the text to be mined, where an inter-word association degree indicates the degree of association between two target keywords;
obtaining a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
performing a recursive computation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and taking the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
In another aspect, the present invention provides a device for mining unconventional characteristics of text, including:
an information content obtaining module, configured to obtain the information content of each keyword in a text to be mined;
an inter-word association degree obtaining module, configured to obtain the inter-word association degrees in the text to be mined, where an inter-word association degree indicates the degree of association between two target keywords;
a word graph obtaining module, configured to obtain a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
an unconventional characteristic obtaining module, configured to perform a recursive computation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and to take the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
In yet another aspect, the present invention provides an electronic device for mining unconventional characteristics of text, including:
a memory and a processor, where the processor and the memory communicate with each other over a bus; the memory stores program instructions executable by the processor, and the processor can execute the above method by invoking the program instructions.
In a further aspect, the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program implements the above method when executed by a processor.
The method and device for mining unconventional characteristics of text provided by the present invention compute the information content of keywords and the inter-word association degrees, generate a word graph, and then take the keyword with the greatest importance in the word graph as the unconventional characteristic of the text to be mined. In this way the text can be mined in depth and its unconventional characteristics obtained.
Brief description of the drawings
Fig. 1 is a schematic diagram of a method for mining unconventional characteristics of text according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a word graph of a text according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a device for mining unconventional characteristics of text according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device for mining unconventional characteristics of text provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic diagram of a method for mining unconventional characteristics of text according to an embodiment of the present invention. As shown in Fig. 1, an embodiment of the present invention provides a method for mining unconventional characteristics of text, executed by a device for mining unconventional characteristics of text. The method includes:
Step S101: obtaining the information content of each keyword in a text to be mined;
Step S102: obtaining the inter-word association degrees in the text to be mined, where an inter-word association degree indicates the degree of association between two target keywords;
Step S103: obtaining a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
Step S104: performing a recursive computation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and taking the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
Specifically, the information content of each keyword is first obtained from the text to be mined. If there are several texts to be mined, they can be stored together in one overall document set. The more heavily a word is used within a smaller number of documents, the greater its information content.
Then the inter-word association degrees in the text to be mined are obtained. An inter-word association degree indicates the degree of association between two target keywords; that is, the association degree between words is computed from their co-occurrence frequency (two keywords appearing in the same sentence).
Taking telemarketing text (not listed in detail in this specification) as an example, the embodiment of the present invention extracts the following two sentences from the text to illustrate how the inter-word association degree is obtained. The words marked in bold in the two sentences are the keywords for which the inter-word association degree needs to be obtained.
First sentence: "Hello, my credit card has just been fraudulently charged; I did not do anything just now, and it shows a charge of 188 on my account."
Second sentence: "It says that my XX Bank mobile banking is about to expire and asks me to log in to some web address."
These two sentences are segmented into multiple words. For example, "hello" and "I" are individual segmented words, while the words marked in bold, such as "credit card" and "fraudulent charge", are keywords. The two keywords "credit card" and "fraudulent charge" appear in the same sentence, so the inter-word association degree between them needs to be obtained.
The association degree is calculated as the number of times the two words occur together divided by the total of their individual occurrences.
Then a word graph is obtained based on the information content of each keyword and the inter-word association degrees. Each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords.
Fig. 2 is a schematic diagram of a word graph of a text according to an embodiment of the present invention. As shown in Fig. 2, taking telemarketing text (not listed in detail in this specification) as an example, the word graph is generated from the obtained information content of each keyword and the inter-word association degrees.
The value on an edge is the inter-word association degree. Each node contains one keyword, and each keyword is given a weight; a keyword's weight is a summation over the weights of its surrounding keywords and the corresponding association degrees.
For example, the information content of the keyword "balance" is 77, the information content of the keyword "downward" is 56, and the inter-word association degree between the keywords "balance" and "downward" is 13.2.
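The word graph described above can be sketched as a plain dictionary structure. This is only an illustrative sketch: the function name `build_word_graph` and the dictionary layout are assumptions, not part of the disclosure. The numbers are the example values given above.

```python
def build_word_graph(information, associations):
    """Return (node_values, adjacency) for an undirected weighted word graph.

    information: {keyword: information content}, used as initial node values.
    associations: {(keyword_a, keyword_b): inter-word association degree}.
    """
    node_values = dict(information)
    adjacency = {word: {} for word in node_values}
    for (a, b), degree in associations.items():
        # Only connect distinct keywords that are both nodes of the graph.
        if a in adjacency and b in adjacency and a != b:
            adjacency[a][b] = degree  # edges are undirected, so store both directions
            adjacency[b][a] = degree
    return node_values, adjacency


# Example values from the embodiment.
nodes, edges = build_word_graph(
    {"balance": 77.0, "downward": 56.0},
    {("balance", "downward"): 13.2},
)
```

Storing each edge in both directions keeps neighbour lookups simple when the node values are later updated iteratively.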
Finally, a recursive computation is performed on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and the keyword with the greatest importance is taken as the unconventional characteristic of the text to be mined.
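The text does not spell out the update rule used in the recursive computation. As one plausible reading, the sketch below assumes a TextRank-style weighted update in which each node repeatedly receives a share of its neighbours' scores in proportion to the edge values, mixed with its normalized initial information content. The function name `rank_keywords`, the `damping` factor, the tolerance, and the third node "limit" with its values are all assumptions made for illustration.

```python
def rank_keywords(node_values, adjacency, damping=0.85, tol=1e-6, max_iter=200):
    """Iterate node scores until the largest change falls below tol."""
    total = sum(node_values.values()) or 1.0
    # Normalized information content serves as the initial value of each node.
    base = {w: v / total for w, v in node_values.items()}
    # Total edge weight attached to each node.
    strength = {w: sum(adjacency[w].values()) for w in adjacency}
    score = dict(base)
    for _ in range(max_iter):
        new_score = {}
        for w in score:
            incoming = sum(
                score[n] * weight / strength[n]
                for n, weight in adjacency[w].items()
                if strength[n] > 0
            )
            new_score[w] = (1 - damping) * base[w] + damping * incoming
        change = max(abs(new_score[w] - score[w]) for w in score)
        score = new_score
        if change < tol:
            break
    return score


# "balance" and "downward" use the example values; "limit" is hypothetical.
node_values = {"balance": 77.0, "downward": 56.0, "limit": 30.0}
adjacency = {
    "balance": {"downward": 13.2},
    "downward": {"balance": 13.2, "limit": 8.0},
    "limit": {"downward": 8.0},
}
importance = rank_keywords(node_values, adjacency)
top_keyword = max(importance, key=importance.get)
```

With a damping factor below 1 the update is a contraction, so the loop settles on a fixed point regardless of the starting scores. Other update rules would be equally consistent with the description.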
Taking telemarketing text (not listed in detail in this specification) as an example, the embodiment of the present invention mined six unconventional characteristics. The abnormal or significant event corresponding to each unconventional characteristic is as follows:
1. Fraudulent text message containing a link to a web address
Related sentences: "I just received a text message about my Bank of China credit card application, saying that the user name for mobile banking is about to expire and asking me to log in to a web address." "This is a fraudulent text message; could you please provide the web address?"
2. Fraudulent charge
Related sentence: "Hello, it is just that my credit card has just been fraudulently charged; I did not do anything just now, and it shows a charge of 188 on my account."
3. Card retained by the machine
Related sentences: "Hello, my card, I was making a deposit and the machine did not give the card back to me." "It has been swallowed by the machine? Are you still beside the machine now?"
4. New People's Bank of China policy
Related sentence: "Interbank transfers go through the People's Bank clearing system, which takes twenty-four hours; in total the funds will reach your account within forty-eight hours."
5. Problem with Bank of China's automatic deduction service for the housing provident fund center
Related sentences: "The housing provident fund should have been deducted automatically for the repayment (provided the balance was sufficient), but it was not, which led to an overdue repayment." "It was deducted, but the repayment was still overdue: the main problem (70% or more)."
6. Service hotline cannot be reached
Related sentences: "Hello, it is like this: I just received a text message saying that my credit limit has been lowered." "The phone of your credit card center cannot be reached at all." "I have been trying for a long time."
In the telemarketing scenario, the reason deals are closed was also found: making customers understand that the insurance will "really pay claims".
The method for mining unconventional characteristics of text provided by the present invention computes the information content of keywords and the inter-word association degrees, generates a word graph, and then takes the keyword with the greatest importance in the word graph as the unconventional characteristic of the text to be mined. In this way the text can be mined in depth and its unconventional characteristics obtained.
On the basis of the above embodiment, further, the keywords are words whose information content is greater than a preset threshold and/or preset words, and the keywords do not include stop words.
Specifically, non-stop words whose information content is greater than the preset threshold may be selected as keywords; keywords may also be words set manually by the user.
For example, words such as "you", "I", "he", "once", "a while", "well", and "this" are frequently occurring stop words, and they should be removed when selecting keywords.
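The selection rule above can be sketched as a small filter. The function name `select_keywords`, the threshold of 50.0, and the sample information-content values are hypothetical; only the rule itself (above-threshold or preset, never a stop word) comes from the text.

```python
def select_keywords(information, threshold, stop_words, preset_words=()):
    """Pick keywords: non-stop-words above the threshold, plus preset words."""
    selected = {
        word
        for word, value in information.items()
        if value > threshold and word not in stop_words
    }
    # Words set manually by the user are kept even below the threshold,
    # but stop words are still excluded.
    selected |= {word for word in preset_words if word not in stop_words}
    return selected


information = {"you": 120.0, "credit_card": 80.0, "fraud": 95.0, "limit": 12.0}
stop_words = {"you", "I", "he", "this"}
keywords = select_keywords(information, 50.0, stop_words, preset_words=["limit"])
```

Here "you" is dropped despite its high value because it is a stop word, while "limit" is kept only because it was preset.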
The method for mining unconventional characteristics of text provided by the present invention computes the information content of keywords and the inter-word association degrees, generates a word graph, and then takes the keyword with the greatest importance in the word graph as the unconventional characteristic of the text to be mined. In this way the text can be mined in depth and its unconventional characteristics obtained.
On the basis of the above embodiments, further, the information content of a keyword is the TF-IDF value of the keyword.
The information content of each keyword in the text to be mined is obtained by the following formula:
$$A_1 = B_1 \cdot \log\left(\frac{C_1}{D_1}\right)$$
where $A_1$ is the TF-IDF value of the keyword, $B_1$ is the term frequency of the keyword, $C_1$ is the total number of texts, and $D_1$ is the number of texts containing the keyword.
Specifically, in the embodiment of the present invention the information content of a keyword is calculated with the TF-IDF algorithm from information theory; that is, the TF-IDF value of the keyword, computed with the formula above, is used as its information content.
Term frequency is the number of times a word appears in the overall document set. Document frequency is the number of documents in which a word is used, that is, the number of documents that contain the word. The more heavily a word is used within a smaller number of documents, the greater its information content.
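As a minimal sketch of the formula, the function below computes the TF-IDF value of every word in a tokenized document set. The function name `tfidf_information`, the sample tokens, and the use of the natural logarithm are assumptions; the text does not state the base of the logarithm or how the text is segmented.

```python
import math
from collections import Counter


def tfidf_information(docs):
    """Return {word: B1 * log(C1 / D1)} over a list of token lists."""
    total_docs = len(docs)   # C1: total number of texts
    term_freq = Counter()    # B1: occurrences of the word in the whole document set
    doc_freq = Counter()     # D1: number of texts containing the word
    for tokens in docs:
        term_freq.update(tokens)
        doc_freq.update(set(tokens))
    return {
        word: term_freq[word] * math.log(total_docs / doc_freq[word])
        for word in term_freq
    }


# Hypothetical, already segmented texts.
docs = [
    ["credit_card", "fraud", "fraud"],
    ["credit_card", "limit"],
    ["credit_card", "fraud"],
    ["transfer", "arrival"],
]
info = tfidf_information(docs)
```

In this sample "credit_card" and "fraud" both occur three times, but "fraud" is concentrated in fewer texts and therefore receives the larger value. A word that appears in every text gets a value of zero.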
The method for mining unconventional characteristics of text provided by the present invention computes the information content of keywords and the inter-word association degrees, generates a word graph, and then takes the keyword with the greatest importance in the word graph as the unconventional characteristic of the text to be mined. In this way the text can be mined in depth and its unconventional characteristics obtained.
On the basis of the above embodiments, further, obtaining the inter-word association degrees in the text to be mined, where an inter-word association degree indicates the degree of association between two target keywords, specifically includes:
obtaining each target keyword pair, where a target keyword pair consists of a first target keyword and a second target keyword, and the number of words between the first target keyword and the second target keyword is less than a preset value;
obtaining the co-occurrence frequency of each target keyword pair, where the co-occurrence frequency is the number of times the target keyword pair appears;
obtaining the inter-word association degrees in the text to be mined based on the co-occurrence frequency of each target keyword pair, where an inter-word association degree indicates the degree of association between two target keywords.
The inter-word association degree in the text to be mined is obtained by the following formula:
$$A_2 = \frac{B_2}{C_1 + C_2 - B_2}$$
where $A_2$ is the inter-word association degree between the first target keyword and the second target keyword, $B_2$ is the number of times the first target keyword and the second target keyword appear together, $C_1$ is the number of occurrences of the first target keyword, and $C_2$ is the number of occurrences of the second target keyword.
Specifically, when calculating the inter-word association degree, each target keyword pair is obtained first. A target keyword pair consists of two target keywords, and the number of words between the two target keywords is less than a preset value, whose size can be adjusted according to the actual application.
Take the following sentence from telemarketing as an example: "Hello, my credit card has just been fraudulently charged; I did not do anything just now, and it shows a charge of 188 on my account."
Here the number of words between the target keyword "credit card" and the target keyword "fraudulent charge" is 1. Assuming the preset value is 5, "credit card" and "fraudulent charge" form a target keyword pair.
Then the co-occurrence frequency of each target keyword pair is obtained, where the co-occurrence frequency is the number of times the target keyword pair appears; that is, count the number of sentences in which the target keyword "credit card" and the target keyword "fraudulent charge" both appear with fewer than the preset value of 5 words between them.
Finally, the inter-word association degrees in the text to be mined are obtained from the co-occurrence frequency of each target keyword pair using the preset calculation formula.
It should be noted that the formula for the inter-word association degree in the above example is not the only possible one. Practical applications are not limited to it, and the specific formula can be chosen according to the circumstances.
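The counting step can be sketched as follows, assuming each sentence has already been segmented into a list of tokens. The function name `count_cooccurrences`, the parameter `max_gap` (standing in for the preset value), and the sample tokens are hypothetical. A pair is counted at most once per sentence, following the "in how many sentences" wording above.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(sentences, keywords, max_gap=5):
    """Count keyword occurrences and windowed co-occurrences.

    Returns (occurrences, pair_counts); pairs are stored as sorted tuples.
    """
    occurrences = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        positions = [(i, t) for i, t in enumerate(tokens) if t in keywords]
        occurrences.update(t for _, t in positions)
        seen = set()
        for (i, a), (j, b) in combinations(positions, 2):
            words_between = j - i - 1
            if a != b and words_between < max_gap:
                seen.add(tuple(sorted((a, b))))
        pair_counts.update(seen)  # each pair counts once per sentence
    return occurrences, pair_counts


sentences = [
    # one word between the two keywords: counts as a pair
    ["hello", "my", "credit_card", "just", "fraud", "today"],
    # six words between the two keywords: not less than 5, so no pair
    ["credit_card", "a", "b", "c", "d", "e", "f", "fraud"],
]
occurrences, pair_counts = count_cooccurrences(
    sentences, {"credit_card", "fraud"}, max_gap=5
)
```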
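The example formula, the co-occurrence count divided by the two individual counts minus the co-occurrence count, has the same form as a Jaccard coefficient. A minimal sketch follows; the function name `association_degree` and the sample counts are hypothetical.

```python
def association_degree(co_count, count_a, count_b):
    """A2 = B2 / (C1 + C2 - B2); returns 0.0 when the denominator is not positive."""
    denominator = count_a + count_b - co_count
    if denominator <= 0:
        return 0.0
    return co_count / denominator


# Two keywords that occur 10 and 5 times and appear together 3 times.
degree = association_degree(3, 10, 5)  # 3 / (10 + 5 - 3)
```

With this form the value lies between 0 and 1 whenever the co-occurrence count does not exceed either individual count, and it reaches 1 only when the two keywords always appear together.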
The method for mining unconventional characteristics of text provided by the present invention computes the information content of keywords and the inter-word association degrees, generates a word graph, and then takes the keyword with the greatest importance in the word graph as the unconventional characteristic of the text to be mined. In this way the text can be mined in depth and its unconventional characteristics obtained.
Fig. 3 is a schematic diagram of a device for mining unconventional characteristics of text according to an embodiment of the present invention. As shown in Fig. 3, an embodiment of the present invention provides a device for mining unconventional characteristics of text, used to carry out the method described in the above embodiments. The device specifically includes an information content obtaining module 301, an inter-word association degree obtaining module 302, a word graph obtaining module 303, and an unconventional characteristic obtaining module 304, where:
the information content obtaining module 301 is configured to obtain the information content of each keyword in a text to be mined;
the inter-word association degree obtaining module 302 is configured to obtain the inter-word association degrees in the text to be mined, where an inter-word association degree indicates the degree of association between two target keywords;
the word graph obtaining module 303 is configured to obtain a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
the unconventional characteristic obtaining module 304 is configured to perform a recursive computation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and to take the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
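The division into four modules can be sketched as a small pipeline object in which each module is a replaceable callable. The class name `UnconventionalFeatureMiner`, its field names, and the stub functions in the usage example are all hypothetical; the stubs only stand in for real implementations of modules 301 to 304.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UnconventionalFeatureMiner:
    get_information: Callable   # stands in for module 301
    get_associations: Callable  # stands in for module 302
    build_graph: Callable       # stands in for module 303
    rank: Callable              # stands in for module 304

    def mine(self, texts):
        """Run the four modules in order and return the top-ranked keyword."""
        information = self.get_information(texts)
        associations = self.get_associations(texts)
        graph = self.build_graph(information, associations)
        importance = self.rank(graph)
        return max(importance, key=importance.get)


# Stub modules with fixed outputs, used only to show the wiring.
miner = UnconventionalFeatureMiner(
    get_information=lambda texts: {"credit_card": 0.9, "fraud": 2.1},
    get_associations=lambda texts: {("credit_card", "fraud"): 0.5},
    build_graph=lambda info, assoc: (info, assoc),
    rank=lambda graph: graph[0],
)
result = miner.mine(["placeholder text"])
```

Keeping the modules as separate callables mirrors the device structure and lets each step (for example the association formula) be swapped without touching the others.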
An embodiment of the present invention provides a device for mining unconventional characteristics of text, used to carry out the method described in the above embodiments. The specific steps by which the device of this embodiment carries out that method are the same as in the above embodiments and are not repeated here.
The device for mining unconventional characteristics of text provided by the present invention computes the information content of keywords and the inter-word association degrees, generates a word graph, and then takes the keyword with the greatest importance in the word graph as the unconventional characteristic of the text to be mined. In this way the text can be mined in depth and its unconventional characteristics obtained.
Fig. 4 is a schematic structural diagram of an electronic device for mining unconventional characteristics of text provided by an embodiment of the present invention. As shown in Fig. 4, the device includes a processor 401, a memory 402, and a bus 403;
where the processor 401 and the memory 402 communicate with each other over the bus 403;
the processor 401 is configured to invoke the program instructions in the memory 402 to execute the methods provided by the above method embodiments, for example including:
obtaining the information content of each keyword in a text to be mined;
obtaining the inter-word association degrees in the text to be mined, where an inter-word association degree indicates the degree of association between two target keywords;
obtaining a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
performing a recursive computation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and taking the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
An embodiment of the present invention discloses a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer can execute the methods provided by the above method embodiments, for example including:
obtaining the information content of each keyword in a text to be mined;
obtaining the inter-word association degrees in the text to be mined, where an inter-word association degree indicates the degree of association between two target keywords;
obtaining a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
performing a recursive computation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and taking the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
An embodiment of the present invention provides a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to execute the methods provided by the above method embodiments, for example including:
obtaining the information content of each keyword in a text to be mined;
obtaining the inter-word association degrees in the text to be mined, where an inter-word association degree indicates the degree of association between two target keywords;
obtaining a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
performing a recursive computation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and taking the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
The device and equipment embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of an embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (9)
1. A method for mining unconventional characteristics of a text, characterized by comprising:
obtaining the information content of each keyword in a text to be mined;
obtaining the inter-word association degrees in the text to be mined, where an inter-word association degree indicates the degree of association between two target keywords;
obtaining a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
performing a recursive operation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and taking the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
2. The method according to claim 1, characterized in that the keywords are words whose information content is greater than a preset threshold and/or preset words, and the keywords do not include stop words.
3. The method according to claim 1, characterized in that the information content of a keyword is the TF-IDF value of the keyword.
4. The method according to claim 1, characterized in that obtaining the inter-word association degrees in the text to be mined, where an inter-word association degree indicates the degree of association between two target keywords, specifically comprises:
obtaining each target keyword pair, where a target keyword pair consists of a first target keyword and a second target keyword, and the number of words between the first target keyword and the second target keyword is less than a preset value;
obtaining the co-occurrence frequency of each target keyword pair, where the co-occurrence frequency is the number of times the target keyword pair appears;
obtaining, based on the co-occurrence frequency of each target keyword pair, the inter-word association degrees in the text to be mined, where an inter-word association degree indicates the degree of association between two target keywords.
5. The method according to claim 3, characterized in that the information content of each keyword in the text to be mined is obtained by the following formula:
$$A_1 = B_1 \cdot \log(C_1 / D_1)$$
where $A_1$ is the TF-IDF value of the keyword, $B_1$ is the term frequency of the keyword, $C_1$ is the total number of texts, and $D_1$ is the number of texts that contain the keyword.
6. The method according to claim 4, characterized in that the inter-word association degrees in the text to be mined are obtained by the following formula:
$$A_2 = \frac{B_2}{C_1 + C_2 - B_2}$$
where $A_2$ is the inter-word association degree between the first target keyword and the second target keyword, $B_2$ is the number of times the first target keyword and the second target keyword occur together, $C_1$ is the number of occurrences of the first target keyword, and $C_2$ is the number of occurrences of the second target keyword.
7. A device for mining unconventional characteristics of a text, characterized by comprising:
an information content obtaining module, configured to obtain the information content of each keyword in a text to be mined;
an inter-word association degree obtaining module, configured to obtain the inter-word association degrees in the text to be mined, where an inter-word association degree indicates the degree of association between two target keywords;
a word graph obtaining module, configured to obtain a word graph based on the information content of each keyword and the inter-word association degrees, where each node in the word graph represents one keyword, the initial value of a node is the information content of its keyword, and each edge in the word graph represents the inter-word association degree between two keywords;
an unconventional characteristic obtaining module, configured to perform a recursive operation on the information content of each keyword in the word graph until convergence to obtain the importance of each keyword, and to take the keyword with the greatest importance as the unconventional characteristic of the text to be mined.
8. An electronic device for mining unconventional characteristics of a text, characterized by comprising:
a memory and a processor, where the processor and the memory communicate with each other via a bus; the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the method according to any one of claims 1 to 6.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 6.
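The steps recited in claims 1 and 4 to 6 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The function names, the `max_gap` parameter, the damping factor of 0.85, and the TextRank-style update rule are all assumptions introduced for illustration; the claims state only that the recursion starts from each keyword's information content and runs until convergence.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf(term_freq, total_docs, docs_with_term):
    """Claim 5: A1 = B1 * log(C1 / D1)."""
    return term_freq * math.log(total_docs / docs_with_term)


def association(co_count, count_a, count_b):
    """Claim 6: A2 = B2 / (C1 + C2 - B2)."""
    return co_count / (count_a + count_b - co_count)


def cooccurrence(tokens, keywords, max_gap):
    """Claim 4: count keyword pairs with fewer than max_gap words between them."""
    positions = [(i, t) for i, t in enumerate(tokens) if t in keywords]
    pairs = Counter()
    for (i, a), (j, b) in combinations(positions, 2):
        # j - i - 1 is the number of words lying between the two keywords.
        if a != b and j - i - 1 < max_gap:
            pairs[frozenset((a, b))] += 1
    return pairs


def rank_keywords(info, edges, damping=0.85, tol=1e-6, max_iter=200):
    """Iterate node scores over the word graph until they stop changing.

    info  : {keyword: information content}, used as the initial node values
    edges : {frozenset({a, b}): inter-word association degree}
    The update rule and damping factor are assumptions (TextRank-style).
    """
    neighbours = {k: {} for k in info}
    for pair, weight in edges.items():
        a, b = tuple(pair)
        neighbours[a][b] = weight
        neighbours[b][a] = weight
    # Total edge weight leaving each node, used to normalise contributions.
    out_weight = {k: sum(nb.values()) for k, nb in neighbours.items()}
    score = dict(info)
    for _ in range(max_iter):
        new = {}
        for k in score:
            incoming = sum(
                w / out_weight[n] * score[n] for n, w in neighbours[k].items()
            )
            new[k] = (1 - damping) + damping * incoming
        converged = max(abs(new[k] - score[k]) for k in score) < tol
        score = new
        if converged:
            break
    return score
```

Under these assumptions, the keyword recited in claim 1 as the unconventional characteristic would be `max(scores, key=scores.get)`, where `scores` is the dictionary returned by `rank_keywords`. The sketch does not guard against a zero denominator in `association`, which can arise when the co-occurrence count equals the sum of the two occurrence counts.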
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810507576.5A CN108846023A (en) | 2018-05-24 | 2018-05-24 | The unconventional characteristic method for digging and device of text |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108846023A true CN108846023A (en) | 2018-11-20 |
Family
ID=64213347
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810507576.5A Pending CN108846023A (en) | 2018-05-24 | 2018-05-24 | The unconventional characteristic method for digging and device of text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108846023A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104375989A (en) * | 2014-12-01 | 2015-02-25 | 国家电网公司 | Natural language text keyword association network construction system |
CN103399901B (en) * | 2013-07-25 | 2016-06-08 | 三星电子(中国)研发中心 | A kind of keyword abstraction method |
CN106202042A (en) * | 2016-07-06 | 2016-12-07 | 中央民族大学 | A kind of keyword abstraction method based on figure |
CN106469187A (en) * | 2016-08-29 | 2017-03-01 | 东软集团股份有限公司 | The extracting method of key word and device |
CN106776881A (en) * | 2016-11-28 | 2017-05-31 | 中国科学院软件研究所 | A kind of realm information commending system and method based on microblog |
KR101769247B1 (en) * | 2015-12-16 | 2017-08-18 | 건국대학교 산학협력단 | Method and apparatus for comparing strings using hierarchical interval tree |
CN107193803A (en) * | 2017-05-26 | 2017-09-22 | 北京东方科诺科技发展有限公司 | A kind of particular task text key word extracting method based on semanteme |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111626044A (en) * | 2020-05-14 | 2020-09-04 | 北京字节跳动网络技术有限公司 | Text generation method and device, electronic equipment and computer readable storage medium |
CN111626044B (en) * | 2020-05-14 | 2023-06-30 | 北京字节跳动网络技术有限公司 | Text generation method, text generation device, electronic equipment and computer readable storage medium |
CN111831833A (en) * | 2020-07-27 | 2020-10-27 | 人民卫生电子音像出版社有限公司 | Knowledge graph construction method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20200309. Address after: 519000 room 105-58115, No. 6, Baohua Road, Hengqin New District, Zhuhai City, Guangdong Province (centralized office area). Applicant after: Puqiang times (Zhuhai Hengqin) Information Technology Co., Ltd. Address before: 100089 Haidian District, Beijing, Yongfeng Road, North Road, South East Road, F, 2 floor. Applicant before: Puqiang Information Technology (Beijing) Co., Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181120 |