CN108920660A - Keyword weight acquisition method and device, electronic equipment, and readable storage medium - Google Patents


Info

Publication number
CN108920660A
Authority
CN
China
Prior art keywords
keyword
text
business operation
operation page
measured
Legal status
Granted
Application number
CN201810723425.3A
Other languages
Chinese (zh)
Other versions
CN108920660B (en)
Inventor
宋雨
Current Assignee
Bank of China Ltd
Original Assignee
Bank of China Ltd
Application filed by Bank of China Ltd
Priority to CN201810723425.3A
Publication of CN108920660A
Application granted
Publication of CN108920660B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a keyword weight acquisition method and device, electronic equipment, and a readable storage medium. Because the float factor of the business operation page corresponding to each keyword is taken into account, that is, the probability that the text corresponding to each business operation page is selected is incorporated, the resulting keyword weight can assess how important the keyword is to the files in the corpus: the larger the weight, the more important the keyword is to those files. Files obtained from the corpus based on the weights of the keywords in the keyword set are therefore more likely to be the texts the user intends to view, i.e., more accurate.

Description

Keyword weight acquisition method and device, electronic equipment, and readable storage medium
Technical field
The present invention relates to the field of weighting techniques, and more specifically to a keyword weight acquisition method and device, electronic equipment, and a readable storage medium.
Background
TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining: the weight of a word is used to assess how important that word is to a file in a corpus. TF (term frequency) is the number of times the word occurs in the text to be measured; IDF (inverse document frequency) is, as used here, the reciprocal of the number of documents in the corpus that contain the word. Based on TF and IDF, the weights of the words in the text to be measured can be used to hit (retrieve) one or more texts in the corpus.
Intelligent question answering relies on TF-IDF: based on the words contained in the text to be measured that the user enters, it determines from the corpus the texts the user is likely to need and displays them to the user. For example, the user clicks online Q&A and enters a text to be measured in the displayed window, such as "I want to report a loss". Based on the word "report loss" contained in this text, the back end may determine from the corpus that the user needs the credit card loss-reporting text or the debit card loss-reporting text, and presents those texts to the user.
In general, texts to be measured are short, so the TF of each word is small; for example, in "I want to report a loss", the TF of the word "report loss" is 1. As a result, the word weights obtained with TF-IDF are inaccurate, and so are the texts hit in the corpus based on those weights.
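A minimal sketch of plain TF-IDF as this background defines it may make the problem concrete. It assumes TF is a raw occurrence count, IDF is the reciprocal of the document count (the more common textbook form is log(N/df), but this document uses 1/df), and the query has already been segmented into keywords. The function name, corpus, and tokens are hypothetical illustrations.

```python
def tf_idf(keyword: str, tokens: list[str], corpus: list[set[str]]) -> float:
    """Plain TF-IDF as described in the background: raw count times 1/df."""
    tf = tokens.count(keyword)                       # occurrences in the text to be measured
    df = sum(1 for doc in corpus if keyword in doc)  # corpus texts containing the keyword
    return tf / df if df else 0.0                    # tf * (1 / df); 0 if no text matches


# Hypothetical corpus: each text is represented by the set of keywords it contains.
corpus = [
    {"debit card", "report loss"},
    {"credit card", "report loss"},
    {"credit card", "apply"},
    {"debit card", "apply"},
]

# Hypothetical pre-segmented short query.
tokens = ["I", "want", "report loss", "credit card"]

for kw in ("report loss", "credit card"):
    print(kw, tf_idf(kw, tokens, corpus))
# report loss 0.5
# credit card 0.5
```

Because every keyword in a short query has TF = 1, the weight collapses to the IDF alone, and here both keywords tie at 0.5. Nothing in the weights says which loss-reporting text the user actually wants.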
Summary of the invention
In view of this, the present invention provides a keyword weight acquisition method and device, electronic equipment, and a readable storage medium.
To achieve the above object, the present invention provides the following technical solutions:
A keyword weight acquisition method, comprising:
Obtaining a text to be measured;
Obtaining a keyword set, the keyword set including at least one keyword contained in the text to be measured;
For any keyword in the keyword set, obtaining the float factor of the business operation page corresponding to the keyword, wherein one business operation page corresponds to one text in a corpus, one business operation page corresponds to one or more keywords, and the float factor of the business operation page corresponding to a keyword indicates the probability that the text corresponding to that business operation page is selected;
Obtaining the frequency with which the keyword occurs in the text to be measured;
Obtaining the number of texts in the corpus that contain the keyword;
Obtaining the weight of the keyword based on the float factor corresponding to the keyword, the frequency of the keyword in the text to be measured, and the number of texts in the corpus containing the keyword, so as to obtain the weight of each keyword in the keyword set.
Optionally, the method further comprises:
Obtaining a target text from the corpus based on the weight of each keyword in the keyword set.
Optionally, obtaining the text to be measured comprises:
Displaying a first business operation page;
In response to an operation instruction for entering the text to be measured, receiving the text to be measured entered by the user.
Optionally, the keyword set includes a first keyword and at least one second keyword, the first keyword corresponding to the first business operation page, and obtaining, for any keyword in the keyword set, the float factor of the business operation page corresponding to the keyword comprises:
For any second keyword in the keyword set, determining the probability that the text corresponding to the business operation page of the second keyword was selected within a preset time as the float factor of the second keyword;
For the first keyword in the keyword set, setting the probability that the text corresponding to the first business operation page is selected to a first value, the first value being greater than or equal to the float factor of any second keyword, and determining the first value as the float factor of the first keyword.
Optionally, the first business operation page corresponds to the first keyword, and obtaining the keyword set comprises:
Obtaining the at least one keyword contained in the text to be measured;
If the at least one keyword does not include the first keyword, merging the first keyword with the at least one keyword to obtain the keyword set;
Obtaining the frequency of the keyword in the text to be measured comprises:
Setting the frequency of the first keyword in the text to be measured to a second value.
A keyword weight acquisition device, comprising:
A first obtaining module, configured to obtain a text to be measured;
A second obtaining module, configured to obtain a keyword set, the keyword set including at least one keyword contained in the text to be measured;
A third obtaining module, configured to obtain, for any keyword in the keyword set, the float factor of the business operation page corresponding to the keyword, wherein one business operation page corresponds to one text in a corpus, one business operation page corresponds to one or more keywords, and the float factor of the business operation page corresponding to a keyword indicates the probability that the text corresponding to that business operation page is selected;
A fourth obtaining module, configured to obtain, for any keyword in the keyword set, the frequency of the keyword in the text to be measured;
A fifth obtaining module, configured to obtain, for any keyword in the keyword set, the number of texts in the corpus that contain the keyword;
A sixth obtaining module, configured to obtain, for any keyword in the keyword set, the weight of the keyword based on the float factor corresponding to the keyword, the frequency of the keyword in the text to be measured, and the number of texts in the corpus containing the keyword, so as to obtain the weight of each keyword in the keyword set.
Optionally, the first obtaining module comprises:
A display unit, configured to display the first business operation page;
A receiving unit, configured to receive, in response to an operation instruction for entering the text to be measured, the text to be measured entered by the user.
Optionally, the keyword set includes a first keyword and at least one second keyword, the first keyword corresponding to the first business operation page, and the third obtaining module comprises:
A first determining unit, configured to determine, for any second keyword in the keyword set, the probability that the text corresponding to the business operation page of the second keyword was selected within a preset time as the float factor of the second keyword;
A second determining unit, configured to set, for the first keyword in the keyword set, the probability that the text corresponding to the first business operation page is selected to a first value, the first value being greater than or equal to the float factor of any second keyword, and to determine the first value as the float factor of the first keyword.
Electronic equipment, comprising:
A memory, configured to store a program;
A processor, configured to execute the program, the program being specifically configured to:
Obtain a text to be measured;
Obtain a keyword set, the keyword set including at least one keyword contained in the text to be measured;
For any keyword in the keyword set, obtain the float factor of the business operation page corresponding to the keyword, wherein one business operation page corresponds to one text in a corpus, one business operation page corresponds to one or more keywords, and the float factor of the business operation page corresponding to a keyword indicates the probability that the text corresponding to that business operation page is selected;
Obtain the frequency of the keyword in the text to be measured;
Obtain the number of texts in the corpus that contain the keyword;
Obtain the weight of the keyword based on the float factor corresponding to the keyword, the frequency of the keyword in the text to be measured, and the number of texts in the corpus containing the keyword, so as to obtain the weight of each keyword in the keyword set.
A readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements each step of the keyword weight acquisition method according to any one of the above.
As can be seen from the above technical solutions, compared with the prior art, the present invention discloses a keyword weight acquisition method. A text to be measured is obtained first, together with a keyword set containing at least one keyword from that text. For each keyword in the keyword set, the float factor of the business operation page corresponding to the keyword is obtained; this float factor indicates the probability that the text corresponding to that business operation page is selected. The weight of the keyword is then obtained from its float factor, its frequency in the text to be measured, and the number of texts in the corpus containing it, yielding the weight of every keyword in the keyword set. Because the float factor, that is, the probability that each business operation page's text is selected, is taken into account, the resulting weight can assess how important the keyword is to the files in the corpus: the larger the weight, the more important the keyword is to those files. Files obtained from the corpus based on these weights are therefore more likely to be the texts the user intends to view, i.e., more accurate.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 a to Fig. 1 b is a kind of form of expression schematic diagram of intelligent answer provided in an embodiment of the present invention;
Fig. 2 is that the embodiment of the invention provides a kind of flow charts of implementation of keyword weight acquisition methods;
Fig. 3 is the flow chart of another implementation of keyword weight acquisition methods provided in an embodiment of the present invention;
Fig. 4 is a kind of structure chart of implementation of keyword weight acquisition device provided in an embodiment of the present invention;
Fig. 5 is a kind of structure chart of implementation of electronic equipment provided in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The keyword weight acquisition method provided in the embodiments of the present invention can be applied to a client, which may be a web client or an application client.
Intelligent question answering is common in clients. Fig. 1a and Fig. 1b are schematic diagrams of one presentation form of intelligent question answering according to an embodiment of the present invention.
Fig. 1a shows a business operation page of the client together with an online Q&A prompt box 11. Clicking the online Q&A prompt box 11 opens the intelligent question-answering window 12 shown in Fig. 1b, in which the user enters the text to be measured.
In an optional embodiment, the user may instead speak, and the client converts the speech into the text to be measured.
In the prior art, after the user enters the text to be measured, the client finds corresponding texts in the corpus based on the weights of the keywords in the text to be measured and shows them to the user. For example, if the user enters "I want to report a loss", the client may find the credit card loss-reporting text and the debit card loss-reporting text and provide them to the user. Because the texts entered by users are short, the TF of each keyword in the text to be measured is generally small, often just 1. TF then cannot reflect how important a keyword is within the text to be measured, so the keyword weights obtained from TF and IDF are inaccurate, and the texts retrieved from the corpus based on those weights are not the ones the user needs.
To solve this problem, an embodiment of the present invention provides a keyword weight acquisition method. Fig. 2 is a flowchart of one implementation of the method, which comprises:
Step S201: Obtain the text to be measured.
In one optional embodiment, the user enters the text to be measured directly; in another, the user inputs a voice signal and the client converts it into the text to be measured.
Step S202: Obtain a keyword set, the keyword set including at least one keyword contained in the text to be measured.
In this embodiment, one or more keywords are configured in advance for each business operation page. For example, the keyword configured for the debit card operation page is "debit card"; the keyword configured for the credit card operation page is "credit card"; the keywords configured for the debit card transfer page are "debit card" and "transfer"; and the keywords configured for the credit card loss-reporting page are "credit card" and "report loss".
Each of the at least one keyword contained in the text to be measured corresponds to a business operation page.
Steps S203 to S206 are executed for each keyword in the keyword set to obtain the weight of every keyword in the set.
Step S203: Obtain the float factor of the business operation page corresponding to the keyword, wherein one business operation page corresponds to one text in the corpus, one business operation page corresponds to one or more keywords, and the float factor of the business operation page corresponding to the keyword indicates the probability that the text corresponding to that page is selected.
In this embodiment, a business operation page relates to its text in the corpus as follows:
The process of the client displaying a business operation page is the process of loading the text corresponding to that page from the corpus.
Suppose the current date is June 16, 2018. The client can count how many times the text corresponding to each of its business operation pages was selected between January 1, 2018 and June 1, 2018, and from these counts obtain the probability that each page's text was selected during that period.
Assume the client includes three business operation pages: business operation page 1, business operation page 2, and business operation page 3. Page 1 corresponds to keyword 1 and keyword 2, page 2 corresponds to keyword 3, and page 3 corresponds to keyword 4 and keyword 5.
Between January 1, 2018 and June 1, 2018, users operated on business operation page 1, that is, selected the text corresponding to page 1, 30 times; operated on page 2 (selected its text) 50 times; and operated on page 3 (selected its text) 20 times.
The float factor of keyword 1 and keyword 2 can then be 30/(30+50+20) = 0.3; likewise, the float factor of keyword 3 can be 50/(30+50+20) = 0.5, and the float factor of keyword 4 and keyword 5 can be 20/(30+50+20) = 0.2.
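The float-factor calculation in this example can be sketched as follows. This is a minimal illustration, assuming each keyword is configured on exactly one page (as in the three-page example) and that selection counts for the statistics period are already available; the function and variable names are hypothetical.

```python
def float_factors(page_keywords: dict[str, list[str]],
                  selections: dict[str, int]) -> dict[str, float]:
    """Map each keyword to the selection probability of its business operation page.

    page_keywords: page -> keywords configured for that page (assumed: one page per keyword)
    selections:    page -> times the page's corpus text was selected in the period
    """
    total = sum(selections.values())
    factors: dict[str, float] = {}
    for page, keywords in page_keywords.items():
        # Probability that this page's text was selected during the statistics period.
        prob = selections.get(page, 0) / total if total else 0.0
        for kw in keywords:
            factors[kw] = prob
    return factors


page_keywords = {
    "page 1": ["keyword 1", "keyword 2"],
    "page 2": ["keyword 3"],
    "page 3": ["keyword 4", "keyword 5"],
}
selections = {"page 1": 30, "page 2": 50, "page 3": 20}  # Jan 1 to Jun 1, 2018
print(float_factors(page_keywords, selections))
# {'keyword 1': 0.3, 'keyword 2': 0.3, 'keyword 3': 0.5, 'keyword 4': 0.2, 'keyword 5': 0.2}
```

Dividing by the total over all pages makes the factors sum to 1 across pages, so a factor reads directly as "share of all selections that went to this page's text".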
Step S204: Obtain the frequency of the keyword in the text to be measured.
Assume the text to be measured is "I want to report the loss of my credit card". The keywords it contains are "report loss" and "credit card"; "credit card" occurs once and "report loss" occurs once in the text.
Step S205: Obtain the number of texts in the corpus that contain the keyword.
Assume the corpus contains a debit card loss-reporting text, a credit card loss-reporting text, a credit card application text, and a debit card application text. Then 2 texts contain the keyword "credit card", and 2 texts contain the keyword "report loss".
Steps S203 to S205 may be executed in any order.
Step S206: Obtain the weight of the keyword based on the float factor corresponding to the keyword, the frequency of the keyword in the text to be measured, and the number of texts in the corpus containing the keyword.
In an optional embodiment, the weight of a keyword = float factor × TF × IDF. For example, if the float factor of a keyword is 0.3, the keyword occurs 5 times in the text to be measured, and 10 texts in the corpus contain it, then its weight = 0.3 × 5 × 1/10 = 0.15.
In the keyword weight acquisition method provided by this embodiment, a text to be measured is obtained first, together with a keyword set containing at least one keyword from that text. For each keyword in the keyword set, the float factor of the business operation page corresponding to the keyword is obtained; this float factor indicates the probability that the text corresponding to that business operation page is selected. The weight of the keyword is then obtained from its float factor, its frequency in the text to be measured, and the number of texts in the corpus containing it, yielding the weight of every keyword in the keyword set. Because the float factor, that is, the probability that each business operation page's text is selected, is taken into account, the resulting weight can assess how important the keyword is to the files in the corpus: the larger the weight, the more important the keyword is to those files. Files obtained from the corpus based on these weights are therefore more likely to be the texts the user intends to view, i.e., more accurate.
In an optional embodiment, the keyword weight acquisition method further comprises:
Obtaining a target text from the corpus based on the weight of each keyword in the keyword set.
Specifically, the keywords in the keyword set are sorted in descending order of their weights;
At least one first text containing all keywords in the keyword set is obtained from the corpus, and the at least one first text is displayed;
If the corpus contains no such first text, the last M keywords are removed from the keyword set to obtain a first keyword set, where M is a positive integer greater than or equal to 1;
At least one second text containing all keywords in the first keyword set is obtained from the corpus, and the at least one second text is displayed;
If the corpus contains no such second text, the last N keywords are removed from the first keyword set to obtain a second keyword set, where N is a positive integer greater than or equal to 1;
At least one third text containing all keywords in the second keyword set is obtained from the corpus, and the at least one third text is displayed.
The target text in this embodiment may include the at least one first text, and/or the at least one second text, and/or the at least one third text.
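The fallback retrieval above can be sketched as follows. This is a minimal illustration under stated assumptions: each corpus text is modelled as a set of keywords, the removal counts are passed in as a hypothetical `drop_counts` tuple (defaulting to M = N = 1), and the search stops at the first round that matches or when no keywords remain. The text names and weights are hypothetical.

```python
def retrieve_target_texts(weights: dict[str, float],
                          corpus: dict[str, set[str]],
                          drop_counts: tuple[int, ...] = (1, 1)) -> list[str]:
    """Return corpus texts containing every remaining keyword.

    weights:     keyword -> weight
    corpus:      text id -> keywords the text contains
    drop_counts: keywords to remove in each fallback round (M, then N)
    """
    # Sort keywords by weight, highest first (stable for ties).
    keywords = sorted(weights, key=weights.get, reverse=True)
    for drop in (0, *drop_counts):
        if drop:
            keywords = keywords[:-drop]  # remove the last `drop` (lowest-weight) keywords
        if not keywords:
            break
        matches = [tid for tid, kws in corpus.items() if set(keywords) <= kws]
        if matches:
            return matches  # first, second, or third text(s)
    return []


corpus = {
    "credit card loss reporting": {"credit card", "report loss"},
    "debit card loss reporting": {"debit card", "report loss"},
    "credit card application": {"credit card", "apply"},
}
weights = {"credit card": 0.3, "report loss": 0.2, "overseas": 0.05}
print(retrieve_target_texts(weights, corpus))
# ['credit card loss reporting']
```

No text contains all three keywords, so the lowest-weight keyword "overseas" is dropped and the second round finds a match. Dropping from the tail is what makes the weight ordering matter: the least important keywords are relaxed first.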
Fig. 3 is a flowchart of another implementation of the keyword weight acquisition method according to an embodiment of the present invention. The method comprises:
Step S301: Display a first business operation page, the first business operation page corresponding to a first keyword.
Taking Fig. 1a and Fig. 1b as an example again, the first business operation page shown in Fig. 1a is the credit card operation page, so the first keyword corresponding to it can be "credit card".
Step S302: In response to an operation instruction for entering the text to be measured, receive the text to be measured entered by the user.
Assume, as shown in Fig. 1b, that the text to be measured is "I want to report a loss".
Step S303: Obtain a keyword set, the keyword set including at least the first keyword and at least one second keyword contained in the text to be measured.
In one optional embodiment, the text to be measured contains the first keyword, in which case the at least one second keyword does not include the first keyword. In another optional embodiment, the text to be measured does not contain the first keyword, in which case the first keyword is merged with the at least one second keyword to obtain the keyword set. Because the first keyword then occurs 0 times in the text to be measured, its frequency in the text to be measured is set to a second value, which is non-zero. The second value can be chosen according to the actual situation, for example 1, 2, or 3.
Continuing the example of Fig. 1a and Fig. 1b, the text to be measured does not contain the first keyword "credit card" and contains only the keyword "report loss", so the resulting keyword set is {credit card, report loss}.
Step S304: Set the probability that the text corresponding to the first business operation page is selected to a first value, the first value being greater than or equal to the float factor of any second keyword.
Step S305: Determine the first value as the float factor of the first keyword.
Because the user entered intelligent question answering from the business operation page corresponding to credit cards, it is highly likely that the user wants to perform an operation on a credit card. The first value is therefore set relatively large, for example greater than the float factor of any second keyword.
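Steps S304 to S306 can be sketched as follows. The only constraint stated here is that the first value is greater than or equal to every second keyword's float factor. This sketch assumes a hypothetical preferred value of 1.0 and raises it to the largest second-keyword factor if needed; the names and the 0.2 factor in the example are illustrative only.

```python
def assign_float_factors(first_keyword: str,
                         second_factors: dict[str, float],
                         preferred_first_value: float = 1.0) -> dict[str, float]:
    """Give the first keyword a 'first value' >= every second keyword's float factor."""
    floor = max(second_factors.values(), default=0.0)
    first_value = max(preferred_first_value, floor)  # enforce the >= constraint
    factors = dict(second_factors)        # S306: second keywords keep their own factors
    factors[first_keyword] = first_value  # S304-S305: first value becomes its factor
    return factors


print(assign_float_factors("credit card", {"report loss": 0.2}))
# {'report loss': 0.2, 'credit card': 1.0}
```

Taking a maximum guarantees the constraint even if a caller passes a small preferred value, so the page the user came from never ranks below a keyword from another page.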
Step S306: For any second keyword in the keyword set, take the probability that the text corresponding to the business operation page of the second keyword was selected within a preset time as the float factor of the second keyword, so as to obtain the float factor of each of the at least one second keyword.
Suppose the current date is June 16, 2018 and the preset time period is January 1, 2018 to June 1, 2018. Assume the client includes three business operation pages: business operation page 1, business operation page 2, and business operation page 3, where page 1 corresponds to keyword 1 and keyword 2, page 2 corresponds to keyword 3, and page 3 corresponds to keyword 4 and keyword 5; that is, the at least one second keyword includes keyword 1, keyword 2, keyword 3, keyword 4, and keyword 5.
Assume that between January 1, 2018 and June 1, 2018, users operated on business operation page 1, that is, selected the text corresponding to page 1, 30 times; operated on page 2 (selected its text) 50 times; and operated on page 3 (selected its text) 20 times.
The float factor of keyword 1 and keyword 2 can then be 30/(30+50+20) = 0.3; likewise, the float factor of keyword 3 can be 50/(30+50+20) = 0.5, and the float factor of keyword 4 and keyword 5 can be 20/(30+50+20) = 0.2.
Step S307: Obtain the frequency in the text to be measured of each keyword in the keyword set.
Step S308: Obtain the number of texts in the corpus that contain each keyword in the keyword set.
Steps S304 to S308 may be executed in any order.
Step S309: For each keyword in the keyword set, obtain the weight of the keyword based on the float factor corresponding to the keyword, the frequency of the keyword in the text to be measured, and the number of texts in the corpus containing the keyword, so as to obtain the weight of each keyword in the keyword set.
The keyword weight acquisition method provided in this embodiment takes into account the first business operation page from which intelligent question answering is launched, that is, it incorporates the business scenario of the question. As a result, the texts obtained from the corpus based on the weights of the keywords in the keyword set are more likely to be the ones the user intends to view, i.e., more accurate.
The method is described in detail in the embodiments disclosed above. The method of the present invention can be implemented by devices in various forms, so the present invention also discloses a device, specific embodiments of which are described in detail below.
Fig. 4 is a structural diagram of one implementation of the keyword weight acquisition device according to an embodiment of the present invention. The device comprises:
A first acquisition module 41, configured to obtain a text to be measured;
A second acquisition module 42, configured to obtain a keyword set, the keyword set including at least the keyword or keywords contained in the text to be measured;
A third acquisition module 43, configured to obtain, for any keyword in the keyword set, the float factor of the business operation page corresponding to the keyword, where one business operation page corresponds to one text in a corpus, one business operation page corresponds to one or more keywords, and the float factor of the business operation page corresponding to a keyword indicates the probability that the text corresponding to that business operation page is selected;
A fourth acquisition module 44, configured to obtain, for any keyword in the keyword set, the frequency with which the keyword occurs in the text to be measured;
A fifth acquisition module 45, configured to obtain, for any keyword in the keyword set, the number of texts in the corpus that contain the keyword;
A sixth acquisition module 46, configured to obtain, for any keyword in the keyword set, the weight of the keyword based on the float factor corresponding to the keyword, the frequency with which the keyword occurs in the text to be measured, and the number of texts in the corpus that contain the keyword, so as to obtain the weight corresponding to each keyword in the keyword set.
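The six modules map naturally onto the methods of a single object. The sketch below is hypothetical throughout: the class and method names, the whitespace tokenizer, the vocabulary-based keyword extraction, the default float factor of 0.0 for unknown keywords, and the weight formula (float factor x TF x smoothed IDF) are all assumptions for illustration, not the device of Fig. 4.

```python
import math
from dataclasses import dataclass


@dataclass
class KeywordWeightDevice:
    """Hypothetical sketch of modules 41-46 of the keyword weight acquisition device."""

    corpus: list[list[str]]          # each corpus text as a token list (assumed pre-segmented)
    float_factors: dict[str, float]  # keyword -> float factor of its business operation page
    vocabulary: set[str]             # assumed keyword vocabulary used by module 42

    def obtain_text(self, raw: str) -> list[str]:  # module 41
        # Whitespace split for illustration only; Chinese text would need a word segmenter.
        return raw.split()

    def obtain_keywords(self, tokens: list[str]) -> list[str]:  # module 42
        return sorted({t for t in tokens if t in self.vocabulary})

    def obtain_float_factor(self, kw: str) -> float:  # module 43
        return self.float_factors.get(kw, 0.0)  # unknown keywords default to 0.0 (assumption)

    def obtain_frequency(self, kw: str, tokens: list[str]) -> float:  # module 44
        return tokens.count(kw) / len(tokens) if tokens else 0.0

    def obtain_doc_count(self, kw: str) -> int:  # module 45
        return sum(1 for text in self.corpus if kw in text)

    def obtain_weights(self, raw: str) -> dict[str, float]:  # module 46
        tokens = self.obtain_text(raw)
        n = len(self.corpus)
        weights = {}
        for kw in self.obtain_keywords(tokens):
            # Assumed smoothed IDF; the embodiment's own formula may differ.
            idf = math.log((n + 1) / (self.obtain_doc_count(kw) + 1)) + 1.0
            weights[kw] = self.obtain_float_factor(kw) * self.obtain_frequency(kw, tokens) * idf
        return weights


device = KeywordWeightDevice(
    corpus=[["loan", "rate"], ["card", "fee"], ["loan", "card"]],
    float_factors={"loan": 1.0, "card": 0.5},
    vocabulary={"loan", "card", "rate"},
)
weights = device.obtain_weights("loan rate loan card")
```

Splitting the work this way keeps each quantity (float factor, frequency, document count) separately replaceable, which mirrors the module-per-quantity structure described above.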
Optionally, the device further includes:
A seventh acquisition module, configured to obtain a target text from the corpus based on the weight corresponding to each keyword in the keyword set.
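How the seventh module ranks corpus texts is not specified here. One plausible sketch, assuming each corpus text carries a set of keywords and is scored by the sum of the weights of the keywords it shares with the text to be measured, is shown below; the function name, data layout, and scoring rule are hypothetical.

```python
from typing import Optional


def obtain_target_text(weights: dict[str, float],
                       corpus: dict[str, set[str]]) -> Optional[str]:
    """Hypothetical seventh module: return the id of the corpus text with the
    largest summed keyword weight.

    weights: keyword -> weight computed for the text to be measured
    corpus:  text id -> set of keywords attached to that corpus text (assumed layout)
    """
    best_id, best_score = None, 0.0
    for text_id, text_keywords in corpus.items():
        # Score a text by adding up the weights of the keywords it shares with the query.
        score = sum(w for kw, w in weights.items() if kw in text_keywords)
        if score > best_score:
            best_id, best_score = text_id, score
    return best_id  # None when no corpus text shares a positively weighted keyword
```

For example, with weights `{"loan": 0.6, "card": 0.2}`, a text tagged with both `loan` and `card` scores 0.8 and is preferred over texts tagged with only one of them.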
Optionally, the first acquisition module includes:
A display unit, configured to display the first business operation page;
A receiving unit, configured to receive, in response to an operation instruction for inputting the text to be measured, the text to be measured input by the user.
Optionally, the keyword set includes a first keyword and at least one second keyword, where the first keyword corresponds to the first business operation page, and the third acquisition module includes:
A first determination unit, configured to, for any second keyword in the keyword set, determine the probability that the text corresponding to the business operation page of the second keyword is selected within a preset time as the float factor corresponding to the second keyword;
A second determination unit, configured to, for the first keyword in the keyword set, set the probability that the text corresponding to the first business operation page is selected to a first value, the first value being greater than or equal to the float factor corresponding to any second keyword, and determine the first value as the float factor corresponding to the first keyword.
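The two determination units can be sketched as follows, assuming the "probability of being selected within a preset time" is estimated empirically from a log of the pages whose texts were selected in that window. The log format, the function name, and the choice of the smallest admissible first value as the default are assumptions made for illustration.

```python
from collections import Counter
from typing import Optional


def determine_float_factors(first_keyword: str,
                            second_keywords: list[str],
                            keyword_to_page: dict[str, str],
                            selected_pages: list[str],
                            first_value: Optional[float] = None) -> dict[str, float]:
    """Hypothetical sketch of the first and second determination units.

    selected_pages: ids of the pages whose texts were selected within the preset
                    time window (assumed to come from a selection log)
    first_value:    optional preset value for the first keyword; it is raised if
                    needed so that it is >= every second keyword's float factor
    """
    counts = Counter(selected_pages)
    total = len(selected_pages)
    factors = {}
    for kw in second_keywords:
        # Empirical selection probability of this keyword's page text within the window.
        factors[kw] = counts[keyword_to_page[kw]] / total if total else 0.0
    floor = max(factors.values(), default=0.0)
    # First keyword (current page): first value must not be below any second keyword's factor.
    factors[first_keyword] = floor if first_value is None else max(first_value, floor)
    return factors
```

Clamping the first value from below enforces the stated constraint, so the keyword of the page the user is currently on is never weighted less than a keyword from another page.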
Optionally, the first business operation page corresponds to a first keyword, and the second acquisition module includes:
A first acquisition unit, configured to obtain the at least one keyword contained in the text to be measured;
A second acquisition unit, configured to, if the at least one keyword does not include the first keyword, merge the first keyword with the at least one keyword to obtain the keyword set;
The fourth acquisition module includes:
A setting unit, configured to set the frequency with which the first keyword occurs in the text to be measured to a second value.
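The merging step and the setting unit can be sketched together. This sketch assumes the second value is applied only when the first keyword had to be merged in (i.e., when it does not occur in the text to be measured); the function name and parameter layout are hypothetical.

```python
def build_keyword_set(extracted: list[str],
                      first_keyword: str,
                      frequencies: dict[str, float],
                      second_value: float) -> tuple[list[str], dict[str, float]]:
    """Hypothetical sketch of the second acquisition unit and the setting unit.

    extracted:    keywords found in the text to be measured
    frequencies:  keyword -> frequency in the text to be measured
    second_value: preset frequency for the first keyword (assumed to be used
                  only when the first keyword had to be merged in)
    """
    keywords = list(extracted)
    freqs = dict(frequencies)  # copy so the caller's dict is not mutated
    if first_keyword not in keywords:
        # Merge the current page's keyword into the keyword set...
        keywords.append(first_keyword)
        # ...and give it the preset frequency, since it does not occur in the text.
        freqs[first_keyword] = second_value
    return keywords, freqs
```

If the text already contains the first keyword, the keyword set and its measured frequency are returned unchanged.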
As shown in Fig. 5, which is a structural diagram of one implementation of the electronic device provided by an embodiment of the present invention, the electronic device includes:
A memory 51, configured to store a program;
A processor 52, configured to execute the program, the program being specifically configured to:
Obtain a text to be measured;
Obtain a keyword set, the keyword set including at least the keyword or keywords contained in the text to be measured;
For any keyword in the keyword set, obtain the float factor of the business operation page corresponding to the keyword, where one business operation page corresponds to one text in a corpus, one business operation page corresponds to one or more keywords, and the float factor of the business operation page corresponding to a keyword indicates the probability that the text corresponding to that business operation page is selected;
Obtain the frequency with which the keyword occurs in the text to be measured;
Obtain the number of texts in the corpus that contain the keyword;
Obtain the weight of the keyword based on the float factor corresponding to the keyword, the frequency with which the keyword occurs in the text to be measured, and the number of texts in the corpus that contain the keyword, so as to obtain the weight corresponding to each keyword in the keyword set.
The memory 51 may include a high-speed RAM, and may further include a non-volatile memory, for example at least one magnetic disk storage.
The processor 52 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
Optionally, the electronic device may further include a communication bus 53 and a communication interface 54, where the memory 51, the processor 52, and the communication interface 54 communicate with one another via the communication bus 53.
Optionally, the communication interface 54 may be the interface of a communication module, such as the interface of a GSM module.
Optionally, an embodiment of the present invention further provides a readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements each step of the keyword weight acquisition method described in any of the above embodiments.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to one another. Because the device or system embodiments are basically similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
It should also be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A keyword weight acquisition method, characterized by comprising:
Obtaining a text to be measured;
Obtaining a keyword set, the keyword set comprising at least the keyword or keywords contained in the text to be measured;
For any keyword in the keyword set, obtaining the float factor of the business operation page corresponding to the keyword, wherein one business operation page corresponds to one text in a corpus, one business operation page corresponds to one or more keywords, and the float factor of the business operation page corresponding to a keyword indicates the probability that the text corresponding to that business operation page is selected;
Obtaining the frequency with which the keyword occurs in the text to be measured;
Obtaining the number of texts in the corpus that contain the keyword;
Obtaining the weight of the keyword based on the float factor corresponding to the keyword, the frequency with which the keyword occurs in the text to be measured, and the number of texts in the corpus that contain the keyword, so as to obtain the weight corresponding to each keyword in the keyword set.
2. The keyword weight acquisition method according to claim 1, characterized by further comprising:
Obtaining a target text from the corpus based on the weight corresponding to each keyword in the keyword set.
3. The keyword weight acquisition method according to claim 1 or 2, characterized in that obtaining the text to be measured comprises:
Displaying a first business operation page;
Receiving, in response to an operation instruction for inputting the text to be measured, the text to be measured input by the user.
4. The keyword weight acquisition method according to claim 3, characterized in that the keyword set comprises a first keyword and at least one second keyword, wherein the first keyword corresponds to the first business operation page, and obtaining, for any keyword in the keyword set, the float factor of the business operation page corresponding to the keyword comprises:
For any second keyword in the keyword set, determining the probability that the text corresponding to the business operation page of the second keyword is selected within a preset time as the float factor corresponding to the second keyword;
For the first keyword in the keyword set, setting the probability that the text corresponding to the first business operation page is selected to a first value, the first value being greater than or equal to the float factor corresponding to any second keyword; and determining the first value as the float factor corresponding to the first keyword.
5. The keyword weight acquisition method according to claim 3 or 4, characterized in that the first business operation page corresponds to a first keyword, and obtaining the keyword set comprises:
Obtaining the at least one keyword contained in the text to be measured;
If the at least one keyword does not include the first keyword, merging the first keyword with the at least one keyword to obtain the keyword set;
Obtaining the frequency with which the keyword occurs in the text to be measured comprises:
Setting the frequency with which the first keyword occurs in the text to be measured to a second value.
6. A keyword weight acquisition device, characterized by comprising:
A first acquisition module, configured to obtain a text to be measured;
A second acquisition module, configured to obtain a keyword set, the keyword set comprising at least the keyword or keywords contained in the text to be measured;
A third acquisition module, configured to obtain, for any keyword in the keyword set, the float factor of the business operation page corresponding to the keyword, wherein one business operation page corresponds to one text in a corpus, one business operation page corresponds to one or more keywords, and the float factor of the business operation page corresponding to a keyword indicates the probability that the text corresponding to that business operation page is selected;
A fourth acquisition module, configured to obtain, for any keyword in the keyword set, the frequency with which the keyword occurs in the text to be measured;
A fifth acquisition module, configured to obtain, for any keyword in the keyword set, the number of texts in the corpus that contain the keyword;
A sixth acquisition module, configured to obtain, for any keyword in the keyword set, the weight of the keyword based on the float factor corresponding to the keyword, the frequency with which the keyword occurs in the text to be measured, and the number of texts in the corpus that contain the keyword, so as to obtain the weight corresponding to each keyword in the keyword set.
7. The keyword weight acquisition device according to claim 6, characterized in that the first acquisition module comprises:
A display unit, configured to display a first business operation page;
A receiving unit, configured to receive, in response to an operation instruction for inputting the text to be measured, the text to be measured input by the user.
8. The keyword weight acquisition device according to claim 7, characterized in that the keyword set comprises a first keyword and at least one second keyword, wherein the first keyword corresponds to the first business operation page, and the third acquisition module comprises:
A first determination unit, configured to, for any second keyword in the keyword set, determine the probability that the text corresponding to the business operation page of the second keyword is selected within a preset time as the float factor corresponding to the second keyword;
A second determination unit, configured to, for the first keyword in the keyword set, set the probability that the text corresponding to the first business operation page is selected to a first value, the first value being greater than or equal to the float factor corresponding to any second keyword, and determine the first value as the float factor corresponding to the first keyword.
9. An electronic device, characterized by comprising:
A memory, configured to store a program;
A processor, configured to execute the program, the program being specifically configured to:
Obtain a text to be measured;
Obtain a keyword set, the keyword set comprising at least the keyword or keywords contained in the text to be measured;
For any keyword in the keyword set, obtain the float factor of the business operation page corresponding to the keyword, wherein one business operation page corresponds to one text in a corpus, one business operation page corresponds to one or more keywords, and the float factor of the business operation page corresponding to a keyword indicates the probability that the text corresponding to that business operation page is selected;
Obtain the frequency with which the keyword occurs in the text to be measured;
Obtain the number of texts in the corpus that contain the keyword;
Obtain the weight of the keyword based on the float factor corresponding to the keyword, the frequency with which the keyword occurs in the text to be measured, and the number of texts in the corpus that contain the keyword, so as to obtain the weight corresponding to each keyword in the keyword set.
10. A readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, each step of the keyword weight acquisition method according to any one of claims 1 to 5 is implemented.
CN201810723425.3A 2018-07-04 2018-07-04 Keyword weight obtaining method and device, electronic equipment and readable storage medium Active CN108920660B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810723425.3A CN108920660B (en) 2018-07-04 2018-07-04 Keyword weight obtaining method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN108920660A true CN108920660A (en) 2018-11-30
CN108920660B CN108920660B (en) 2020-11-20

Family

ID=64424547

Country Status (1)

Country Link
CN (1) CN108920660B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant