CN109101485A - Information processing method, apparatus, electronic device, and computer storage medium
- Publication number
- CN109101485A (CN201810745000.2A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Abstract
Embodiments of the invention disclose an information processing method, an apparatus, an electronic device, and a computer storage medium. Term frequency-inverse document frequency (TF-IDF) can be used to assess how important a word is to a document. Current approaches treat each word as an independent element, so text classification and information retrieval based on the resulting TF-IDF values have low accuracy. In the embodiments, the synonyms of each text word within the text information are obtained, a first synonym set of the text word is built from the text word and its synonyms, a second synonym set of the text information is then obtained based on the first synonym set, and finally the TF-IDF value of the second synonym set is calculated. Because the synonym relationships between the text words in the text information are taken into account, text classification or information retrieval based on this TF-IDF value can be more accurate.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to an information processing method, an apparatus, an electronic device, and a computer storage medium.
Background
Term frequency-inverse document frequency (TF-IDF) is a weighting technique used in text classification and information retrieval. TF-IDF can be used to assess how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to how frequently it appears across the corpus.
Current approaches treat each word as an independent element and calculate its TF-IDF value in isolation, so text classification and information retrieval based on the resulting TF-IDF values have low accuracy.
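As a point of reference, the conventional per-word calculation described above can be sketched as follows. This is a minimal illustration of one common TF-IDF variant (relative term frequency and an unsmoothed logarithmic inverse document frequency), not a formula taken from the patent; the function name `tf_idf` and the sample documents are assumptions made for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Classic per-word TF-IDF for a list of tokenized documents."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # Term frequency times inverse document frequency.
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores

# Hypothetical sample corpus of three tokenized documents.
docs = [["glad", "happy", "synonyms"], ["today", "monday"], ["glad", "today"]]
scores = tf_idf(docs)
```

Note that "glad" and "happy" are scored as unrelated words here, which is the limitation the background section points out.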
Summary of the invention
Embodiments of the invention disclose an information processing method, an apparatus, an electronic device, and a computer storage medium that can obtain the TF-IDF value of the synonym sets contained in text information, which in turn helps improve the accuracy of text classification and information retrieval.
In a first aspect, an embodiment of the invention discloses an information processing method. The method may include: receiving an information processing request, where the request includes multiple pieces of text information and each piece of text information includes at least one text word; obtaining, according to the text words included in the multiple pieces of text information, a first synonym set for each text word, where the first synonym set includes the text word and at least one synonym of the text word; for each piece of text information, determining a first coefficient of the text information, where the first coefficient corresponds to a second synonym set that contains a text word of the text information, the first synonym set includes the second synonym set, and the first coefficient is used to establish a linear representation relationship between the second synonym set and the text information; and obtaining the term frequency-inverse document frequency of the second synonym set according to the first coefficient of the text information.
In one implementation, the information processing request further includes a target number of target synonym sets. After the term frequency-inverse document frequency of the second synonym set is obtained according to the first coefficient of the text information, the method may further include: selecting, from the second synonym sets, the target number of synonym sets with the largest term frequency-inverse document frequency as the target synonym sets.
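The selection step in this implementation can be sketched as a simple ranking. The function name `top_synonym_sets` and the TF-IDF values below are hypothetical and serve only to illustrate picking the requested number of highest-scoring synonym sets.

```python
def top_synonym_sets(tfidf_by_set, target_count):
    """Return the target_count synonym sets with the largest TF-IDF values."""
    ranked = sorted(tfidf_by_set.items(), key=lambda item: item[1], reverse=True)
    return [syn_set for syn_set, _ in ranked[:target_count]]

# Hypothetical TF-IDF values keyed by synonym set.
tfidf_by_set = {
    frozenset({"glad", "happy"}): 0.42,
    frozenset({"synonyms"}): 0.10,
    frozenset({"today"}): 0.27,
}
top = top_synonym_sets(tfidf_by_set, 2)
```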
In one implementation, determining the first coefficient of the text information may specifically include: obtaining the term frequency of each text word in the text information, where the term frequency of a text word is used to establish a linear representation relationship between the text word and the text information; obtaining the second synonym sets that contain each text word; and, for each second synonym set, obtaining the first coefficient of the second synonym set according to the second coefficient of the second synonym set for each text word and the term frequency of that text word.
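This passage does not state how the second coefficients and term frequencies are combined. One reading that is consistent with the linear representation relationships is to substitute each word's representation into the text's representation, which makes the first coefficient of a synonym set the sum, over its words, of second coefficient times term frequency. The sketch below follows that assumption; the function name `first_coefficients` and all numeric values are hypothetical.

```python
def first_coefficients(term_freq, second_coeff):
    """Assumed combination: sum of (second coefficient * term frequency).

    term_freq:    {word: term frequency in the text}
    second_coeff: {synonym set: {word: second coefficient for that word}}
    """
    return {
        syn_set: sum(coeff * term_freq.get(word, 0.0)
                     for word, coeff in per_word.items())
        for syn_set, per_word in second_coeff.items()
    }

GLAD_SET = frozenset({"glad", "happy"})
SYN_SET = frozenset({"synonyms"})
term_freq = {"glad": 1.0, "happy": 1.0, "synonyms": 1.0}
second_coeff = {
    GLAD_SET: {"glad": 0.5, "happy": 0.5},
    SYN_SET: {"synonyms": 1.0},
}
coeffs = first_coefficients(term_freq, second_coeff)
```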
In one implementation, before the first coefficient of the second synonym set is obtained according to the second coefficient of the second synonym set for each text word and the term frequency of that text word, the method may further include: for each text word in the text information, determining a first vector of the text word; obtaining, according to the first vector of the text word, a second vector of a third synonym set that contains the text word, where the second synonym set includes the third synonym set; obtaining the cosine similarity between the text word and the third synonym set according to the first vector of the text word and the second vector of the third synonym set; and obtaining, according to the cosine similarity, the second coefficient of the third synonym set for the text word.
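The cosine-similarity step can be sketched as follows. The passage does not say how the second vector of a synonym set is derived from the first vectors of its words, nor how the similarity maps to the second coefficient; the sketch assumes the set vector is the element-wise mean of its members' vectors and uses the similarity itself as the coefficient. The toy two-dimensional vectors and the helper names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # Convention chosen here for zero vectors.
    return dot / (norm_u * norm_v)

def set_vector(word_vectors, syn_set):
    """Assumed aggregation: element-wise mean of the member word vectors."""
    vectors = [word_vectors[word] for word in sorted(syn_set)]
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]

# Hypothetical first vectors for two text words.
word_vectors = {"glad": [1.0, 0.0], "happy": [0.0, 1.0]}
third_set = {"glad", "happy"}
second_vector = set_vector(word_vectors, third_set)
second_coefficient = cosine_similarity(word_vectors["glad"], second_vector)
```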
In one implementation, the information processing request further includes the number of pieces of text information. Obtaining the term frequency-inverse document frequency of the second synonym set according to the first coefficient of the text information may specifically include: summing all first coefficients of the text information to obtain a first value; dividing the first coefficient corresponding to the second synonym set by the first value to obtain a second value; summing the first coefficients of the second synonym set over every piece of text information to obtain a third value; taking the logarithm of the number of pieces of text information included in the request divided by the third value to obtain a fourth value; and multiplying the second value by the fourth value to obtain the term frequency-inverse document frequency of the second synonym set.
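The five steps above can be sketched directly in code. The data layout (one dictionary of first coefficients per piece of text information, keyed by synonym set), the natural logarithm, the function name, and the sample coefficients are assumptions made for illustration; the source does not specify the logarithm base.

```python
import math

def synonym_set_tf_idf(coefficients, text_index, syn_set, n_texts):
    """TF-IDF of a synonym set in one text, following the steps described above.

    coefficients: list with one {synonym set: first coefficient} dict per text.
    """
    text = coefficients[text_index]
    first_value = sum(text.values())              # all first coefficients of the text
    second_value = text[syn_set] / first_value    # term-frequency part
    third_value = sum(t.get(syn_set, 0.0) for t in coefficients)
    fourth_value = math.log(n_texts / third_value)  # inverse-document-frequency part
    return second_value * fourth_value

GLAD = frozenset({"glad", "happy"})
SYN = frozenset({"synonyms"})
TODAY = frozenset({"today"})
# Hypothetical first coefficients for three pieces of text information.
coefficients = [
    {GLAD: 1.0, SYN: 0.5},
    {TODAY: 1.0},
    {GLAD: 0.5, TODAY: 1.0},
]
score = synonym_set_tf_idf(coefficients, 0, GLAD, n_texts=3)
```

Because the third value sums coefficients rather than counting documents, the fourth value can become negative when the summed coefficients exceed the number of texts.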
In one implementation, obtaining the first synonym set for a text word according to the text words included in the multiple pieces of text information may specifically include: performing word segmentation on the multiple pieces of text information to obtain a text word set, where the text word set includes at least one text word; looking up the synonyms of each text word in a preset synonym database to obtain a fourth synonym set for each text word, where the fourth synonym set includes the text word and the synonyms found for it; and obtaining the first synonym set according to the fourth synonym sets.
In one implementation, obtaining the first synonym set according to the fourth synonym sets may specifically include: determining the target fourth synonym sets, that is, the fourth synonym sets whose text word and all of its synonyms are present in another fourth synonym set, where the other fourth synonym sets are the fourth synonym sets for each text word other than the target fourth synonym sets; and determining the other fourth synonym sets as the first synonym sets.
In a second aspect, an embodiment of the invention discloses an information processing apparatus that includes units for performing the method described in the first aspect.
In a third aspect, an embodiment of the invention discloses an electronic device that includes a memory and a processor. The memory is used to store a computer program that includes program instructions, and the processor is configured to call the program instructions to perform the method described in the first aspect.
In a fourth aspect, an embodiment of the invention discloses a computer storage medium that stores a computer program. The computer program includes program instructions that, when executed by a processor, cause the processor to perform the method described in the first aspect.
By implementing the embodiments of the invention, the second synonym sets and first coefficients corresponding to each piece of text information can be obtained, and the TF-IDF values of the second synonym sets corresponding to the text information can be obtained based on the first coefficients. Because these TF-IDF values take into account the synonym relationships between the text words in the text information, text classification or information retrieval based on them can be more accurate.
Brief description of the drawings
To explain the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an information processing method provided by an embodiment of the invention;
Fig. 2 is a schematic flowchart of another information processing method provided by an embodiment of the invention;
Fig. 3 is a schematic structural diagram of an information processing apparatus provided by an embodiment of the invention;
Fig. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the invention.
Detailed description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
The main principle of the technical solution of this application may include: obtaining the synonyms of each text word within the text information; obtaining a first synonym set of the text word according to the text word and its synonyms; obtaining, based on the first synonym set, the second synonym sets corresponding to the text information; and finally calculating the TF-IDF value of each second synonym set. Because the TF-IDF value takes into account the synonym relationships between the text words in the text information, text classification or information retrieval based on it can be more accurate.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of an information processing method provided by an embodiment of the invention. As shown in Fig. 1, the method may include, but is not limited to, the following steps:
S101: The electronic device receives an information processing request that includes multiple pieces of text information.
Specifically, on receiving an information processing request, the electronic device can extract the multiple pieces of text information that the request includes. In one implementation, the information processing request may be sent by a terminal device, or may be generated automatically by the electronic device when it detects an information processing event. The information processing event may be triggered by a user clicking a confirm button in an information processing interface displayed by the electronic device. The electronic device may be a terminal device or a server. The terminal device may be a smartphone, a tablet computer, a personal computer (PC), a smart TV, a smartwatch, an in-vehicle device, a wearable device, a terminal device in a future fifth-generation (5G) mobile communication network, and so on.
A piece of text information may be one sentence or a combination of several sentences, or a paragraph or a chapter; the embodiments of the invention place no limit on this. Each piece of text information includes at least one text word. A text word may be a single word in the segmentation result obtained by applying a word segmentation algorithm to the text information. For example, when the text information is "glad and happy are synonyms", a text word of that text information may be any one of "glad", "and", "happy", "are", and "synonyms". In one implementation, a text word may instead be a single word in the target segmentation result obtained by applying a word segmentation algorithm to the text information and then removing the stop words from the segmentation result. For example, when the text information is "glad and happy are synonyms", a text word of that text information may be any one of "glad", "happy", and "synonyms".
Word segmentation algorithms may include, but are not limited to, algorithms based on string matching (such as forward maximum matching, reverse maximum matching, minimum segmentation, and bidirectional maximum matching), algorithms based on understanding, and algorithms based on statistics; the embodiments of the invention place no limit on this. Stop words are characters or words that are filtered out automatically before or after natural language data (or text) is processed in information retrieval, in order to save storage space and improve search efficiency. Stop words generally fall into two classes. The first class consists of words that are used very widely, even excessively frequently, such as "I" and "if". The second class consists of words that occur very frequently in text but carry little meaning, including modal particles, adverbs, prepositions, and conjunctions, such as "and".
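The segmentation and stop-word handling described above can be sketched with forward maximum matching, one of the string-matching algorithms mentioned. The dictionary, the stop-word list, and the space-free English string (a stand-in for unsegmented Chinese text) are all hypothetical; a production system would use a real lexicon.

```python
def forward_max_match(text, dictionary):
    """Greedy left-to-right segmentation that prefers the longest dictionary word."""
    max_len = max(len(word) for word in dictionary)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens

def remove_stop_words(tokens, stop_words):
    """Drop stop words from a segmentation result."""
    return [token for token in tokens if token not in stop_words]

# Hypothetical lexicon and stop-word list.
dictionary = {"glad", "and", "happy", "are", "synonym", "synonyms"}
stop_words = {"and", "are"}

tokens = forward_max_match("gladandhappyaresynonyms", dictionary)
text_words = remove_stop_words(tokens, stop_words)
```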
S102: The electronic device obtains, according to the text words included in the multiple pieces of text information, a first synonym set for each text word.
The first synonym set includes the text word and at least one synonym of the text word. That is, the text word and all of its synonyms within the text information it belongs to are present in the first synonym set. For example, when the text information is "glad and happy are synonyms" and the text word is "glad", the synonym of "glad" within this text information is "happy", so the first synonym set for the text word "glad" may be {"glad", "happy"}.
In one implementation, the multiple pieces of text information may correspond to multiple first synonym sets, and each text word in the text information corresponds to one first synonym set.
In one implementation, different text words in the same piece of text information may correspond to the same first synonym set or to different ones. In one implementation, the same text word in different pieces of text information corresponds to the same first synonym set, while different text words in different pieces of text information may correspond to the same first synonym set or to different ones. For example, suppose there are two pieces of text information, where text information 1 is "glad and happy are synonyms" and text information 2 is "glad and joyful are synonyms". The two pieces of text information may correspond to two first synonym sets: {"glad", "happy", "joyful"} and {"synonyms"}. In this case, the text words "glad" and "happy" in text information 1 and the text words "glad" and "joyful" in text information 2 all correspond to the first synonym set {"glad", "happy", "joyful"}.
In one implementation, the first synonym set for a text word may include the text word and all of its synonyms within the text information it belongs to, and may also include text words from other pieces of text information. For example, the first synonym set {"glad", "happy", "joyful"} above includes text words from both text information 1 and text information 2. In one implementation, all words in the first synonym set for a text word may be present in the text information that the text word belongs to. For example, suppose there are two pieces of text information, where text information 1 is "glad and happy are synonyms" and text information 2 is "today is Monday". The two pieces of text information may correspond to four first synonym sets: {"glad", "happy"}, {"synonyms"}, {"today"}, and {"Monday"}. In this case, the first synonym set of the text word "glad" in text information 1 is {"glad", "happy"}, and all words in this set are present in text information 1, to which "glad" belongs.
In one implementation, the multiple pieces of text information may correspond to a single first synonym set. For example, the first synonym set corresponding to text information 1 and text information 2 above may be {"glad", "happy", "synonyms", "today", "Monday"}.
S103: For each piece of text information, the electronic device determines a first coefficient of the text information, where the first coefficient corresponds to a second synonym set that contains a text word of the text information.
The first coefficient can be used to establish a linear representation relationship between the second synonym set and the text information, and the first synonym set may include the second synonym set.
In one implementation, each text word in each piece of text information may correspond to one second synonym set, and different text words in the same piece of text information may correspond to the same second synonym set or to different ones. For example, suppose there are two pieces of text information, where text information 1 is "glad and happy are synonyms" and text information 2 is "today is Monday". The two pieces of text information may correspond to one first synonym set: {"glad", "happy", "synonyms", "today", "Monday"}. The text words "glad" and "happy" in text information 1 may correspond to the same second synonym set, {"glad", "happy"}, while the text word "synonyms" in text information 1 corresponds to the second synonym set {"synonyms"}.
In one implementation, text information can be represented by second synonym sets. For example, text information 1 can be represented by the second synonym sets {"glad", "happy"} and {"synonyms"}. Specifically, each second synonym set used to represent the text information corresponds to one first coefficient, and the first coefficients establish the linear representation relationship between the second synonym sets and the text information. For example, when the first coefficients corresponding to the second synonym sets {"glad", "happy"} and {"synonyms"} are s1 and s2 respectively, the linear representation relationship between the second synonym sets and the text information may be: text information 1 = s1 * {"glad", "happy"} + s2 * {"synonyms"}. In one implementation, the number of second synonym sets corresponding to a piece of text information equals the number of first coefficients corresponding to it, so that the first coefficients can establish the linear representation relationship between the second synonym sets and the text information. In one implementation, each second synonym set corresponding to each piece of text information may correspond to one first coefficient.
S104: The electronic device obtains the term frequency-inverse document frequency of the second synonym set according to the first coefficient of the text information.
Specifically, the electronic device can obtain the term frequency and the inverse document frequency of the second synonym set according to the first coefficients of the text information, and then obtain the term frequency-inverse document frequency of the second synonym set from these two values.
The term frequency of the second synonym set can be obtained from all first coefficients of the text information that corresponds to the second synonym set. For example, if the linear representation relationship between the second synonym sets and text information 1 is text information 1 = s1 * {"glad", "happy"} + s2 * {"synonyms"}, the term frequency of a second synonym set can be obtained from s1 and s2.
In one implementation, a second synonym set may correspond to one or more pieces of text information; that is, the second synonym set can be used in the linear representation of each piece of text information it corresponds to. If the second synonym set corresponds to multiple pieces of text information, its inverse document frequency can be obtained from the first coefficient it has in each of those pieces of text information. For example, suppose the second synonym set {"glad", "happy", "joyful"} corresponds to text information 1 ("glad and happy are synonyms") and text information 2 ("glad and joyful are synonyms"), the linear representation relationship for text information 1 is text information 1 = s1 * {"glad", "happy", "joyful"} + s2 * {"synonyms"}, and the linear representation relationship for text information 2 is text information 2 = s1' * {"glad", "happy", "joyful"} + s2' * {"synonyms"}. The inverse document frequency of this second synonym set can then be obtained from s1 and s1'.
By implementing this embodiment of the invention, the second synonym sets and first coefficients corresponding to each piece of text information can be obtained, and the TF-IDF values of the second synonym sets corresponding to the text information can be obtained based on the first coefficients. Because these TF-IDF values take into account the synonym relationships between the text words in the text information, text classification or information retrieval based on them can be more accurate.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of another information processing method provided by an embodiment of the invention. As shown in Fig. 2, this method may include, but is not limited to, the following steps:
S201: The electronic device receives an information processing request that includes multiple pieces of text information.
For the execution of step S201, refer to the description of step S101 in Fig. 1; details are not repeated here.
S202: The electronic device performs word segmentation on the multiple pieces of text information to obtain a text word set that includes at least one text word.
Specifically, the electronic device can perform word segmentation on the multiple pieces of text information to obtain one text word set. The text words of every piece of text information are present in this text word set.
In one implementation, the electronic device can segment the pieces of text information in parallel and merge the resulting segmentation results to obtain the text word set.
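A minimal sketch of parallel segmentation followed by merging is shown below. It uses a thread pool from the Python standard library and a whitespace split as a placeholder segmenter; both choices, and the function names, are assumptions for illustration rather than details from the patent.

```python
from concurrent.futures import ThreadPoolExecutor

def segment(text):
    """Placeholder segmenter: splits on whitespace."""
    return text.split()

def build_text_word_set(texts):
    """Segment every text in parallel, then merge the results into one set."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(segment, texts))
    return set().union(*results)

text_word_set = build_text_word_set(["glad and happy", "today is Monday"])
```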
S203: The electronic device looks up the synonyms of each text word in a preset synonym database to obtain a fourth synonym set for each text word. The fourth synonym set includes the text word and the synonyms found for it.
In one implementation, the electronic device can look up the synonyms of each text word in the preset synonym database to obtain a fifth synonym set for each text word, where the fifth synonym set does not include the text word itself. The electronic device can then take the intersection of the fifth synonym set and the text word set as a sixth synonym set, and add the text word to the sixth synonym set to obtain the fourth synonym set. All words in the fourth synonym set are present in the text information corresponding to the text words. For example, suppose there are two pieces of text information, where text information 1 is "glad and happy are synonyms" and text information 2 is "today is Monday". The text word set corresponding to the two pieces of text information may be {"glad", "happy", "synonyms", "today", "Monday"}. If the synonyms found in the preset synonym database for the text word "glad" in text information 1 are "happy" and "joyful", the fifth synonym set of "glad" is {"happy", "joyful"}, the sixth synonym set obtained by intersecting the fifth synonym set with the text word set is {"happy"}, and adding the text word "glad" to the sixth synonym set gives the fourth synonym set {"glad", "happy"}.
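The lookup, intersection, and add-back steps of this implementation can be sketched with ordinary set operations. The in-memory dictionary standing in for the preset synonym database and the function name are hypothetical; the sample data mirrors the example in the text.

```python
def fourth_synonym_set(word, text_word_set, synonym_db):
    """Build the fourth synonym set of a text word.

    fifth set: synonyms found in the database, excluding the word itself
    sixth set: fifth set restricted to words that occur in the text word set
    fourth set: sixth set plus the text word
    """
    fifth = set(synonym_db.get(word, ())) - {word}
    sixth = fifth & text_word_set
    return sixth | {word}

text_word_set = {"glad", "happy", "synonyms", "today", "Monday"}
# Hypothetical stand-in for the preset synonym database.
synonym_db = {"glad": ["happy", "joyful"]}

fourth = fourth_synonym_set("glad", text_word_set, synonym_db)
```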
In one implementation, the preset synonym database may be stored in the electronic device or in the cloud. In one implementation, the preset synonym database may also be stored in another electronic device that is connected to the electronic device by wire or wirelessly, so that the electronic device can query the preset synonym database stored there. The embodiments of the invention place no limit on where the preset synonym database is stored.
In one implementation, the electronic device can also search the network for the synonyms of each text word to obtain the fourth synonym set for each text word.
S204: The electronic device obtains the first synonym sets according to the fourth synonym sets.
In one implementation, the electronic device may obtain the first synonym sets from the fourth synonym sets as follows: the electronic device determines the target fourth synonym sets, that is, the fourth synonym sets whose text word and all of its synonyms are present in another fourth synonym set, and determines the other fourth synonym sets as the first synonym sets. The other fourth synonym sets are the fourth synonym sets for each text word other than the target fourth synonym sets. For example, suppose there are two pieces of text information, where text information 1 is "glad and happy are synonyms" and text information 2 is "I am very glad". The fourth synonym sets corresponding to text information 1 may include {"glad", "happy"} and {"synonyms"}, and the fourth synonym sets corresponding to text information 2 may include {"I"}, {"very"}, and {"glad"}. All words in the fourth synonym set {"glad"} of the text word "glad" in text information 2 are present in the fourth synonym set {"glad", "happy"} of text information 1. In this case, the fourth synonym set {"glad"} of the text word "glad" in text information 2 is a target fourth synonym set, and the fourth synonym set {"glad", "happy"} of text information 1 is one of the other fourth synonym sets.
In one implementation, the electronic device may obtain the first synonym collection from the fourth synonym collections as follows: the electronic device sorts the fourth synonym collections by the number of words they contain and applies compression processing cyclically to the sorted fourth synonym collections to obtain the first synonym collection. The compression processing includes: if all words in a first set are present in a second set, deleting the first set and determining the second set as a first synonym collection, where the first set and the second set are two different fourth synonym collections.
In one implementation, the electronic device may sort the fourth synonym collections in descending order or in ascending order; the embodiment of the present invention does not limit this.
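The sort-and-compress variant can be illustrated with a short Python sketch. The function name `compress_collections` and the choice of descending order are assumptions made for illustration; the source allows either sort order and does not prescribe a data structure.

```python
def compress_collections(fourth_collections):
    """Drop every collection whose words all occur in another collection."""
    # Sort by size, largest first, so a possible superset is always seen
    # before any of its subsets.
    ordered = sorted((frozenset(c) for c in fourth_collections),
                     key=len, reverse=True)
    kept = []
    for candidate in ordered:
        # "<=" is the subset test; exact duplicates are dropped as well.
        if not any(candidate <= bigger for bigger in kept):
            kept.append(candidate)
    return kept


first_collections = compress_collections(
    [{"glad", "happy"}, {"synonym"}, {"I"}, {"very"}, {"glad"}]
)
```

With descending order each candidate only has to be compared against collections that were already kept, which keeps the loop simple.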
S205: The electronic device obtains the word frequency of each text word in the text information. The word frequency of a text word is used to establish a linear expression relationship between the text word and the text information.
In one implementation, the word frequency of a text word may be the number of times the text word occurs in the text information it belongs to. In one implementation, the word frequency may be that number divided by the total number of occurrences of all text words in the text information. In one implementation, the word frequency may be that number divided by the number of occurrences of a first text word in the text information, where the first text word may be the text word that occurs most often in the text information.
In one implementation, the linear expression relationship between the text words and the text information can be established from the text words and their word frequencies. For example, when the word frequency is the number of occurrences and text information 1 is "glad and happy are synonyms, I am very glad", the word frequency of the text word "glad" is 2, and the word frequencies of the text words "happy", "synonym" and "very" are each 1. The linear expression relationship between the text words and text information 1 may then be: text information 1 = "glad" * 2 + "happy" + "synonym" + "very".
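The three word-frequency definitions above can be compared side by side in a small Python sketch. The function name `word_frequencies`, the `mode` labels, and the pre-segmented word list are assumptions for illustration; the list mirrors the four text words counted in the example.

```python
from collections import Counter


def word_frequencies(text_words, mode="count"):
    """Compute the word frequency of each text word in one text information."""
    counts = Counter(text_words)
    if mode == "count":   # raw number of occurrences
        return dict(counts)
    if mode == "share":   # occurrences / occurrences of all text words
        total = sum(counts.values())
        return {word: n / total for word, n in counts.items()}
    if mode == "max":     # occurrences / occurrences of the most frequent word
        top = max(counts.values())
        return {word: n / top for word, n in counts.items()}
    raise ValueError(f"unknown mode: {mode}")


words = ["glad", "happy", "synonym", "very", "glad"]
raw = word_frequencies(words)
share = word_frequencies(words, mode="share")
relative = word_frequencies(words, mode="max")
```

The raw counts are exactly the coefficients of the linear expression relationship between the text words and the text information.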
S206: The electronic device obtains the second synonym collections containing each text word. The aforementioned first synonym collection may include the second synonym collections.
In one implementation, the electronic device may store the second synonym collections corresponding to each piece of text information; the union of all second synonym collections corresponding to a piece of text information contains all text words in that text information. For example, if text information 1 (containing m text words) corresponds to three second synonym collections (set 1, set 2 and set 3), then the union of set 1, set 2 and set 3 contains the m text words.
In one implementation, after obtaining the second synonym collections containing each text word, the electronic device may also execute steps s2061-s2064:
S2061: For each text word in the text information, determine the first vector of the text word.
The first vector can be used to uniquely identify the text word. In one implementation, the first vector may be a word vector, which converts a word in natural language into a dense vector that a computer can process. In one implementation, the first vector may be a one-hot vector. For example, assuming that the number of distinct text words is N, the text words can be put in one-to-one correspondence with the consecutive integers from 0 to N-1. If the integer corresponding to a text word is i, the one-hot vector of the text word is obtained by creating an all-zero vector of length N and setting its i-th element to 1. For example, when N is 3, the word vector of a text word may be [1, 0, 0]. In one implementation, the first vector may also be obtained through a word2vec model or another model; the embodiment of the present invention does not limit this. For example, the word vector of a text word may be [0.5, 0.3, 0.2].
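The one-hot construction described above can be sketched in a few lines of Python. The helper names `build_index` and `one_hot` are hypothetical, and assigning integers in order of first appearance is an assumption; the source only requires a one-to-one mapping onto 0 to N-1.

```python
def build_index(text_words):
    """Map each distinct text word to an integer from 0 to N-1."""
    index = {}
    for word in text_words:
        if word not in index:
            index[word] = len(index)
    return index


def one_hot(word, index):
    """Create an all-zero vector of length N and set element i to 1."""
    vector = [0] * len(index)
    vector[index[word]] = 1
    return vector


index = build_index(["glad", "happy", "synonym"])
vector = one_hot("glad", index)
```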
S2062: According to the first vector of the text word, obtain the second vector of a third synonym collection containing the text word. The aforementioned second synonym collections include the third synonym collection.
In one implementation, a text word may correspond to one or more third synonym collections. If the text word corresponds to multiple third synonym collections, the text word is present in each of them.
The second vector of a third synonym collection can be used to uniquely identify that third synonym collection. In one implementation, the second vector of a third synonym collection may be the arithmetic mean of the first vectors of all text words in the third synonym collection. For example, if the third synonym collection is {"glad", "happy"}, the first vector of the text word "glad" is [0.5, 0.3, 0.2] and the first vector of the text word "happy" is [0.4, 0.1, 0.2], then the second vector of the third synonym collection is ([0.5, 0.3, 0.2] + [0.4, 0.1, 0.2]) / 2 = [0.45, 0.2, 0.2].
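The arithmetic mean in this example can be checked with a minimal Python sketch. The function name `mean_vector` is hypothetical; plain lists are used so that no external library is assumed.

```python
def mean_vector(first_vectors):
    """Element-wise arithmetic mean of equal-length vectors."""
    count = len(first_vectors)
    # zip(*...) groups the k-th components of all vectors together.
    return [sum(components) / count for components in zip(*first_vectors)]


second_vector = mean_vector([[0.5, 0.3, 0.2], [0.4, 0.1, 0.2]])
```

Because floating-point addition is inexact, the result should be compared with a tolerance rather than with `==`.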
S2063: According to the first vector of the text word and the second vector of the third synonym collection, obtain the cosine similarity between the text word and the third synonym collection.
In one implementation, the electronic device may determine the dot product of the first vector of the text word and the second vector of the third synonym collection as the cosine similarity between the text word and the third synonym collection.
S2064: According to the cosine similarity, obtain the second coefficient of the third synonym collection for the text word.
In one implementation, the electronic device may determine the sum of the cosine similarities between the text word and all third synonym collections corresponding to the text word, and divide the cosine similarity between the text word and the third synonym collection by that sum to obtain the second coefficient of that third synonym collection for the text word. Note that when the text word corresponds to multiple third synonym collections, the second coefficients of all those third synonym collections for the text word sum to 1.
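Steps S2063 and S2064 can be sketched together in Python. The implementation described above takes the dot product as the similarity, which equals the cosine similarity only when both vectors have unit length, so the sketch also provides a normalized `cosine` function as an alternative. All function names and the sample vectors are hypothetical, and the sketch assumes the similarities sum to a nonzero value.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Dot product divided by the product of the vector lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def second_coefficients(first_vector, second_vectors, similarity=dot):
    """Normalize the similarities of one text word so that they sum to 1."""
    sims = {name: similarity(first_vector, vec)
            for name, vec in second_vectors.items()}
    total = sum(sims.values())  # assumed to be nonzero
    return {name: s / total for name, s in sims.items()}


word_vector = [0.5, 0.3, 0.2]
collections = {"set1": [0.45, 0.2, 0.2], "set2": [0.4, 0.3, 0.3]}
coeffs = second_coefficients(word_vector, collections)
coeffs_cos = second_coefficients(word_vector, collections, similarity=cosine)
```

Either similarity gives coefficients that sum to 1; only the split between the collections differs.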
In one implementation, the linear expression relationship between the third synonym collections and a text word can be established from the second coefficients and the third synonym collections. For example, suppose the text word "glad" corresponds to two third synonym collections, third synonym collection 1 being {"glad", "happy"} and third synonym collection 2 being {"glad", "joyful"}, with second coefficients for the text word of 0.4 and 0.6 respectively. The linear expression relationship between the third synonym collections and the text word may then be: text word "glad" = 0.4 * {"glad", "happy"} + 0.6 * {"glad", "joyful"}.
S207: For each second synonym collection, the electronic device obtains the first coefficient of the second synonym collection according to the second coefficient of that second synonym collection for each text word and the word frequency of the text word.
Step S205 notes that a linear expression relationship between the text words and the text information can be established from the text words and their word frequencies, and step s2064 notes that a linear expression relationship between the third synonym collections and a text word can be established from the second coefficients and the third synonym collections. It follows that, based on steps S205 and s2064, a linear expression relationship between the third synonym collections and the text information can be established, and in turn a linear expression relationship between the second synonym collections and the text information.
For example, suppose the text information is "glad and happy are synonyms", the third synonym collection of the text word "glad" is {"glad", "happy"}, the third synonym collection of the text word "happy" is {"glad", "happy"}, and the third synonym collection of the text word "synonym" is {"synonym"}. The text word "glad" can then be represented by the third synonym collection {"glad", "happy"}, the text word "happy" by the third synonym collection {"glad", "happy"}, and the text word "synonym" by the third synonym collection {"synonym"}. Because the text information can be represented by the text words "glad", "happy" and "synonym", it can also be represented by the third synonym collections {"glad", "happy"} and {"synonym"}; these are therefore determined as the second synonym collections corresponding to the text information.
In one implementation, suppose the text words in text information 1 are text word 1 and text word 2. The third synonym collections corresponding to text word 1 are set a (second coefficient 0.4 for text word 1) and set b (second coefficient 0.6 for text word 1), the third synonym collection corresponding to text word 2 is set b (second coefficient 1 for text word 2), the word frequency of text word 1 is 2, and the word frequency of text word 2 is 1. The linear expression relationship between the text words and text information 1 is then: text information 1 = 2 * text word 1 + text word 2. Substituting the linear expression relationship between the third synonym collections and each text word into this formula gives the linear expression relationship between the third synonym collections and text information 1: text information 1 = 2 * (0.4 * set a + 0.6 * set b) + set b = 0.8 * set a + 2.2 * set b. Here set a is a third synonym collection corresponding to text word 1 and, at the same time, a second synonym collection for text information 1; the same holds for set b. Therefore, the first coefficient of the second synonym collection (set a) for text information 1 is 0.8, and the first coefficient of the second synonym collection (set b) for text information 1 is 2.2.
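The substitution in this example amounts to a weighted sum, which the following Python sketch reproduces. The function name `first_coefficients` and the nested-dictionary layout are assumptions for illustration, not a format given in the source.

```python
from collections import defaultdict


def first_coefficients(word_frequency, second_coefficient):
    """Combine word frequencies and second coefficients per collection.

    word_frequency:     {text word: word frequency}
    second_coefficient: {text word: {collection name: second coefficient}}
    """
    result = defaultdict(float)
    for word, frequency in word_frequency.items():
        for name, coefficient in second_coefficient[word].items():
            # Each text word contributes frequency * second coefficient.
            result[name] += frequency * coefficient
    return dict(result)


coeffs = first_coefficients(
    {"text word 1": 2, "text word 2": 1},
    {"text word 1": {"a": 0.4, "b": 0.6}, "text word 2": {"b": 1.0}},
)
```

For the inputs above, set a receives 2 * 0.4 and set b receives 2 * 0.6 + 1 * 1, matching the coefficients 0.8 and 2.2 in the text up to floating-point rounding.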
S208: The electronic device obtains the word frequency-inverse document frequency (TF-IDF) of the second synonym collection according to the first coefficient of the text information.
In one implementation, when obtaining the TF-IDF of the second synonym collection according to the first coefficient of the text information, the electronic device specifically executes steps s2081-s2085:
S2081: Sum all first coefficients of the text information to obtain a first value.
Specifically, after obtaining the first coefficients of all second synonym collections corresponding to the text information, the electronic device may sum all the first coefficients to obtain the first value. For example, if the linear expression relationship between the second synonym collections (set a and set b) and text information 1 is: text information 1 = 0.8 * set a + 2.2 * set b, then the first value is 0.8 + 2.2 = 3.
S2082: Divide the first coefficient corresponding to the second synonym collection by the first value to obtain a second value. Specifically, the number of second values corresponding to a piece of text information equals the number of second synonym collections corresponding to that text information, and the electronic device may compute the second values in parallel. For example, if the linear expression relationship between the second synonym collections (set a and set b) and text information 1 is: text information 1 = 0.8 * set a + 2.2 * set b, then one second value corresponding to text information 1 is 0.8/3 and the other is 2.2/3.
S2083: Sum the first coefficients of the second synonym collection for each piece of text information to obtain a third value.
In one implementation, the same second synonym collection may correspond to multiple pieces of text information, each of which can be linearly represented by the second synonym collection. The second synonym collection has one first coefficient for each such piece of text information, and the electronic device may sum these first coefficients to obtain the third value. For example, suppose the second synonym collection (set a) corresponds to text information 1 and text information 2, the linear expression relationship between text information 1 and its second synonym collections (set a and set b) is: text information 1 = 0.8 * set a + 2.2 * set b, and the linear expression relationship between text information 2 and its second synonym collections (set a and set c) is: text information 2 = 0.6 * set a + 1.3 * set c. Then the first coefficient of the second synonym collection (set a) for text information 1 is 0.8 and its first coefficient for text information 2 is 0.6, so the third value is 0.8 + 0.6 = 1.4.
S2084: Take the logarithm of the quotient obtained by dividing the number of all pieces of text information included in the information processing request by the third value, to obtain a fourth value.
In one implementation, the aforementioned information processing request may also include the number of all pieces of text information. For example, if the number of all pieces of text information is 2 and the third value is 1.4, then the fourth value is lg(2/1.4).
S2085: Multiply the aforementioned second value by the fourth value to obtain the TF-IDF of the second synonym collection.
In one implementation, each piece of text information may correspond to one or more second synonym collections. When the text information corresponds to multiple second synonym collections, the electronic device may compute the TF-IDF of each second synonym collection in parallel. For example, if the second synonym collections corresponding to text information 1 are set a and set b, the second value and fourth value corresponding to set a are 0.8/3 and lg(2/1.4), and the second value and fourth value corresponding to set b are 2.2/3 and lg(2/1.3), then the TF-IDF of the second synonym collection (set a) is (0.8/3) * lg(2/1.4) and the TF-IDF of the second synonym collection (set b) is (2.2/3) * lg(2/1.3).
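Steps s2081 to s2085 can be followed end to end in a compact Python sketch. The function name `tf_idf` and the nested-dictionary input are hypothetical, and reading lg as the base-10 logarithm is an assumption; the source does not state the base.

```python
import math
from collections import defaultdict


def tf_idf(first_coefficients):
    """first_coefficients: {text id: {collection name: first coefficient}}"""
    text_count = len(first_coefficients)

    # s2083: third value = sum of a collection's first coefficients over texts.
    third_values = defaultdict(float)
    for coefficients in first_coefficients.values():
        for name, coefficient in coefficients.items():
            third_values[name] += coefficient

    scores = {}
    for text_id, coefficients in first_coefficients.items():
        # s2081: first value = sum of all first coefficients of this text.
        first_value = sum(coefficients.values())
        scores[text_id] = {}
        for name, coefficient in coefficients.items():
            second_value = coefficient / first_value                    # s2082
            fourth_value = math.log10(text_count / third_values[name])  # s2084
            scores[text_id][name] = second_value * fourth_value         # s2085
    return scores


scores = tf_idf({
    "text1": {"a": 0.8, "b": 2.2},
    "text2": {"a": 0.6, "c": 1.3},
})
```

Here `scores["text1"]["a"]` corresponds to (0.8/3) * lg(2/1.4) from the example. The inner loop is independent per collection, which is what allows the parallel computation mentioned above.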
S209: The electronic device determines, from the second synonym collections, the target quantity of target synonym collections with the larger TF-IDF values.
In one implementation, the aforementioned information processing request may also include the target quantity of target synonym collections. If the second synonym collections correspond to text information 1, the target synonym collections are the second synonym collections with the highest degree of association with text information 1.
In one implementation, the electronic device may first determine, among the multiple second synonym collections corresponding to the text information, the one with the largest TF-IDF as a target synonym collection, then determine the one with the largest TF-IDF among the remaining second synonym collections as a target synonym collection, and so on until the target quantity of target synonym collections is obtained.
In one implementation, the electronic device may select the target quantity of second synonym collections from the multiple second synonym collections, in descending order of TF-IDF, as the target synonym collections. For example, if the target quantity is 1, the second synonym collections corresponding to text information 1 are set a and set b, the TF-IDF of set a is 0.4 and the TF-IDF of set b is 0.8, then the target synonym collection of text information 1 is set b. Note that the values in the above examples are for illustration only and do not limit the embodiment of the present invention.
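The selection in S209 is a top-k choice, sketched below in Python. The function name `select_target_collections` is hypothetical; sorting once in descending order gives the same result as repeatedly picking the largest remaining value.

```python
def select_target_collections(tfidf_by_collection, target_quantity):
    """Return the names of the collections with the largest TF-IDF values."""
    ranked = sorted(tfidf_by_collection.items(),
                    key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:target_quantity]]


targets = select_target_collections({"a": 0.4, "b": 0.8}, target_quantity=1)
```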
In one implementation, the electronic device may also receive an information retrieval request from a terminal device, the information retrieval request including a search term. The electronic device may obtain the synonym collection of the search term, obtain each second synonym collection containing the same words as that synonym collection, obtain the TF-IDF of each such second synonym collection in its corresponding text information, and then send the text information corresponding to the second synonym collection with the largest TF-IDF to the terminal device. Alternatively, the electronic device may send the text information corresponding to a preset quantity of second synonym collections with the larger TF-IDF values to the terminal device. The preset quantity may be set by default on the electronic device or sent to the electronic device by the terminal device; the embodiment of the present invention does not limit this.
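The retrieval flow can be outlined with a small Python sketch. Everything here is an assumption for illustration: the function name `retrieve`, the index layout mapping each text to the TF-IDF of its second synonym collections, and the sample scores are not taken from the source.

```python
def retrieve(query_synonyms, tfidf_index, preset_quantity=1):
    """Return the ids of the texts whose matching collection scores highest.

    tfidf_index: {text id: {frozenset of words: TF-IDF value}}
    """
    target = frozenset(query_synonyms)
    matches = [
        (score, text_id)
        for text_id, collections in tfidf_index.items()
        for words, score in collections.items()
        if words == target  # the collection contains exactly the same words
    ]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [text_id for _, text_id in matches[:preset_quantity]]


tfidf_index = {
    "text1": {frozenset({"glad", "happy"}): 0.2},
    "text2": {frozenset({"glad", "happy"}): 0.5, frozenset({"synonym"}): 0.9},
}
best = retrieve({"glad", "happy"}, tfidf_index)
```

Passing a larger `preset_quantity` corresponds to the alternative of returning several texts instead of only the best one.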
By implementing the embodiment of the present invention, the second synonym collections and first coefficients corresponding to each piece of text information can be obtained, and based on the first coefficients, the TF-IDF values of the second synonym collections corresponding to the text information can be obtained. Because these TF-IDF values take into account the synonym relationships among the text words in the text information, using them can improve the accuracy of text classification or information retrieval.
Referring to Fig. 3, Fig. 3 is a schematic structural diagram of an information processing apparatus provided by an embodiment of the present invention. Specifically, as shown in Fig. 3, the information processing apparatus 30 may include:
A receiving unit 301, configured to receive an information processing request, where the information processing request includes multiple pieces of text information and each piece of text information includes at least one text word.
A processing unit 302, configured to obtain a first synonym collection for the text words according to the text words included in the multiple pieces of text information, where the first synonym collection includes a text word and at least one synonym of the text word.
The processing unit 302 is further configured to determine, for each piece of text information, the first coefficient of the text information, where the first coefficient corresponds to a second synonym collection containing a text word in the text information, the first synonym collection includes the second synonym collection, and the first coefficient is used to establish a linear expression relationship between the second synonym collection and the text information.
The processing unit 302 is further configured to obtain the word frequency-inverse document frequency (TF-IDF) of the second synonym collection according to the first coefficient of the text information.
In one implementation, the information processing request may also include the target quantity of target synonym collections, and the processing unit 302 may be further configured to determine, from the second synonym collections, the target quantity of target synonym collections with the larger TF-IDF values.
In one implementation, the processing unit 302 is specifically configured to: obtain the word frequency of each text word in the text information, where the word frequency of a text word is used to establish a linear expression relationship between the text word and the text information; obtain the second synonym collections containing each text word; and, for each second synonym collection, obtain the first coefficient of the second synonym collection according to the second coefficient of the second synonym collection for each text word and the word frequency of the text word.
In one implementation, the processing unit 302 may be further configured to: for each text word in the text information, determine the first vector of the text word; according to the first vector of the text word, obtain the second vector of a third synonym collection containing the text word, where the second synonym collections include the third synonym collection; according to the first vector of the text word and the second vector of the third synonym collection, obtain the cosine similarity between the text word and the third synonym collection; and, according to the cosine similarity, obtain the second coefficient of the third synonym collection for the text word.
In one implementation, the aforementioned information processing request may also include the number of all pieces of text information, and the processing unit 302 is specifically configured to: sum all first coefficients of the text information to obtain a first value; divide the first coefficient corresponding to the second synonym collection by the first value to obtain a second value; sum the first coefficients of the second synonym collection for each piece of text information to obtain a third value; take the logarithm of the quotient obtained by dividing the number of all pieces of text information included in the information processing request by the third value, to obtain a fourth value; and multiply the second value by the fourth value to obtain the TF-IDF of the second synonym collection.
In one implementation, the processing unit 302 is specifically configured to: perform word segmentation on the multiple pieces of text information to obtain a text word set, where the text word set includes at least one text word; look up the synonyms of each text word in a preset synonym database to obtain the fourth synonym collection for each text word, where the fourth synonym collection includes the text word and the synonyms found for the text word; and obtain the first synonym collection according to the fourth synonym collection.
In one implementation, the processing unit 302 is specifically configured to: determine a target fourth synonym collection whose text word and all of that word's synonyms are present in another fourth synonym collection, where the other fourth synonym collections are the fourth synonym collections for the text words other than the target fourth synonym collection; and determine the other fourth synonym collections as first synonym collections.
The embodiment of the present invention is based on the same concept as the method embodiments shown in Fig. 1 and Fig. 2 and brings the same technical effects. For the specific principles, refer to the description of the embodiments shown in Fig. 1 and Fig. 2, which is not repeated here.
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. The electronic device 40 may include a receiver 401, a memory 402 and a processor 403, which are connected through one or more communication buses.
The receiver 401 can be used to receive data; for example, the receiver 401 can be used to receive an information processing request.
The memory 402 may include read-only memory and random access memory, and provides instructions and data to the processor 403. A part of the memory 402 may also include non-volatile random access memory.
The processor 403 may be a central processing unit (Central Processing Unit, CPU). The processor 403 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor; optionally, the processor 403 may also be any conventional processor. Wherein:
The memory 402 is configured to store program instructions.
The processor 403 is configured to call the program instructions stored in the memory 402 in order to:
receive an information processing request, where the information processing request includes multiple pieces of text information and each piece of text information includes at least one text word;
obtain a first synonym collection for the text words according to the text words included in the multiple pieces of text information, where the first synonym collection includes a text word and at least one synonym of the text word;
for each piece of text information, determine the first coefficient of the text information, where the first coefficient corresponds to a second synonym collection containing a text word in the text information, the first synonym collection includes the second synonym collection, and the first coefficient is used to establish a linear expression relationship between the second synonym collection and the text information;
obtain the word frequency-inverse document frequency (TF-IDF) of the second synonym collection according to the first coefficient of the text information.
In one implementation, the information processing request may also include the target quantity of target synonym collections, and the processor 403 may be further configured to determine, from the second synonym collections, the target quantity of target synonym collections with the larger TF-IDF values.
In one implementation, when determining the first coefficient of the text information, the processor 403 is specifically configured to: obtain the word frequency of each text word in the text information, where the word frequency of a text word is used to establish a linear expression relationship between the text word and the text information; obtain the second synonym collections containing each text word; and, for each second synonym collection, obtain the first coefficient of the second synonym collection according to the second coefficient of the second synonym collection for each text word and the word frequency of the text word.
In one implementation, the processor 403 may be further configured to: for each text word in the text information, determine the first vector of the text word; according to the first vector of the text word, obtain the second vector of a third synonym collection containing the text word, where the second synonym collections include the third synonym collection; according to the first vector of the text word and the second vector of the third synonym collection, obtain the cosine similarity between the text word and the third synonym collection; and, according to the cosine similarity, obtain the second coefficient of the third synonym collection for the text word.
In one implementation, the aforementioned information processing request may also include the number of all pieces of text information. When obtaining the TF-IDF of the second synonym collection according to the first coefficient of the text information, the processor 403 is specifically configured to: sum all first coefficients of the text information to obtain a first value; divide the first coefficient corresponding to the second synonym collection by the first value to obtain a second value; sum the first coefficients of the second synonym collection for each piece of text information to obtain a third value; take the logarithm of the quotient obtained by dividing the number of all pieces of text information included in the information processing request by the third value, to obtain a fourth value; and multiply the second value by the fourth value to obtain the TF-IDF of the second synonym collection.
In one implementation, when obtaining the first synonym collection for the text words according to the text words included in the multiple pieces of text information, the processor 403 is specifically configured to: perform word segmentation on the multiple pieces of text information to obtain a text word set, where the text word set includes at least one text word; look up the synonyms of each text word in a preset synonym database to obtain the fourth synonym collection for each text word, where the fourth synonym collection includes the text word and the synonyms found for the text word; and obtain the first synonym collection according to the fourth synonym collection.
In one implementation, when obtaining the first synonym collection according to the fourth synonym collection, the processor 403 is specifically configured to: determine a target fourth synonym collection whose text word and all of that word's synonyms are present in another fourth synonym collection, where the other fourth synonym collections are the fourth synonym collections for the text words other than the target fourth synonym collection; and determine the other fourth synonym collections as first synonym collections.
It should be noted that, for content not mentioned in the embodiment corresponding to Fig. 4 and for the specific implementation of each step, refer to the embodiments shown in Fig. 1 to Fig. 3 and the foregoing description, which are not repeated here.
The embodiment of the present invention also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the steps performed in the method embodiments shown in Fig. 1 and Fig. 2.
What is disclosed above is only some embodiments of the present invention and certainly cannot be used to limit the scope of rights of the present invention. Those of ordinary skill in the art can understand all or part of the processes for realizing the above embodiments, and equivalent changes made according to the claims of the present invention still fall within the scope covered by the present invention.
Claims (10)
1. An information processing method, characterized by comprising:
receiving an information processing request, the information processing request including multiple pieces of text information, each piece of text information including at least one text word;
obtaining, according to the text words included in the multiple pieces of text information, a first synonym collection for the text words, the first synonym collection including a text word and at least one synonym of that text word;
for each piece of text information, determining a first coefficient of the text information, the first coefficient corresponding to a second synonym collection that contains a text word in the text information, the first synonym collection including the second synonym collection, and the first coefficient being used to establish a linear representation relationship between the second synonym collection and the text information;
obtaining the term frequency-inverse document frequency (TF-IDF) of the second synonym collection according to the first coefficient of the text information.
2. The method according to claim 1, characterized in that the information processing request further includes a target number of target synonym collections, and after the term frequency-inverse document frequency of the second synonym collection is obtained according to the first coefficient of the text information, the method further comprises:
determining, from the second synonym collections, target synonym collections that satisfy the target number and have the larger term frequency-inverse document frequency.
3. The method according to claim 1, characterized in that determining the first coefficient of the text information comprises:
obtaining the word frequency of each text word in the text information, the word frequency of a text word being used to establish a linear representation relationship between the text word and the text information;
obtaining the second synonym collections that contain each text word;
for each second synonym collection, obtaining the first coefficient of the second synonym collection according to the second coefficient of the second synonym collection with respect to each text word and the word frequency of the text word.
4. The method according to claim 3, characterized in that, before the first coefficient of the second synonym collection is obtained according to the second coefficient of the second synonym collection with respect to each text word and the word frequency of the text word, the method further comprises:
for each text word in the text information, determining a first vector of the text word;
obtaining, according to the first vector of the text word, a second vector of a third synonym collection that contains the text word, the second synonym collection including the third synonym collection;
obtaining the cosine similarity between the text word and the third synonym collection according to the first vector of the text word and the second vector of the third synonym collection;
obtaining, according to the cosine similarity, the second coefficient of the third synonym collection with respect to the text word.
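As an illustration only, the steps of claims 3 and 4 might be sketched as follows. Several choices here are assumptions that the claims do not fix: the vector of a synonym collection is taken as the mean of its members' word vectors, the second coefficient is the cosine similarity itself, the word frequency is a raw count, and the first coefficient is the sum of second coefficient times word frequency over the words of the text that belong to the collection. All function and parameter names are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def collection_vector(collection, word_vectors):
    """Assumption: a collection's vector is the mean of its members' vectors."""
    members = [word_vectors[w] for w in collection if w in word_vectors]
    dim = len(members[0])
    return [sum(vec[i] for vec in members) / len(members) for i in range(dim)]

def first_coefficients(words, collections, word_vectors):
    """Map each collection name to its first coefficient for one text.

    words: the segmented text words of one piece of text information.
    collections: dict of collection name -> set of member words.
    """
    word_freq = Counter(words)  # assumption: raw counts as word frequency
    coeffs = {}
    for name, collection in collections.items():
        present = [w for w in word_freq if w in collection]
        if not present:
            continue  # the collection contains no word of this text
        cvec = collection_vector(collection, word_vectors)
        # Second coefficient (cosine similarity) weighted by word frequency.
        coeffs[name] = sum(
            cosine(word_vectors[w], cvec) * word_freq[w] for w in present
        )
    return coeffs
```

With this weighting, a collection whose members all point in the same direction as the text's words simply accumulates their counts, while loosely related members contribute less.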
5. The method according to any one of claims 1 to 4, characterized in that the information processing request further includes the number of all pieces of text information, and obtaining the term frequency-inverse document frequency of the second synonym collection according to the first coefficient of the text information comprises:
summing all first coefficients of the text information to obtain a first value;
dividing the first coefficient corresponding to the second synonym collection by the first value to obtain a second value;
summing the first coefficients of the second synonym collection with respect to each piece of text information to obtain a third value;
taking the logarithm of the result of dividing the number of all pieces of text information included in the information processing request by the third value, to obtain a fourth value;
multiplying the second value by the fourth value to obtain the term frequency-inverse document frequency of the second synonym collection.
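The five arithmetic steps above can be sketched as follows, under stated assumptions: the first coefficients of each piece of text information are held in a dictionary keyed by collection name, the logarithm is the natural logarithm (the claim does not name a base), and a score of 0.0 is returned when a denominator would be zero. The helper `top_collections`, which picks a target number of highest-scoring collections for one text, and all other names are hypothetical.

```python
import heapq
import math

def collection_tf_idf(doc_coeffs, doc_index, name):
    """TF-IDF of one synonym collection for one piece of text information.

    doc_coeffs: one dict per text, mapping collection name -> first coefficient.
    """
    coeffs = doc_coeffs[doc_index]
    first_value = sum(coeffs.values())                        # step 1
    third_value = sum(c.get(name, 0.0) for c in doc_coeffs)   # step 3
    if first_value == 0 or third_value == 0:
        return 0.0  # assumption: undefined ratios score zero
    second_value = coeffs.get(name, 0.0) / first_value        # step 2
    fourth_value = math.log(len(doc_coeffs) / third_value)    # step 4
    return second_value * fourth_value                        # step 5

def top_collections(doc_coeffs, doc_index, target_number):
    """Names of the target_number collections with the largest TF-IDF."""
    scores = {
        name: collection_tf_idf(doc_coeffs, doc_index, name)
        for name in doc_coeffs[doc_index]
    }
    return heapq.nlargest(target_number, scores, key=scores.get)
```

Note that the third value is a sum of coefficients rather than a count of texts, so it can exceed the number of texts, in which case the logarithm (and the score) becomes negative.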
6. The method according to any one of claims 1 to 4, characterized in that obtaining the first synonym collection for the text words according to the text words included in the multiple pieces of text information comprises:
performing word segmentation on the multiple pieces of text information to obtain a text word collection, the text word collection including at least one text word;
looking up the synonyms of each text word in a preset synonym database to obtain a fourth synonym collection for each text word, the fourth synonym collection including the text word and the synonyms found for that text word;
obtaining the first synonym collection according to the fourth synonym collections.
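A minimal sketch of the segmentation and lookup steps is given below. It assumes the preset synonym database can be modelled as an in-memory dictionary from a word to its synonyms, and it uses whitespace splitting purely as a stand-in for word segmentation; Chinese text would need a real segmenter, which can be passed in through the hypothetical `segment` parameter.

```python
def build_fourth_collections(texts, synonym_db, segment=str.split):
    """Return a dict mapping each text word to its fourth synonym collection.

    texts: the pieces of text information from the request.
    synonym_db: hypothetical mapping of word -> iterable of synonyms.
    segment: segmentation function; whitespace splitting is a placeholder.
    """
    # Text word collection: every distinct word, in order of first appearance.
    words = dict.fromkeys(w for text in texts for w in segment(text))
    # Each collection holds the word itself plus the synonyms found for it.
    return {w: {w, *synonym_db.get(w, ())} for w in words}
```

A word with no entry in the database still yields a collection containing only itself, so every text word is represented downstream.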
7. The method according to claim 6, characterized in that obtaining the first synonym collection according to the fourth synonym collections comprises:
determining a target fourth synonym collection, namely a fourth synonym collection whose text word and all synonyms of that text word are present in another fourth synonym collection, the other fourth synonym collections being the fourth synonym collections for each text word other than the target fourth synonym collection;
determining the other fourth synonym collections to be the first synonym collection.
8. An information processing apparatus, characterized in that the apparatus comprises units for performing the method according to any one of claims 1 to 7.
9. An electronic device, characterized by comprising a memory and a processor, the memory being configured to store a computer program, the computer program including program instructions, and the processor being configured to invoke the program instructions to perform the method according to any one of claims 1 to 7.
10. A computer storage medium, characterized in that the computer storage medium stores a computer program, the computer program including program instructions which, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810745000.2A CN109101485B (en) | 2018-07-09 | 2018-07-09 | Information processing method and device, electronic equipment and computer storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109101485A (en) | 2018-12-28 |
CN109101485B (en) | 2022-07-29 |
Family
ID=64845870
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810745000.2A Active CN109101485B (en) | 2018-07-09 | 2018-07-09 | Information processing method and device, electronic equipment and computer storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109101485B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112825078A (en) * | 2019-11-21 | 2021-05-21 | 北京沃东天骏信息技术有限公司 | Information processing method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104298715A (en) * | 2014-09-16 | 2015-01-21 | 北京航空航天大学 | TF-IDF based multiple-index result merging and sequencing method |
CN104778276A (en) * | 2015-04-29 | 2015-07-15 | 北京航空航天大学 | Multi-index combining and sequencing algorithm based on improved TF-IDF (term frequency-inverse document frequency) |
CN106502990A (en) * | 2016-10-27 | 2017-03-15 | 广东工业大学 | A kind of microblogging Attribute selection method and improvement TF IDF method for normalizing |
CN108132930A (en) * | 2017-12-27 | 2018-06-08 | 曙光信息产业(北京)有限公司 | Feature Words extracting method and device |
Non-Patent Citations (2)
Title |
---|
周由 等: "语义分析与TF-IDF方法相结合的新闻推荐技术", 《计算机科学》 * |
徐建民 等: "基于量化同义词关系的改进特征词提取方法", 《河北大学学报(自然科学版)》 * |
Also Published As
Publication number | Publication date |
---|---|
CN109101485B (en) | 2022-07-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111475729B (en) | Search content recommendation method and device | |
CN105069143B (en) | Extract the method and device of keyword in document | |
US9613025B2 (en) | Natural language question answering system and method, and paraphrase module | |
US7774197B1 (en) | Modular approach to building large language models | |
CN106407280A (en) | Query target matching method and device | |
CN110083681A (en) | Searching method, device and terminal based on data analysis | |
CN116186200B (en) | Model training method, device, electronic equipment and storage medium | |
CN110019832B (en) | Method and device for acquiring language model | |
CN112732898A (en) | Document abstract generation method and device, computer equipment and storage medium | |
CN113095065B (en) | Chinese character vector learning method and device | |
CN109101485A (en) | A kind of information processing method, device, electronic equipment and computer storage medium | |
CN113743090A (en) | Keyword extraction method and device | |
CN117633190A (en) | Question-answer pair generation method, computer device and storage medium | |
CN111401070B (en) | Word meaning similarity determining method and device, electronic equipment and storage medium | |
CN110059312A (en) | Short phrase picking method, apparatus and electronic equipment | |
Heinrich et al. | A transnational analysis of news and tweets about nuclear phase-out in the aftermath of the Fukushima incident | |
CN110413735B (en) | Question and answer retrieval method and system, computer equipment and readable storage medium | |
CN112926295A (en) | Model recommendation method and device | |
CN114115878A (en) | Workflow node recommendation method and device | |
CN110428814B (en) | Voice recognition method and device | |
CN109801710A (en) | Capacity determination method and device, terminal and computer readable storage medium | |
CN108763208A (en) | Topic information acquisition methods, device, server and computer readable storage medium | |
CN114626340B (en) | Behavior feature extraction method based on mobile phone signaling and related device | |
CN111914536B (en) | Viewpoint analysis method, viewpoint analysis device, viewpoint analysis equipment and storage medium | |
CN118377783B (en) | SQL sentence generation method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 2023-12-13. Address after: 430070 Hubei Province, Wuhan city Hongshan District Luoyu Road No. 546. Patentee after: HUBEI CENTRAL CHINA TECHNOLOGY DEVELOPMENT OF ELECTRIC POWER Co.,Ltd. Address before: 400000 Floor 9, building 11, Internet Industrial Park, No. 106, west section of Jinkai Avenue, Yubei District, Chongqing. Patentee before: CHONGQING XIEZHI TECHNOLOGY Co.,Ltd. |