CN108829669A - Word vector generation method and device supporting polarity distinction and polysemy - Google Patents
- Publication number
- CN108829669A (application CN201810557309.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Abstract
This application provides a word vector generation method and device supporting polarity distinction and polysemy. Based on the word vectors and the resource file established for the current business scenario, the method weights the value of each dimension of the target word's word vector to generate a new word vector. The operation weights are determined from one of three counts: the occurrences of all sememes in the resource file, the occurrences of the sememes under the sense that contains the most sememes, or the occurrences of the sememes under the most-associated sense. The word vectors of the target word and of its sememes are then combined in a weighted sum according to these weights, yielding the new word vector and determining the true sense. The method can generate new word vectors dynamically, and the new vectors reflect the actual semantic features more accurately. Because the operation weights are derived from semantic information, the influence of antonyms and polysemy on matching results is markedly reduced, which addresses the tendency of conventionally built word vectors to produce matching errors for polysemous words and antonyms.
Description
Technical field
This application relates to the field of machine learning, and in particular to a word vector generation method and device supporting polarity distinction and polysemy.
Background art
A word vector is a word representation that lets a computer process human language by expressing it numerically. A word vector represents a word as a vector of a certain dimension and reveals the relationships between that word and other words, for example [0.792, -0.177, -0.107, 0.109, -0.542, ……]. Word vectors are generally obtained by training with word vector models such as CBOW, Skip-gram, and GloVe, and the specific value of each dimension is determined by the collected corpus and the training method. Word vectors can be applied in intelligent question answering or text classification, where the meaning of a text is determined by matching the text against word vectors.
In practical text processing, a single word can correspond to several senses, and so that a computer can identify them, each sense is expressed by multiple sememes. A sememe is the most basic semantic unit, one that cannot easily be divided further. For example, the word "apple" has at least two senses, Apple Inc. and the fruit. The Apple Inc. sense corresponds to several sememes, such as "specific brand" and "computer", while the sememes of the fruit sense are "tree" and "fruit". In practice, collecting each word together with its senses and sememes yields a resource file that can be called directly, such as the resource files provided by OEC, the Chinese thesaurus, HowNet, and the like. In the prior art, a word vector obtained by training captures the semantics of a word according to how the target word appears in the collected training corpus. When the target word is polysemous, the word vector cannot fully reflect its multiple senses, so intelligent question answering or text classification based on such vectors cannot accurately match the true sense of the target word in its context of use.
In addition, text processing usually determines the semantic relatedness of two words from the distance between their word vectors, for example the Euclidean distance or the cosine distance, and the two words that are closest are generally taken to be semantically similar. However, when matching with word vectors obtained as described above, the words that lie close together include antonyms, whose meanings are in fact opposite. For example, "raise" and "lower" are both close to "credit card limit", so matching with such a word vector model can produce misjudgments, such as matching "raise the credit card limit" to "lower the credit card limit".
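The distance-based matching described above can be sketched as follows. This is only an illustration and not part of the application: the function names and the three-dimensional vectors are invented for the example, and real word vectors would come from a trained model.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """One minus the cosine similarity; 0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical vectors: the antonyms "raise" and "lower" occur in similar
# contexts, so a corpus-trained model may place both near "limit".
raise_vec = [0.80, 0.10, 0.30]
lower_vec = [0.78, 0.12, 0.28]
limit_vec = [0.75, 0.15, 0.35]

print(euclidean_distance(raise_vec, limit_vec))
print(euclidean_distance(lower_vec, limit_vec))
```

With vectors like these, both antonyms are about equally close to "limit", which is the failure mode the background describes.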
Summary of the invention
This application provides a word vector generation method and device supporting polarity distinction and polysemy, to solve the problem that word vectors built by conventional methods are prone to matching errors for polysemous words and antonyms.
In a first aspect, this application provides a word vector generation method supporting polarity distinction and polysemy, comprising:
obtaining the word vector model and the resource file for the current business scenario, the resource file including the sememes corresponding to multiple senses in the current business scenario;
determining the original word vector of a target word according to the word vector model, and extracting the semantic information corresponding to the target word from the resource file, the semantic information including the sememes under multiple senses and the number of times each sememe occurs;
determining operation weights according to the semantic information and a set target word calculation value;
performing, according to the operation weights, a weighted summation on the value of each dimension of the original word vector to generate a new word vector for the target word.
Optionally, determining the operation weights according to the semantic information and the set target word calculation value comprises:
counting, according to the semantic information, the sememes under all senses of the current target word and the number of times each sememe occurs;
determining the total value for the weight calculation as the sum of the total number of occurrences of all the sememes and the target word calculation value;
calculating the ratio of the number of occurrences of each sememe in the semantic information to the total value, to determine the operation weight of each sememe and the operation weight of the target word.
Optionally, determining the operation weights according to the semantic information and the set target word calculation value comprises:
counting, in the semantic information, all sememes under the sense that contains the most sememes and the number of times each sememe occurs;
determining the total value for the weight calculation as the sum of the total number of occurrences of all sememes under the sense that contains the most sememes and the target word calculation value;
calculating the ratio of the number of occurrences of each sememe to the total value, to determine the operation weight of each sememe and the operation weight of the target word.
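The option that uses only the sense with the most sememes can be sketched as below. This is a sketch under assumptions, not the applicant's implementation: the function name, the list-of-lists input, and the tie-breaking rule (the first largest sense wins) are all hypothetical, and the target word's weight is taken as the target word calculation value divided by the total value.

```python
from collections import Counter

def weights_from_largest_sense(senses, target_value=1):
    """Compute operation weights from the sense that contains the most sememes.

    senses: list of sememe lists, one list per sense of the target word.
    target_value: the set target word calculation value.
    """
    largest = max(senses, key=len)          # assumption: first largest sense on ties
    counts = Counter(largest)               # occurrences of each sememe in that sense
    total = sum(counts.values()) + target_value
    sememe_weights = {s: c / total for s, c in counts.items()}
    target_weight = target_value / total
    return sememe_weights, target_weight

# Hypothetical senses for illustration.
example_senses = [["fruit"], ["tree", "fruit", "reproduce"]]
sememe_weights, target_weight = weights_from_largest_sense(example_senses, target_value=1)
```

Here the three-sememe sense is selected, the total value is 3 + 1 = 4, and each sememe and the target word receive a weight of 1/4.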
Optionally, the target word calculation value is set according to how strongly the target word needs to be distinguished, and is equal to 1 or equal to the sum of the occurrences of all sememes in the semantic information.
Optionally, performing, according to the operation weights, the weighted summation on the value of each dimension of the original word vector to generate the new word vector for the target word comprises: extracting the word vectors corresponding to the sememes from the word vector model, and performing the weighted summation on the value of each dimension of the original word vector according to the operation weights and the following formula:
The value of the n-th dimension of the new word vector is $X_{0n} = X_{an} \times W_a + X_{bn} \times W_b + X_{cn} \times W_c + \cdots + X_n \times W$;
where $X_{an}$ is the value of the n-th dimension of the word vector of sememe a, and $W_a$ is the operation weight of sememe a; $X_{bn}$ is the value of the n-th dimension of the word vector of sememe b, and $W_b$ is the operation weight of sememe b; $X_n$ is the value of the n-th dimension of the word vector of the target word, and $W$ is the operation weight of the target word.
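Combining this formula with the ratio-based weights described earlier gives a compact form. The symbols $c_s$ (number of occurrences of sememe $s$), $T$ (target word calculation value), and $S$ (total value) are notation introduced here for illustration, and the target word's weight is assumed to be $T/S$:

```latex
\begin{align}
S &= \sum_{s} c_s + T, \qquad W_s = \frac{c_s}{S}, \qquad W = \frac{T}{S} \\
\sum_{s} W_s + W &= \frac{\sum_{s} c_s + T}{S} = 1 \\
X_{0n} &= \sum_{s} W_s X_{sn} + W X_n
\end{align}
```

Under these assumptions the weights are positive and sum to 1, so each dimension of the new word vector is a weighted average of the corresponding dimensions of the sememe vectors and the original target word vector.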
In a second aspect, this application provides a word vector generation method supporting polarity distinction and polysemy, comprising:
obtaining the word vector model and the resource file for the current business scenario, and obtaining a sentence text containing a target word, the resource file including the sememes corresponding to multiple senses in the current business scenario;
determining the original word vector of the target word according to the word vector model, and extracting the semantic information corresponding to the target word from the resource file, the semantic information including the sememes under multiple senses and the number of times each sememe occurs;
determining, in the sentence text, the neighboring word set of the target word, the neighboring word set being a set of multiple words adjacent to the target word in the sentence text;
determining, according to the neighboring word set and the semantic information, the most-associated sense of the target word in the current business scenario and the number of occurrences of each sememe under the most-associated sense;
determining the operation weight of each sememe and the operation weight of the target word according to the number of occurrences of each sememe under the most-associated sense and a set target word calculation value;
performing, according to the operation weights, a weighted summation on the value of each dimension of the original word vector to generate a new word vector for the target word.
Optionally, determining, according to the neighboring word set and the semantic information, the most-associated sense of the target word in the current business scenario comprises:
setting a window value and extracting the neighboring word set of the target word from the sentence text according to the window value, the neighboring word set including the preceding words located before the target word and the following words located after the target word;
calculating, according to the original word vectors, the word distance between each preceding word and each following word in the neighboring word set and each of the sememes;
determining the average distance under each sense according to the word distances;
comparing the average distances under the senses, and determining the sense corresponding to the minimum average distance as the most-associated sense of the target word.
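The window-based selection of the most-associated sense can be sketched as follows. The text does not fix a distance metric, so Euclidean distance is assumed here; the function names, the dictionary of vectors, and the two-dimensional values are hypothetical and serve only to illustrate the steps.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def neighbor_words(tokens, index, window):
    """Words within `window` positions before and after tokens[index]."""
    before = tokens[max(0, index - window):index]
    after = tokens[index + 1:index + 1 + window]
    return before + after

def most_associated_sense(neighbors, senses, vectors):
    """Index of the sense whose sememes are closest, on average, to the neighbors."""
    best_index, best_avg = None, float("inf")
    for i, sememes in enumerate(senses):
        dists = [euclidean(vectors[w], vectors[s])
                 for w in neighbors for s in sememes
                 if w in vectors and s in vectors]   # skip words without a vector
        if not dists:
            continue
        avg = sum(dists) / len(dists)
        if avg < best_avg:
            best_index, best_avg = i, avg
    return best_index

# Hypothetical data for illustration.
tokens = ["I", "buy", "apple", "computer"]
vectors = {
    "buy": [1.0, 0.0], "computer": [1.0, 0.1],
    "brand": [0.9, 0.0], "fruit": [0.0, 1.0], "tree": [0.0, 0.9],
}
senses = [["brand"], ["fruit", "tree"]]
neighbors = neighbor_words(tokens, index=2, window=1)
chosen = most_associated_sense(neighbors, senses, vectors)
```

With these toy vectors the neighbors "buy" and "computer" lie much closer to "brand" than to "fruit" or "tree", so the first sense is chosen.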
Optionally, determining the operation weights according to the number of occurrences of each sememe under the most-associated sense and the set target word calculation value comprises:
counting, according to the semantic information, the sememes under the most-associated sense and the number of times each sememe occurs;
determining the total value for the weight calculation as the sum of the total number of occurrences of all sememes under the most-associated sense and the target word calculation value;
calculating the ratio of the number of occurrences of each sememe to the total value, to determine the operation weights of each sememe and of the target word.
In a third aspect, this application provides a word vector generation device supporting polarity distinction and polysemy, comprising:
an acquiring unit, configured to obtain the word vector model and the resource file for the current business scenario, the resource file including the sememes corresponding to multiple senses in the current business scenario;
an original word vector determination unit, configured to determine the original word vector of a target word according to the word vector model, and to extract the semantic information corresponding to the target word from the resource file, the semantic information including the sememes under each sense and the number of times each sememe occurs;
an operation weight determination unit, configured to determine operation weights according to the semantic information and a set target word calculation value;
a new word vector generation unit, configured to perform, according to the operation weights, a weighted summation on the value of each dimension of the original word vector to generate a new word vector for the target word.
In a fourth aspect, this application provides a word vector generation device supporting polarity distinction and polysemy, comprising:
an information acquiring unit, configured to obtain the word vector model and the resource file for the current business scenario, and to obtain a sentence text containing a target word, the resource file including the sememes corresponding to multiple senses in the current business scenario;
an original word vector determination unit, configured to determine the original word vector of the target word according to the word vector model, and to extract the semantic information corresponding to the target word from the resource file, the semantic information including the sememes under each sense and the number of times each sememe occurs;
a neighboring word set determination unit, configured to determine the neighboring word set of the target word in the sentence text, the neighboring word set being a set of multiple words adjacent to the target word in the sentence text;
a most-associated sense determination unit, configured to determine, according to the neighboring word set and the semantic information, the most-associated sense of the target word in the current business scenario and the number of occurrences of each sememe under the most-associated sense;
an operation weight determination unit, configured to determine the operation weight of each sememe and the operation weight of the target word according to the number of occurrences of each sememe under the most-associated sense and a set target word calculation value;
a new word vector generation unit, configured to perform, according to the operation weights, a weighted summation on the value of each dimension of the original word vector to generate a new word vector for the target word.
As the above technical solutions show, this application provides a word vector generation method and device supporting polarity distinction and polysemy. In practical application, the method weights the value of each dimension of the target word's word vector, based on the word vectors and the resource file established for the current business scenario, to generate a new word vector. The operation weights are determined from, respectively, the occurrences of all sememes in the resource file, the occurrences of the sememes under the sense that contains the most sememes, or the occurrences of the sememes under the sense most associated with the sentence text containing the target word. The word vectors of the target word and of the sememes are then combined in a weighted sum according to these weights to obtain the new word vector.
The word vector generation method provided by this application can dynamically generate new word vectors from the already-built word vector model and resource file. The generated vectors reflect the semantic features of the actual business scenario more accurately, and because the operation weights are determined from semantic information during the weighting, the influence of antonyms and polysemy on matching results is markedly reduced. This solves the problem that conventionally built word vectors are prone to matching errors for polysemous words and antonyms.
Brief description of the drawings
To explain the technical solutions of this application more clearly, the drawings needed in the embodiments are briefly introduced below. Evidently, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a word vector generation method supporting polarity distinction and polysemy;
Fig. 2 is a schematic flowchart of determining the operation weights in Embodiment One of this application;
Fig. 3 is a schematic flowchart of determining the operation weights in Embodiment Two of this application;
Fig. 4 is a schematic structural diagram of a word vector generation device in an embodiment of this application;
Fig. 5 is a schematic flowchart of another word vector generation method supporting polarity distinction and polysemy;
Fig. 6 is a schematic flowchart of determining the most-associated sense in Embodiment Three of this application;
Fig. 7 is a schematic flowchart of determining the operation weights in Embodiment Three of this application;
Fig. 8 is a schematic structural diagram of another word vector generation device in an embodiment of this application.
Detailed description of the embodiments
Embodiments are described in detail below, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following embodiments do not represent all implementations consistent with this application. They are merely examples of systems and methods consistent with some aspects of this application as detailed in the claims.
In the technical solution of this application, the current business scenario refers to the business domain to which a specific business activity belongs, such as finance, wealth management, insurance, technology, or networking. The text commonly used differs greatly between business scenarios, and the senses of the words used in that text are correspondingly diverse. Therefore, for a computer to understand the text used in business activities under different business scenarios, and so carry out automated processing such as intelligent question answering and text classification, each word in the text must be digitized, that is, represented as a word vector.
In this application, the word corresponding to a word vector is called the target word. That is, the target word is a word present in the text of the current business scenario, such as "apple" in "I want to buy an Apple computer". The word vectors used in intelligent question answering or text classification are obtained by machine learning: a large amount of text collected in the business scenario serves as the training corpus, and training yields vectors that reflect the semantics of the target word and the relationships between the target word and other words. Word vectors are generally reduced in dimension, so their dimension is not too high. It depends on the specific business scenario and the training corpus collected, and common word vector dimensions are 50 or 100.
Note that a sense, as used in this application, is one of the several actual meanings that a word carries. For example, "apple" is generally considered to have two senses, "plant" and "Apple Inc.". Each sense is expressed by several words, and each word used to express a sense is a sememe. For example, the sense "plant" can be expressed by "fruit", "tree", and so on. Accordingly, in the technical solution of this application, the resource file specifically denotes a collection of words. It contains all the words of the business scenario, each arranged in a "word - sense - sememe" structure, forming one large text file (txt) that typically has the following structure:
"[word] 2 2 function-word structural-particle 4 orientation shoot apparatus component";
"[word] 2 2 function-word verbal-particle 1 finish";
"be 4 1 be 1 exist 3 show agreement indicate-word 1 specific";
……
"apple 3 5 carry pattern-value specific-brand computer able 3 tree fruit reproduce 1 fruit"
……
Taking the first row as an example, the word at the start of the row is the target word in this application, and the first number after the target word is the number of senses the target word has, so this word has 2 senses. Each subsequent number gives the number of sememes contained in one sense: the first sense contains 2 sememes, "function word" and "structural particle", and the second sense contains 4 sememes, "orientation", "shoot", "apparatus", and "component". This structure describes all senses of the target word and identifies the sememes of each sense.
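A row with this layout can be parsed as sketched below. The sketch assumes that tokens are separated by whitespace and that each sememe is a single token (multi-word sememes are hyphenated here); the function name and the placeholder word "w" are hypothetical.

```python
def parse_resource_row(row):
    """Parse '<word> <n_senses> <k1> s11 ... <k2> s21 ...' into (word, senses).

    senses is a list of sememe lists, one list per sense.
    """
    tokens = row.split()
    word, n_senses = tokens[0], int(tokens[1])
    senses, pos = [], 2
    for _ in range(n_senses):
        k = int(tokens[pos])                       # number of sememes in this sense
        senses.append(tokens[pos + 1:pos + 1 + k])
        pos += 1 + k
    if pos != len(tokens):
        raise ValueError("row has missing or trailing tokens")
    return word, senses

# "w" stands in for the target word at the start of the row.
word, senses = parse_resource_row(
    "w 2 2 function-word structural-particle 4 orientation shoot apparatus component"
)
```

For this row the parser returns two senses, the first with 2 sememes and the second with 4, matching the reading of the first row given above.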
Based on the resource file and word vectors described above, this application provides a word vector generation method supporting polarity distinction and polysemy, to reduce the effect that polysemy and antonymy of the target word have on word vector matching results. It specifically includes the following embodiments.
Embodiment one
Fig. 1 is a schematic flowchart of a word vector generation method supporting polarity distinction and polysemy. In this embodiment, the method of generating a word vector from the resource file includes the following steps:
S101: Obtain the word vector model and the resource file for the current business scenario, the resource file including the sememes corresponding to multiple senses in the current business scenario.
In this embodiment, once the business domain of the text to be processed has been determined, the word vector model for the current business scenario must first be obtained, generally by calling an already-built word vector model from a server or database. The word vector model here is a set made up of a large number of word vectors, that is, the set of word vectors obtained for each word from the training corpus and the relationships among the words that occur in the business text of the current business scenario.
Because a resource file covers the senses of all words, its data volume is very large. To make the data easier to call, in the technical solution of this application the resource file is a text composed of the sememes for the current business scenario. That is, only the resource file corresponding to the current business scenario is called. Words unrelated to the current business scenario need not be considered when generating word vectors, which reduces the amount of data processed and makes dynamic generation of word vectors practical.
Furthermore, because resource files are built in different ways, their forms also differ greatly. Some resource files are text files composed of sememe words and numbers, while others are database files composed of fields and values. Therefore, when calling the corresponding resource file, the content of the resource file can be filtered according to the current business scenario to determine the senses and sememes of the words that fit that scenario.
For example, for a resource file in text form, its content can be searched before it is called, using the words of the current business scenario, and the rows of the resource file that correspond to words usable in the current business scenario are extracted. The extracted data are then gathered into a new resource file that serves the current business scenario and is smaller overall than the full resource file. The newly built resource file is the one called when word vectors are subsequently generated.
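Building the smaller, scenario-specific resource file can be sketched as below. The sketch assumes a text-form resource file with one word per row and the word as the first whitespace-separated token; the function name, the sample rows, and the vocabulary are hypothetical.

```python
def filter_resource_rows(rows, scenario_vocabulary):
    """Keep only the rows whose leading word appears in the scenario vocabulary."""
    kept = []
    for row in rows:
        parts = row.split(maxsplit=1)      # first token is the word itself
        if parts and parts[0] in scenario_vocabulary:
            kept.append(row)
    return kept

# Hypothetical rows and vocabulary for a finance scenario.
all_rows = ["apple 1 1 fruit", "loan 1 1 money", "", "limit 1 1 amount"]
finance_rows = filter_resource_rows(all_rows, {"loan", "limit"})
```

The filtered rows can then be written to a new file and used in place of the full resource file.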
S102: Determine the original word vector of the target word according to the word vector model, and extract the semantic information corresponding to the target word from the resource file, the semantic information including the sememes under multiple senses and the number of times each sememe occurs.
In this embodiment, after the word vector model is obtained, the word vector of the target word is determined from it. Because this word vector has already been built, it is called the original word vector in this application. While determining the original word vector, this embodiment also extracts the semantic information of the target word from the resource file. To make the subsequent weight determination easier, the extracted semantic information includes the sememes and the number of times each sememe occurs.
For example, for the target word "arriving", the corresponding row of the resource file is:
"arriving 6 1 reaching 1 function-word 2 function-word amplitude 2 function-word going-to 1 arrival 1 carefulness";
From this row, extracting the semantic information gives the following sememes and numbers of occurrences: "function word" - 3 times, "amplitude" - 1 time, "reaching" - 1 time, "going to" - 1 time, "arrival" - 1 time, "carefulness" - 1 time.
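Counting the occurrences can be sketched as follows. The six senses listed in the code are one reading of the example row (6 senses with 1, 1, 2, 2, 1, and 1 sememes), and the variable names are hypothetical.

```python
from collections import Counter

# Senses of the example target word, one list of sememes per sense.
senses = [
    ["reaching"],
    ["function word"],
    ["function word", "amplitude"],
    ["function word", "going to"],
    ["arrival"],
    ["carefulness"],
]

# Count how many times each sememe occurs across all senses.
sememe_counts = Counter(s for sense in senses for s in sense)
total_occurrences = sum(sememe_counts.values())
```

Under this reading "function word" occurs 3 times, every other sememe occurs once, and the total number of occurrences is 8.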
After the word vector model has been determined and the semantic information extracted, the technical solution of this application determines the operation weights for generating the word vector from the word vector model and the semantic information, namely:
S103: Determine the operation weights according to the semantic information and the set target word calculation value.
Further, referring to Fig. 2, in this embodiment the operation weights can be determined from the semantic information and the set target word calculation value as follows:
S1031: According to the semantic information, count the sememes under all senses of the current target word and the number of times each sememe occurs;
S1032: Determine the total value for the weight calculation as the sum of the total number of occurrences of all the sememes and the target word calculation value;
S1033: Calculate the ratio of the number of occurrences of each sememe in the semantic information to the total value, to determine the operation weight of each sememe and the operation weight of the target word.
In this embodiment, determining the operation weights requires counting, within the extracted semantic information, the sememes under all senses and the number of times each sememe occurs, and determining the total number of occurrences of all sememes. The weight total value used in the calculation is then determined from the target word calculation value, and the operation weight of each sememe is determined from the ratio of that sememe's number of occurrences to the weight total value.
In the technical solution of this application, the target word calculation value is set according to how strongly the target word must be distinguished, and is equal to 1 or equal to the sum of the occurrences of all sememes in the semantic information. That is, if the constructed word vectors match many semantically similar words during actual matching, the generated word vector must be discriminative with respect to the target word. In that case the target word calculation value can be set equal to the sum of the occurrences of all sememes in the semantic information. If distinctiveness of the target word is not a priority during actual matching, the target word calculation value is taken as 1 in the calculation.
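The effect of the two settings on the target word's weight can be worked out directly. Here $N$ (the total number of sememe occurrences) and $T$ (the target word calculation value) are notation introduced for illustration, and the target word's weight is assumed to be $W = T / (N + T)$:

```latex
\begin{align}
T = 1: \quad & W = \frac{1}{N + 1} \\
T = N: \quad & W = \frac{N}{N + N} = \frac{1}{2}
\end{align}
```

For instance, with $N = 8$ the target word's weight is $1/9$ when $T = 1$ and $1/2$ when $T = N$. Setting $T = N$ therefore always keeps half of the weight on the original target word vector, while $T = 1$ lets the sememe vectors dominate as $N$ grows.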
For example, with the target word "arriving", after the resource file and the corresponding semantic information are obtained, the sememes under all senses of the target word "arriving" and the number of times each sememe occurs are counted from the semantic information, namely:
"function word" - 3 times, "amplitude" - 1 time, "reaching" - 1 time, "going to" - 1 time, "arrival" - 1 time, "carefulness" - 1 time;
Next, the total number of occurrences of all sememes under the target word is calculated, namely:
The total number of sememe occurrences is 3+1+1+1+1+1=8. If the target word calculation value is taken as the sum of the occurrences of all sememes, that is, 8, the weight total value is 8+8=16.
The weights are then calculated from the weight total value and the number of occurrences of each sememe, namely:
The weights of the words corresponding to the sememes are: "function word" weight $W_a = 3/16$, "amplitude" weight $W_b = 1/16$, "reaching" weight $W_c = 1/16$, ……;
The weight of the target word is calculated at the same time: $W = 8/16$.
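The weight calculation in this example can be reproduced with exact fractions. The function name and the dictionary layout are hypothetical, and the target word's weight is computed as the target word calculation value divided by the total value, as in the example.

```python
from fractions import Fraction

def operation_weights(sememe_counts, target_value):
    """Return (weight per sememe, weight of the target word) as exact fractions."""
    total = sum(sememe_counts.values()) + target_value
    sememe_weights = {s: Fraction(c, total) for s, c in sememe_counts.items()}
    return sememe_weights, Fraction(target_value, total)

counts = {
    "function word": 3, "amplitude": 1, "reaching": 1,
    "going to": 1, "arrival": 1, "carefulness": 1,
}
# Target word calculation value taken as the sum of all occurrences (8).
sememe_weights, target_weight = operation_weights(counts, target_value=8)
# sememe_weights["function word"] is 3/16; target_weight is 8/16, stored as 1/2.
```

All weights add up to 1, so the subsequent weighted summation is a weighted average of the vectors involved.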
S104: According to the operation weights, perform a weighted summation on the value of each dimension of the original word vector to generate the new word vector for the target word.
In this embodiment, once the operation weights are obtained, the new word vector of the target word can be determined from all the weight values and the word vector values of each word, namely:
The value of the first dimension of the new word vector: $X_{01} = X_{a1} \times W_a + X_{b1} \times W_b + X_{c1} \times W_c + \cdots + X_1 \times W$;
The value of the second dimension of the new word vector: $X_{02} = X_{a2} \times W_a + X_{b2} \times W_b + X_{c2} \times W_c + \cdots + X_2 \times W$;
……
The value of the n-th dimension of the new word vector: $X_{0n} = X_{an} \times W_a + X_{bn} \times W_b + X_{cn} \times W_c + \cdots + X_n \times W$;
Illustratively, suppose that in the current business scenario the term vector of the target word "arriving" is [0.563, 0.727, -0.165, 0.328, 0.265, ...];
The term vector of the word corresponding to the sememe "function word" is [0.423, 0.187, 0.598, 0.856, -0.796, ...];
The term vector of the word corresponding to the sememe "amplitude" is [0.598, 0.326, -0.224, 0.852, 0.367, ...];
Once the term vectors of all sememes have been determined in turn, the values of the new term vector in all dimensions can be calculated, namely:
The value of the first dimension of the new term vector: $X_{01} = 0.423 \times 3/16 + 0.598 \times 1/16 + \dots + 0.563 \times 8/16$;
The value of the second dimension of the new term vector: $X_{02} = 0.187 \times 3/16 + 0.326 \times 1/16 + \dots + 0.727 \times 8/16$;
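The dimension-by-dimension weighted summation can be sketched as below. The function name `weighted_sum` and the two-dimensional vectors in the usage lines are hypothetical toy values chosen so the result is easy to check by hand; they are not the vectors of the example above, whose remaining dimensions and sememe vectors are elided.

```python
def weighted_sum(target_vec, target_weight, sememe_vecs, sememe_weights):
    """Combine the target word vector with its sememe vectors, dimension by dimension."""
    new_vec = [x * target_weight for x in target_vec]
    for name, vec in sememe_vecs.items():
        if len(vec) != len(new_vec):
            raise ValueError(f"dimension mismatch for sememe {name!r}")
        weight = sememe_weights[name]
        for i, x in enumerate(vec):
            new_vec[i] += x * weight
    return new_vec


# Hypothetical toy data: two sememes "a" and "b", two dimensions.
new_vec = weighted_sum(
    target_vec=[1.0, 0.0],
    target_weight=0.5,
    sememe_vecs={"a": [0.0, 1.0], "b": [1.0, 1.0]},
    sememe_weights={"a": 0.25, "b": 0.25},
)
print(new_vec)  # [0.75, 0.5]
```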
As can be seen from the above technical solution, in the term vector generation method of this embodiment, during the weighted summation of the original term vector, the distance between the sememes and the target word in the original term vectors is adjusted according to the number of times all semantics and sememes of the target word occur in the resource file. A term vector better suited to the current business scenario is thereby obtained, which makes it easier to match accurate semantics in subsequent applications.
Embodiment two
This embodiment differs from Embodiment one in that, as shown in Fig. 3, the step of determining the operation weights according to the semantic information and the set target word calculated value includes:
S201: From the semantic information, count all sememes corresponding to the semantic that contains the largest number of sememes, and the number of times each sememe occurs;
S202: Determine the total value for weight calculation as the sum of the total number of occurrences of all sememes under the semantic containing the largest number of sememes and the target word calculated value;
S203: Calculate the ratio of the number of occurrences of each sememe to the total value, to determine the operation weight of each sememe and the operation weight of the target word.
In this embodiment, determining the operation weights requires first selecting, from the determined semantic information, the semantic that contains the largest number of sememes. That is, the number recorded for each semantic in the resource file can be identified and extracted, and these numbers compared one by one across all semantics, to select the semantic containing the most sememes. The sememes under that semantic and the number of times each occurs are then counted to determine the total number of occurrences of all sememes. The weight total value used in the calculation is determined from this total and the target word calculated value, and the operation weight of each sememe is determined from the ratio of its occurrence count to the weight total value.
It should be noted that, when the semantic of the target word with the largest number of sememes is obtained from the resource file, if all semantics in the resource file entry of the target word contain the same number of sememes, or if several semantics tie for the largest number of sememes, the first of these semantics is selected or one of them is chosen at random. Further, which semantic is used for the weight calculation can be decided according to the frequency with which the sememes occur in the current business.
For example, for the target word "arriving", the corresponding resource file structure is:
"arriving 6 1 function word 2 function word amplitude 2 function word reaching 1 going to 1 arrival 1 carefulness";
Here, the semantics containing the largest number of sememes are "2 function word amplitude" and "2 function word reaching". The first of these is selected for the calculation, so the occurrence counts of the sememes in the selected semantic are:
"function word": 1 time, "amplitude": 1 time;
Accordingly, the total number of occurrences of all sememes under the semantic containing the largest number of sememes is 1+1=2. If the target word calculated value is taken as the sum of the occurrence counts of all sememes, i.e. 8, then the weight total value is 8+2=10.
The weights are calculated from the weight total value and the occurrence count of each sememe, namely:
The weights of the words corresponding to the sememes are: "function word" weight Wa=1/10, "amplitude" weight Wb=1/10, and the weight of the target word "arriving" is W=8/10. The original term vector is then weighted and summed according to the above formula to obtain the value of each dimension of the new term vector.
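As a rough sketch of Embodiment two, the code below picks the semantic with the most sememes (the first one on a tie), derives the weights 1/10, 1/10 and 8/10, and applies them to the first five dimensions of the term vectors quoted in Embodiment one. The data layout and the names `semantics` and `weights_from_largest_semantic` are assumptions for illustration; only the five dimensions printed in the text are used, since the remaining ones are elided there.

```python
from collections import Counter

# Assumed layout: one list of sememes per semantic of the target word "arriving".
semantics = [
    ["function word"],
    ["function word", "amplitude"],
    ["function word", "reaching"],
    ["going to"],
    ["arrival"],
    ["carefulness"],
]

# First five dimensions only, as quoted in the text.
target_vec = [0.563, 0.727, -0.165, 0.328, 0.265]
sememe_vecs = {
    "function word": [0.423, 0.187, 0.598, 0.856, -0.796],
    "amplitude": [0.598, 0.326, -0.224, 0.852, 0.367],
}


def weights_from_largest_semantic(semantics, target_value):
    """Weights based only on the semantic that contains the most sememes."""
    chosen = max(semantics, key=len)  # max() keeps the first semantic on a tie
    counts = Counter(chosen)
    total = sum(counts.values()) + target_value  # weight total value
    return {s: n / total for s, n in counts.items()}, target_value / total


sememe_w, target_w = weights_from_largest_semantic(semantics, target_value=8)

# Weighted summation, one dimension at a time.
new_prefix = [
    target_w * x + sum(w * sememe_vecs[s][i] for s, w in sememe_w.items())
    for i, x in enumerate(target_vec)
]
print(sememe_w, target_w)
print(new_prefix)
```

With these inputs the first dimension works out to 0.423 × 1/10 + 0.598 × 1/10 + 0.563 × 8/10 = 0.5525, up to floating-point rounding.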
As can be seen from the above technical solution, Embodiment two of the application determines the semantic containing the largest number of sememes in the resource file, and then determines the weights for the weighted summation from that semantic, to obtain the new term vector. Compared with Embodiment one, this embodiment still yields a new term vector while reducing the amount of data to be extracted and calculated, so the new term vector can be obtained quickly and term vectors can be generated dynamically.
Based on the above two embodiments, the application provides a term vector generating device supporting polarity differentiation and ambiguity. As shown in Fig. 4, the term vector generating device includes:
Acquiring unit 1, configured to obtain the term vector model and the resource file under the current business scenario, the resource file including the sememes corresponding to multiple semantics under the current business scenario;
Original term vector determination unit 2, configured to determine the original term vector corresponding to the target word according to the term vector model, and to extract the semantic information corresponding to the target word in the resource file, the semantic information including the sememes under each semantic and the number of times each sememe occurs;
Operation weight determination unit 3, configured to determine the operation weights according to the semantic information and the set target word calculated value;
New term vector generation unit 4, configured to perform, according to the operation weights, a weighted summation on the value of each dimension of the original term vector, to generate the new term vector corresponding to the target word.
The above two embodiments are suited to generating a new term vector from a single word, and can reduce the influence of antonyms on subsequent matching results. To obtain a new term vector that reflects the actual semantics, in practical applications the actual usage context of the target word can also be taken as a reference when the term vector is generated.
Embodiment three
Referring to Fig. 5, in this embodiment the method for generating a term vector includes the following steps:
S301: Obtain the term vector model and the resource file under the current business scenario, and obtain the statement text containing the target word, the resource file including the sememes corresponding to multiple semantics under the current business scenario;
S302: Determine the original term vector corresponding to the target word according to the term vector model; extract the semantic information corresponding to the target word in the resource file, the semantic information including the sememes under multiple semantics and the number of times each sememe occurs;
S303: Determine the neighboring word set of the target word in the statement text, the neighboring word set being the set of words adjacent to the target word in the statement text;
S304: According to the neighboring word set and the semantic information, determine the semantic most associated with the target word under the current business scenario, and the occurrence count of each sememe under the most associated semantic;
S305: Determine the operation weight of each sememe and the operation weight of the target word according to the occurrence count of each sememe under the most associated semantic and the set target word calculated value;
S306: According to the operation weights, perform a weighted summation on the value of each dimension of the original term vector to generate the new term vector corresponding to the target word.
As can be seen from the above steps, this embodiment differs from the above embodiments in that, while the term vector model and the resource file under the current business scenario are obtained, the statement text containing the target word is also obtained. The statement text can come from a business document used in the current business scenario: the business document is split into sentences by punctuation marks or paragraph format to obtain multiple statement texts, from which the one containing the target word is extracted.
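A minimal sketch of this sentence-extraction step is shown below. The punctuation set, the function name `sentences_containing` and the sample document are assumptions for illustration; a plain substring test is used to find the target word, which is cruder than matching on segmented words.

```python
import re


def sentences_containing(document, target):
    """Split a document on sentence punctuation or line breaks and keep
    the sentences in which the target word appears."""
    parts = re.split(r"[。！？!?.;；\n]+", document)
    # Naive substring match; matching on segmented words would be stricter.
    return [p.strip() for p in parts if target in p]


# Hypothetical business document.
document = "I want to buy one apple computer. The weather is nice today! Is the apple fresh?"
statements = sentences_containing(document, "apple")
print(statements)  # ['I want to buy one apple computer', 'Is the apple fresh']
```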
In this embodiment, after the statement text containing the target word is obtained, the neighboring word set of the target word can be determined in the statement text, so that the semantic most associated with the target word under the current business scenario, and the occurrence count of each sememe under that semantic, can be determined according to the neighboring word set and the semantic information. The neighboring word set is the set of words within a set range before and after the target word in the statement text. That is, after the statement text containing the target word is obtained, it is split with a word segmentation tool to obtain a segmentation result made up of multiple words, and the words that meet the requirement are selected from the segmentation result to form the neighboring word set.
Further, as shown in Fig. 6, determining the semantic most associated with the target word under the current business scenario according to the neighboring word set and the semantic information includes:
S3031: Set a window value and, according to the window value, extract the neighboring word set of the target word from the statement text, the neighboring word set including the preceding context words before the target word and the following context words after the target word;
S3032: According to the original term vectors, calculate the word distance between each preceding context word and each following context word in the neighboring word set, and each sememe;
S3033: Determine the distance average under each semantic according to the word distances;
S3034: Compare the distance averages under the semantics, and determine the semantic corresponding to the minimum distance average as the most associated semantic of the target word.
In this embodiment, after the statement text containing the target word is segmented, a window value is set. The window value can be set manually, or set automatically according to the length of the statement text. Since in most cases the words adjacent to the target word are the ones most likely to be semantically associated with it, the window value is usually small in practical applications; for example, 1 or 2 can be chosen. When the window value is 1, the word immediately before the target word in the segmentation result is extracted as the preceding context word, and the word immediately after it as the following context word. When the window value is 2, the two words before the target word are extracted as preceding context words and the two words after it as following context words. It should be noted that, when the target word is at the beginning or the end of the statement text, only the following context words or only the preceding context words, respectively, are extracted as the basis for judgment.
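The window-based extraction can be sketched as follows. The function name `neighbor_words` and the token list are assumptions; the tokens stand in for the output of whatever word segmentation tool is used.

```python
def neighbor_words(tokens, target, window=1):
    """Return (preceding context words, following context words) of the
    first occurrence of the target word within the given window."""
    idx = tokens.index(target)  # raises ValueError if the target is absent
    preceding = tokens[max(0, idx - window):idx]
    following = tokens[idx + 1:idx + 1 + window]
    return preceding, following


# Hypothetical segmentation result.
tokens = ["I", "want", "buy", "one", "apple", "computer"]
print(neighbor_words(tokens, "apple", window=1))  # (['one'], ['computer'])
print(neighbor_words(tokens, "apple", window=2))  # (['buy', 'one'], ['computer'])
print(neighbor_words(tokens, "I", window=1))      # ([], ['want'])
```

Slicing handles the sentence boundaries: a target word at the start of the sentence gets an empty list of preceding context words, and one at the end gets an empty list of following context words.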
After the preceding and following context words have been determined, the technical solution provided by the application can determine the association between each sememe and the preceding and following context words according to the term vectors of the sememes and of the context words. The determination is based on the distance between term vectors, referred to in this embodiment as the word distance for ease of distinction. The smaller the distance between two term vectors, the higher the relevance between the two words. In practice, the distance can be determined by calculating the Euclidean distance or the cosine distance.
In this embodiment, to obtain the association between the preceding context word, the following context word and each semantic of the target word, the distance between the term vector of each sememe and that of the preceding context word is determined first, and the distances between the multiple sememes under each semantic and the preceding context word are averaged. The average distance between the following context word and the multiple sememes under each semantic is then calculated in the same way. Finally, the two averages are averaged again. Proceeding in this way for each semantic in turn gives the distance average under each semantic.
For example, the target word is "apple" and the statement text is "I want to buy one Apple computer";
The statement text is segmented to give the segmentation result "I / want / buy / one / apple / computer". If the window value is set to 1, the preceding context word is "one" and the following context word is "computer";
In the resource file, the resource file structure of the target word "apple" is determined to be:
"apple 3 5 carrying pattern value specific brand computer energy 1 fruit 3 tree fruit reproduction";
As can be seen, the target word "apple" has 3 semantics, namely "5 carrying pattern value specific brand computer energy", "1 fruit" and "3 tree fruit reproduction".
The distance between each sememe in a semantic and the preceding context word is calculated in turn, namely: calculate the distance between the term vector of "carrying" and the term vector of "one"; calculate the distance between the term vector of "pattern value" and that of "one"; and so on. After the distance between "energy" and "one" has been calculated, the calculated distances are averaged.
The distance between each sememe in the semantic and the following context word is then calculated in turn, namely: calculate the distance between "carrying" and "computer", between "pattern value" and "computer", and so on, to obtain the distance average for the following context word.
Finally, the two distance averages are averaged to obtain the distance average under the semantic "5 carrying pattern value specific brand computer energy". The distance averages under all semantics are then compared, and the semantic corresponding to the minimum distance average is determined as the most associated semantic of the target word.
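The averaging and comparison described above can be sketched as below. The function names, the reduced sememe lists and the two-dimensional vectors are hypothetical toy data chosen so the distances are whole numbers; they are not values from a trained term vector model, and Euclidean distance is used here simply as one of the two options mentioned.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def most_associated_semantic(semantics, vectors, preceding, following, dist=euclidean):
    """Return (index of the semantic with the smallest distance average,
    list of distance averages, one per semantic)."""
    averages = []
    for sememes in semantics:
        side_means = []
        for context in (preceding, following):
            d = [dist(vectors[s], vectors[c]) for s in sememes for c in context]
            if d:  # a side is skipped when the target word is at a sentence boundary
                side_means.append(sum(d) / len(d))
        averages.append(sum(side_means) / len(side_means) if side_means else float("inf"))
    best = min(range(len(semantics)), key=averages.__getitem__)
    return best, averages


# Hypothetical toy vectors and a reduced set of sememes.
vectors = {
    "one": [0.0, 0.0],
    "computer": [3.0, 4.0],
    "specific brand": [3.0, 0.0],
    "fruit": [0.0, -5.0],
}
semantics = [["specific brand", "computer"], ["fruit"]]
best, averages = most_associated_semantic(semantics, vectors, ["one"], ["computer"])
print(best, averages[0])  # 0 3.0
```

For the first semantic, the average over the preceding context word is (3 + 5) / 2 = 4 and over the following context word is (4 + 0) / 2 = 2, so its distance average is 3, which is smaller than that of the second semantic.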
Further, after the most associated semantic has been determined, as shown in Fig. 7, determining the operation weights according to the occurrence count of each sememe under the most associated semantic and the set target word calculated value includes:
S3051: According to the semantic information, count the sememes under the most associated semantic and the number of times each sememe occurs;
S3052: Determine the total value for weight calculation as the sum of the total number of occurrences of all sememes under the most associated semantic and the target word calculated value;
S3053: Calculate the ratio of the number of occurrences of each sememe to the total value, to determine the operation weight of each sememe and of the target word.
As can be seen from the above steps, the way this embodiment determines the operation weights is essentially the same as in the above embodiments. The difference is that the number of times each sememe occurs under the most associated semantic is counted from the semantic information, and the weight total value is determined from these occurrence counts and the set target word calculated value, so that the operation weight of each sememe under the most associated semantic is determined. Finally, this embodiment calculates the new term vector from the original term vector, the term vectors of the sememes and the calculated operation weights; the calculation steps are the same as in the above embodiments and are not repeated here.
Further, since this embodiment can determine the true semantic of the target word under the current business scenario from the word distances when generating the new term vector, it can also judge the semantic of the target word directly, specifically:
The distance value corresponding to each semantic is determined from the distances $A_i(x)$ between the preceding context words in the neighboring word set and the sememes under the multiple semantics, and the distances $B_i(y)$ between the following context words and the sememes under the multiple semantics. Among the distance values corresponding to the semantics, the semantic with the smallest distance is then determined to be the true semantic of the target word under the current business scenario.
Here, the distance value may be a cosine distance value cos θ or a Euclidean distance d. When the calculated distance value is the cosine distance value:
When the calculated distance value is the Euclidean distance:
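The formulas for the two cases do not survive in this text. For reference only, the standard definitions for two n-dimensional term vectors are given below; the symbols x and y for the two vectors are assumed notation and are not necessarily those of the original formulas.

```latex
\cos\theta = \frac{\sum_{k=1}^{n} x_k y_k}{\sqrt{\sum_{k=1}^{n} x_k^{2}}\,\sqrt{\sum_{k=1}^{n} y_k^{2}}},
\qquad
d = \sqrt{\sum_{k=1}^{n} \left(x_k - y_k\right)^{2}}
```

Under these definitions cos θ increases as two vectors become more similar, whereas d decreases. If the smallest value is to indicate the closest semantic, a quantity such as 1 − cos θ is commonly used as the cosine distance.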
That is, in this embodiment, the true meaning of the target word can be determined from the distances between the target word and the sememes corresponding to each semantic, so that it can be called directly during matching.
For example, for the statement text "I want to buy one Apple computer", the distances between the target word "apple" and each semantic are:
Semantic 1 distance: 0.52552, corresponding sememes "carrying pattern value specific brand computer energy";
Semantic 2 distance: 0.6278, corresponding sememe "fruit";
Semantic 3 distance: 0.64891, corresponding sememes "tree fruit reproduction";
As can be seen, the target word is closest to semantic 1, so its true semantic is determined to be the semantic corresponding to "carrying pattern value specific brand computer energy".
Based on Embodiment three, the application also provides a term vector generating device supporting polarity differentiation and ambiguity. As shown in Fig. 8, the device includes:
Information acquisition unit 1, configured to obtain the term vector model and the resource file under the current business scenario, and to obtain the statement text containing the target word, the resource file including the sememes corresponding to multiple semantics under the current business scenario;
Original term vector determination unit 2, configured to determine the original term vector corresponding to the target word according to the term vector model, and to extract the semantic information corresponding to the target word in the resource file, the semantic information including the sememes under each semantic and the number of times each sememe occurs;
Neighboring word set determination unit 5, configured to determine the neighboring word set of the target word in the statement text, the neighboring word set being the set of words adjacent to the target word in the statement text;
Most associated semantic determination unit 6, configured to determine, according to the neighboring word set and the semantic information, the semantic most associated with the target word under the current business scenario and the occurrence count of each sememe under the most associated semantic;
Operation weight determination unit 3, configured to determine the operation weight of each sememe and the operation weight of the target word according to the occurrence count of each sememe under the most associated semantic and the set target word calculated value;
New term vector generation unit 4, configured to perform, according to the operation weights, a weighted summation on the value of each dimension of the original term vector, to generate the new term vector corresponding to the target word.
As can be seen from the above technical solutions, the application provides a term vector generation method and device supporting polarity differentiation and ambiguity. In practical applications, the method weights the value of each dimension of the target word's term vector according to the term vectors established under the current business scenario and the resource file, and generates a new term vector. The method determines the operation weights for the dimension values of the new term vector from, respectively, the number of all sememes in the resource file, the number of sememes under the semantic containing the most sememes, or the number of sememes under the semantic most associated with the statement text containing the target word. The term vectors of the target word and of the sememes are then weighted and summed according to the operation weights to obtain the new term vector.
The term vector generation method provided by the application can dynamically generate a new term vector from an already constructed term vector model and resource file. The generated term vector reflects the semantic features of the actual business scenario more accurately. Because the operation weights used in the weighting operation are determined from semantic information, the influence of antonyms and polysemy on matching results is markedly reduced, which addresses the problem that term vectors built by conventional methods are prone to matching errors in the presence of polysemy and antonyms.
Similar parts of the embodiments provided by the application may be referred to one another. The specific embodiments given above are only a few examples under the general concept of the application and do not limit its scope of protection. For those skilled in the art, any other embodiment derived from the solution of the application without creative effort falls within the scope of protection of the application.
Claims (10)
1. A term vector generation method supporting polarity differentiation and ambiguity, characterized by comprising:
obtaining a term vector model and a resource file under a current business scenario, the resource file including the sememes corresponding to multiple semantics under the current business scenario;
determining the original term vector corresponding to a target word according to the term vector model; extracting the semantic information corresponding to the target word in the resource file, the semantic information including the sememes under multiple semantics and the number of times each sememe occurs;
determining operation weights according to the semantic information and a set target word calculated value;
performing, according to the operation weights, a weighted summation on the value of each dimension of the original term vector, to generate the new term vector corresponding to the target word.
2. The term vector generation method according to claim 1, characterized in that determining the operation weights according to the semantic information and the set target word calculated value comprises:
counting, according to the semantic information, the sememes under all semantics corresponding to the current target word and the number of times each sememe occurs;
determining the total value for weight calculation as the sum of the total number of occurrences of all the sememes and the target word calculated value;
calculating the ratio of the number of occurrences of each sememe in the semantic information to the total value, to determine the operation weight of each sememe and the operation weight of the target word.
3. The term vector generation method according to claim 1, characterized in that determining the operation weights according to the semantic information and the set target word calculated value comprises:
counting, in the semantic information, all sememes corresponding to the semantic containing the largest number of sememes and the number of times each sememe occurs;
determining the total value for weight calculation as the sum of the total number of occurrences of all sememes under the semantic containing the largest number of sememes and the target word calculated value;
calculating the ratio of the number of occurrences of each sememe to the total value, to determine the operation weight of each sememe and the operation weight of the target word.
4. The term vector generation method according to any one of claims 1 to 3, characterized in that, depending on the degree to which the distinctiveness of the target word is to be emphasized, the target word calculated value is equal to 1 or equal to the sum of the occurrence counts of all sememes in the semantic information.
5. The term vector generation method according to claim 1, characterized in that performing, according to the operation weights, a weighted summation on the value of each dimension of the original term vector to generate the new term vector corresponding to the target word comprises:
extracting the term vectors corresponding to the sememes from the term vector model, and performing, according to the following formula and the operation weights, a weighted summation on the value of each dimension of the original term vector, to generate the new term vector corresponding to the target word:
The value of the n-th dimension of the new term vector: $X_{0n} = X_{an} \times W_a + X_{bn} \times W_b + X_{cn} \times W_c + \dots + X_n \times W$;
In the formula: $X_{an}$ is the value of the n-th dimension of the term vector corresponding to sememe a, and $W_a$ is the operation weight of sememe a; $X_{bn}$ is the value of the n-th dimension of the term vector corresponding to sememe b, and $W_b$ is the operation weight of sememe b; $X_n$ is the value of the n-th dimension of the term vector corresponding to the target word, and $W$ is the operation weight of the target word.
6. A term vector generation method supporting polarity differentiation and ambiguity, characterized by comprising:
obtaining a term vector model and a resource file under a current business scenario, and obtaining a statement text containing a target word, the resource file including the sememes corresponding to multiple semantics under the current business scenario;
determining the original term vector corresponding to the target word according to the term vector model; extracting the semantic information corresponding to the target word in the resource file, the semantic information including the sememes under multiple semantics and the number of times each sememe occurs;
determining the neighboring word set of the target word in the statement text, the neighboring word set being the set of words adjacent to the target word in the statement text;
determining, according to the neighboring word set and the semantic information, the semantic most associated with the target word under the current business scenario and the occurrence count of each sememe under the most associated semantic;
determining the operation weight of each sememe and the operation weight of the target word according to the occurrence count of each sememe under the most associated semantic and a set target word calculated value;
performing, according to the operation weights, a weighted summation on the value of each dimension of the original term vector, to generate the new term vector corresponding to the target word.
7. The term vector generation method according to claim 6, characterized in that determining, according to the neighboring word set and the semantic information, the semantic most associated with the target word under the current business scenario comprises:
setting a window value and, according to the window value, extracting the neighboring word set of the target word from the statement text, the neighboring word set including the preceding context words before the target word and the following context words after the target word;
calculating, according to the original term vectors, the word distance between each preceding context word and each following context word in the neighboring word set, and each sememe;
determining the distance average under each semantic according to the word distances;
comparing the distance averages under the semantics, and determining the semantic corresponding to the minimum distance average as the most associated semantic of the target word.
8. The term vector generation method according to claim 6, characterized in that determining the operation weights according to the occurrence count of each sememe under the most associated semantic and the set target word calculated value comprises:
counting, according to the semantic information, the sememes under the most associated semantic and the number of times each sememe occurs;
determining the total value for weight calculation as the sum of the total number of occurrences of all sememes under the most associated semantic and the target word calculated value;
calculating the ratio of the number of occurrences of each sememe to the total value, to determine the operation weight of each sememe and of the target word.
9. A term vector generating device supporting polarity differentiation and ambiguity, characterized by comprising:
an acquiring unit, configured to obtain a term vector model and a resource file under a current business scenario, the resource file including the sememes corresponding to multiple semantics under the current business scenario;
an original term vector determination unit, configured to determine the original term vector corresponding to a target word according to the term vector model, and to extract the semantic information corresponding to the target word in the resource file, the semantic information including the sememes under each semantic and the number of times each sememe occurs;
an operation weight determination unit, configured to determine operation weights according to the semantic information and a set target word calculated value;
a new term vector generation unit, configured to perform, according to the operation weights, a weighted summation on the value of each dimension of the original term vector, to generate the new term vector corresponding to the target word.
10. A term vector generating device supporting polarity differentiation and ambiguity, characterized by comprising:
An information acquisition unit, configured to obtain a term vector model and a resource file for the current business scenario, and to obtain a sentence text containing a target word, the resource file including the sememes corresponding to multiple semantic meanings under the current business scenario;
An original term vector determination unit, configured to determine the original term vector corresponding to the target word according to the term vector model, and to extract the semantic information corresponding to the target word from the resource file, the semantic information including the sememes under each semantic meaning and the number of times each sememe occurs;
A neighboring word set determination unit, configured to determine the neighboring word set of the target word in the sentence text, the neighboring word set being the set of multiple words adjacent to the target word in the sentence text;
A most associated semantic meaning determination unit, configured to determine, according to the neighboring word set and the semantic information, the most associated semantic meaning of the target word under the current business scenario and the number of occurrences of each sememe under that most associated semantic meaning;
An operation weight determining unit, configured to determine the operation weight of each sememe and the operation weight of the target word according to the number of occurrences of each sememe under the most associated semantic meaning and a set target word calculation value;
A new term vector generation unit, configured to perform, according to the operation weights, a weighted sum on the value of each dimension of the original term vector, generating the new term vector corresponding to the target word.
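The neighboring word set and most associated semantic meaning units of claim 10 could be prototyped roughly as below. The claim does not say how the association between neighboring words and a semantic meaning is scored, so the rule used here (sum the occurrence counts of the sememes that also appear among the neighboring words) is purely an assumption for illustration. The function names, the window size, and the example data are likewise hypothetical.

```python
def neighboring_words(tokens, target, window=2):
    """Return the words within `window` positions of the first occurrence of target."""
    idx = tokens.index(target)  # raises ValueError if the target word is absent
    left = tokens[max(0, idx - window):idx]
    right = tokens[idx + 1:idx + 1 + window]
    return left + right


def most_associated_meaning(neighbors, semantic_info):
    """Pick the semantic meaning whose sememes best overlap the neighboring words.

    semantic_info: dict meaning -> dict sememe -> occurrence count.
    ASSUMED scoring rule: total count of sememes found in the neighboring word set.
    Ties resolve to the first meaning in dictionary order.
    """
    neighbor_set = set(neighbors)

    def score(meaning):
        return sum(count for sememe, count in semantic_info[meaning].items()
                   if sememe in neighbor_set)

    best = max(semantic_info, key=score)
    return best, semantic_info[best]


# Hypothetical sentence and resource-file contents.
tokens = ["the", "apple", "phone", "is", "expensive"]
info = {
    "fruit": {"eat": 3, "sweet": 2},
    "brand": {"phone": 4, "computer": 1},
}
meaning, counts = most_associated_meaning(neighboring_words(tokens, "apple"), info)
```

The returned sememe counts for the selected meaning are the input that the operation weight determining unit needs.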
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810557309.9A CN108829669A (en) | 2018-06-01 | 2018-06-01 | Support the term vector generation method and device of polarity differentiation and ambiguity |
CN201811498188.1A CN109614617B (en) | 2018-06-01 | 2018-12-07 | Word vector generation method and device supporting polarity differentiation and polysemous |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810557309.9A CN108829669A (en) | 2018-06-01 | 2018-06-01 | Support the term vector generation method and device of polarity differentiation and ambiguity |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108829669A (en) | 2018-11-16 |
Family
ID=64145816
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810557309.9A Pending CN108829669A (en) | 2018-06-01 | 2018-06-01 | Support the term vector generation method and device of polarity differentiation and ambiguity |
CN201811498188.1A Active CN109614617B (en) | 2018-06-01 | 2018-12-07 | Word vector generation method and device supporting polarity differentiation and polysemous |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811498188.1A Active CN109614617B (en) | 2018-06-01 | 2018-12-07 | Word vector generation method and device supporting polarity differentiation and polysemous |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN108829669A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2021111420A (en) * | 2020-01-15 | 2021-08-02 | ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド | Method and apparatus for processing semantic description of text entity, and device |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104375989A (en) * | 2014-12-01 | 2015-02-25 | 国家电网公司 | Natural language text keyword association network construction system |
CN104391963A (en) * | 2014-12-01 | 2015-03-04 | 北京中科创益科技有限公司 | Method for constructing correlation networks of keywords of natural language texts |
CN106528588A (en) * | 2016-09-14 | 2017-03-22 | 厦门幻世网络科技有限公司 | Method and apparatus for matching resources for text information |
CN107092596B (en) * | 2017-04-24 | 2020-08-04 | 重庆邮电大学 | Text emotion analysis method based on attention CNNs and CCR |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2021111420A (en) * | 2020-01-15 | 2021-08-02 | ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド | Method and apparatus for processing semantic description of text entity, and device |
JP7113097B2 (en) | 2020-01-15 | 2022-08-04 | ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド | Sense description processing method, device and equipment for text entities |
US11669690B2 (en) | 2020-01-15 | 2023-06-06 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for processing sematic description of text entity, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109614617A (en) | 2019-04-12 |
CN109614617B (en) | 2022-12-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108647309B (en) | Chat content auditing method and system based on sensitive words | |
CN110069784A (en) | A kind of voice quality inspection methods of marking, device, terminal and can storage medium | |
CN105589844B (en) | It is a kind of to be used to take turns the method for lacking semantic supplement in question answering system more | |
CN103729474B (en) | Method and system for recognizing forum user vest account | |
CN104111933B (en) | Obtain business object label, set up the method and device of training pattern | |
CN110362819B (en) | Text emotion analysis method based on convolutional neural network | |
CN104216876B (en) | Information text filter method and system | |
WO2021073116A1 (en) | Method and apparatus for generating legal document, device and storage medium | |
CN106202032A (en) | A kind of sentiment analysis method towards microblogging short text and system thereof | |
CN108345587A (en) | A kind of the authenticity detection method and system of comment | |
CN110362678A (en) | A kind of method and apparatus automatically extracting Chinese text keyword | |
CN109446333A (en) | A kind of method that realizing Chinese Text Categorization and relevant device | |
CN111694937A (en) | Interviewing method and device based on artificial intelligence, computer equipment and storage medium | |
CN108090099B (en) | Text processing method and device | |
CN110232923A (en) | A kind of phonetic control command generation method, device and electronic equipment | |
JP2006350656A (en) | Time-series document grouping method, device, and program, and recording medium storing program | |
CN108269122A (en) | The similarity treating method and apparatus of advertisement | |
CN110955750A (en) | Combined identification method and device for comment area and emotion polarity, and electronic equipment | |
CN103678318B (en) | Multi-word unit extraction method and equipment and artificial neural network training method and equipment | |
CN107818173B (en) | Vector space model-based Chinese false comment filtering method | |
Mestry et al. | Automation in social networking comments with the help of robust fasttext and cnn | |
CN110164417A (en) | A kind of languages vector obtains, languages know method for distinguishing and relevant apparatus | |
CN107341142B (en) | Enterprise relation calculation method and system based on keyword extraction and analysis | |
CN109446393A (en) | A kind of Web Community's topic classification method and device | |
CN107885717A (en) | A kind of keyword extracting method and device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20181116 |