CN108829669A - Word vector generation method and device supporting polarity distinction and polysemy - Google Patents

Word vector generation method and device supporting polarity distinction and polysemy

Publication number
CN108829669A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810557309.9A
Other languages
Chinese (zh)
Inventor
杨凯程
李健铨
蒋宏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xuan Yi Science And Technology Co Ltd
Original Assignee
Beijing Xuan Yi Science And Technology Co Ltd
Application filed by Beijing Xuan Yi Science And Technology Co Ltd
Priority to CN201810557309.9A
Publication of CN108829669A
Priority to CN201811498188.1A (CN109614617B)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Abstract

The application provides a word vector generation method and device supporting polarity distinction and polysemy. Based on the word vectors and resource file established for the current business scenario, the method performs a weighted operation on the value of each dimension of the target word's word vector to generate a new word vector. The operation weight of each dimension value in the new word vector is determined from one of three counts: the occurrence counts of all sememes in the resource file, the occurrence counts of the sememes under the sense that contains the most sememes, or the occurrence counts of the sememes under the most associated sense. The word vectors of the target word and of the sememes are then summed with these weights to obtain the new word vector and determine the true sense. The method can generate new word vectors dynamically, and the new word vectors reflect actual semantic features more accurately. Because the operation weights are determined from semantic information, the influence of antonyms and polysemy on matching results is significantly reduced, which addresses the tendency of word vectors built by conventional methods to be mismatched when polysemy and antonyms are involved.

Description

Word vector generation method and device supporting polarity distinction and polysemy
Technical field
This application relates to the field of machine learning, and in particular to a word vector generation method and device supporting polarity distinction and polysemy.
Background
A word vector is a word representation that digitizes language so that a computer can process human language. A word vector represents a word as a vector of a certain dimension, for example [0.792, -0.177, -0.107, 0.109, -0.542, ...], and reveals the relationships between that word and other words. Word vectors are generally obtained by training with word vector training models such as CBOW, Skip-gram, or GloVe; the specific value of each dimension is determined by the collected corpus and the way the corpus is used for training. Word vectors can be applied in intelligent question answering or text classification, where the meaning of text information is determined by matching the text information against word vectors.
In actual text processing, a single word can correspond to several senses. So that a computer can identify them, each sense is represented by multiple sememes. A sememe is the most basic semantic unit, one that cannot easily be divided further. For example, the word "apple" has at least two senses, Apple Inc. and the fruit. The Apple Inc. sense corresponds to multiple sememes such as "specific brand" and "computer", while the fruit sense corresponds to the sememes "tree" and "fruit". In practice, each word together with its senses and sememes can be collected into a resource file that can be called directly, such as the resource files provided by OEC, the Chinese thesaurus, and HowNet. In the prior art, a word vector obtained by training reflects the sense of a word according to how the target word occurs in the collected training corpus. When the target word is polysemous, a single word vector cannot fully represent its multiple senses, so intelligent question answering or text classification based on such word vectors cannot accurately match the true sense of the target word in its context of use.
In addition, actual text processing usually needs to determine the semantic relatedness of two words from the distance between their word vectors, for example the Euclidean distance or the cosine distance, and the two words with the smallest distance are generally considered semantically similar. However, when matching with word vectors obtained as described above, the words that are close in distance include antonyms, whose meanings are in fact opposite. For example, "raise" and "lower" are both close to "credit card limit", so matching with such a word vector model can lead to misjudgment, with "raise the credit card limit" matched to "lower the credit card limit".
Summary of the invention
This application provides a word vector generation method and device supporting polarity distinction and polysemy, to solve the problem that word vectors built by conventional methods are easily mismatched when polysemy and antonyms are involved.
In a first aspect, the application provides a word vector generation method supporting polarity distinction and polysemy, including:
Obtaining the word vector model and resource file for the current business scenario, where the resource file includes the sememes corresponding to multiple senses in the current business scenario;
Determining the original word vector corresponding to the target word according to the word vector model, and extracting the semantic information corresponding to the target word from the resource file, where the semantic information includes the sememes under multiple senses and the number of times each sememe occurs;
Determining operation weights according to the semantic information and a preset target word calculation value;
According to the operation weights, performing a weighted summation on the value of each dimension of the original word vector to generate the new word vector corresponding to the target word.
Optionally, determining the operation weights according to the semantic information and the preset target word calculation value includes:
According to the semantic information, counting the sememes under all senses corresponding to the current target word and the number of times each sememe occurs;
Determining the total value for weight calculation as the sum of the total number of occurrences of all the sememes and the target word calculation value;
Calculating the ratio of the number of occurrences of each sememe in the semantic information to the total value, to determine the operation weight of each sememe and the operation weight of the target word.
Optionally, determining the operation weights according to the semantic information and the preset target word calculation value includes:
Counting, in the semantic information, all the sememes under the sense that contains the most sememes and the number of times each sememe occurs;
Determining the total value for weight calculation as the sum of the total number of occurrences of all sememes under the sense that contains the most sememes and the target word calculation value;
Calculating the ratio of the number of occurrences of each sememe to the total value, to determine the operation weight of each sememe and the operation weight of the target word.
Optionally, depending on how strongly the target word needs to be distinguished, the target word calculation value is set either to 1 or to the sum of the occurrence counts of all sememes in the semantic information.
Optionally, performing the weighted summation on the value of each dimension of the original word vector according to the operation weights, to generate the new word vector corresponding to the target word, includes: extracting the word vectors corresponding to the sememes from the word vector model, and performing the weighted summation on the value of each dimension of the original word vector according to the operation weights and the following formula:
The value of the n-th dimension of the new word vector is $X_{0n} = X_{an} \times W_a + X_{bn} \times W_b + X_{cn} \times W_c + \dots + X_n \times W$;
where $X_{an}$ is the value of the n-th dimension of the word vector corresponding to sememe a, and $W_a$ is the operation weight of sememe a; $X_{bn}$ is the value of the n-th dimension of the word vector corresponding to sememe b, and $W_b$ is the operation weight of sememe b; $X_n$ is the value of the n-th dimension of the word vector corresponding to the target word, and $W$ is the operation weight of the target word.
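Read together with the optional weight rules above, the formula can be restated compactly. The following is only a sketch: the symbols $S$ (the set of sememes that are counted), $c_s$ (the occurrence count of sememe $s$) and $C$ (the target word calculation value) are introduced here for illustration and do not appear in the original text.

```latex
X_{0n} = W\,X_n + \sum_{s \in S} W_s\,X_{sn},
\qquad
W_s = \frac{c_s}{C + \sum_{t \in S} c_t},
\qquad
W = \frac{C}{C + \sum_{t \in S} c_t}
```

Under these definitions the weights sum to 1, so each dimension of the new word vector is a weighted average of the corresponding dimensions of the target word vector and the sememe vectors. A larger $C$ keeps the new vector closer to the original word vector of the target word.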
In a second aspect, the application provides a word vector generation method supporting polarity distinction and polysemy, including:
Obtaining the word vector model and resource file for the current business scenario, and obtaining a sentence text containing the target word, where the resource file includes the sememes corresponding to multiple senses in the current business scenario;
Determining the original word vector corresponding to the target word according to the word vector model, and extracting the semantic information corresponding to the target word from the resource file, where the semantic information includes the sememes under multiple senses and the number of times each sememe occurs;
Determining the neighboring word set of the target word in the sentence text, where the neighboring word set is the set of words adjacent to the target word in the sentence text;
According to the neighboring word set and the semantic information, determining the most associated sense of the target word in the current business scenario and the number of occurrences of each sememe under the most associated sense;
Determining the operation weight of each sememe and the operation weight of the target word according to the number of occurrences of each sememe under the most associated sense and the preset target word calculation value;
According to the operation weights, performing a weighted summation on the value of each dimension of the original word vector to generate the new word vector corresponding to the target word.
Optionally, determining the most associated sense of the target word in the current business scenario according to the neighboring word set and the semantic information includes:
Setting a window value and extracting the neighboring word set of the target word from the sentence text according to the window value, where the neighboring word set includes the preceding words before the target word and the following words after the target word;
According to the original word vectors, calculating the word distance between each preceding word and each sememe, and between each following word and each sememe, in the neighboring word set;
Determining the average distance under each sense from the word distances;
Comparing the average distances under the senses, and determining that the sense corresponding to the smallest average distance is the most associated sense of the target word.
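The four steps above (window, word distances, per-sense average, minimum) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: cosine distance is assumed as the word distance, the function names are hypothetical, and the two-dimensional vectors are invented toy values with nonzero norm.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two nonzero vectors (assumed distance measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def neighbor_words(tokens, index, window):
    """Words within `window` positions before and after the target word."""
    return tokens[max(0, index - window):index] + tokens[index + 1:index + 1 + window]


def most_associated_sense(tokens, index, window, senses, vectors):
    """Return the index of the sense with the smallest average word distance."""
    neighbors = [w for w in neighbor_words(tokens, index, window) if w in vectors]
    best_sense, best_average = None, math.inf
    for sense_id, sememes in enumerate(senses):
        distances = [
            cosine_distance(vectors[w], vectors[s])
            for w in neighbors
            for s in sememes
            if s in vectors
        ]
        if not distances:
            continue  # no usable vectors for this sense
        average = sum(distances) / len(distances)
        if average < best_average:
            best_sense, best_average = sense_id, average
    return best_sense


# Toy data (hypothetical): sense 0 is the company sense, sense 1 the fruit sense.
tokens = ["i", "want", "to", "buy", "an", "apple", "computer"]
senses = [["computer", "brand"], ["fruit", "tree"]]
vectors = {
    "buy": [0.6, 0.4], "an": [0.5, 0.5], "computer": [1.0, 0.0],
    "brand": [0.9, 0.1], "fruit": [0.0, 1.0], "tree": [0.1, 0.9],
}
chosen = most_associated_sense(tokens, index=5, window=2, senses=senses, vectors=vectors)
```

With a window of 2, the neighboring word set is "buy", "an" and "computer", and the company sense is selected because its sememes lie closer to those neighbors on average.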
Optionally, determining the operation weights according to the number of occurrences of each sememe under the most associated sense and the preset target word calculation value includes:
According to the semantic information, counting the sememes under the most associated sense and the number of times each sememe occurs;
Determining the total value for weight calculation as the sum of the total number of occurrences of all sememes under the most associated sense and the target word calculation value;
Calculating the ratio of the number of occurrences of each sememe to the total value, to determine the operation weights of each sememe and of the target word.
In a third aspect, the application provides a word vector generating device supporting polarity distinction and polysemy, including:
An acquiring unit, configured to obtain the word vector model and resource file for the current business scenario, where the resource file includes the sememes corresponding to multiple senses in the current business scenario;
An original word vector determination unit, configured to determine the original word vector corresponding to the target word according to the word vector model, and to extract the semantic information corresponding to the target word from the resource file, where the semantic information includes the sememes under each sense and the number of times each sememe occurs;
An operation weight determination unit, configured to determine operation weights according to the semantic information and the preset target word calculation value;
A new word vector generation unit, configured to perform, according to the operation weights, a weighted summation on the value of each dimension of the original word vector to generate the new word vector corresponding to the target word.
In a fourth aspect, the application provides a word vector generating device supporting polarity distinction and polysemy, including:
An information acquisition unit, configured to obtain the word vector model and resource file for the current business scenario, and to obtain a sentence text containing the target word, where the resource file includes the sememes corresponding to multiple senses in the current business scenario;
An original word vector determination unit, configured to determine the original word vector corresponding to the target word according to the word vector model, and to extract the semantic information corresponding to the target word from the resource file, where the semantic information includes the sememes under each sense and the number of times each sememe occurs;
A neighboring word set determination unit, configured to determine the neighboring word set of the target word in the sentence text, where the neighboring word set is the set of words adjacent to the target word in the sentence text;
A most associated sense determination unit, configured to determine, according to the neighboring word set and the semantic information, the most associated sense of the target word in the current business scenario and the number of occurrences of each sememe under the most associated sense;
An operation weight determination unit, configured to determine the operation weight of each sememe and the operation weight of the target word according to the number of occurrences of each sememe under the most associated sense and the preset target word calculation value;
A new word vector generation unit, configured to perform, according to the operation weights, a weighted summation on the value of each dimension of the original word vector to generate the new word vector corresponding to the target word.
As can be seen from the above technical solutions, the application provides a word vector generation method and device supporting polarity distinction and polysemy. In practical applications, the method performs a weighted operation on the value of each dimension of the target word's word vector, based on the word vectors and resource file established for the current business scenario, to generate a new word vector. The method determines the operation weight of each dimension value in the new word vector from, respectively, the occurrence counts of all sememes in the resource file, the occurrence counts of the sememes under the sense that contains the most sememes, or the occurrence counts of the sememes under the sense most associated with the sentence text containing the target word. It then performs a weighted summation over the word vectors of the target word and the sememes according to the operation weights to obtain the new word vector.
The word vector generation method provided by this application can dynamically generate new word vectors from an already constructed word vector model and resource file. The generated new word vectors reflect the semantic features of the actual business scenario more accurately. Because the operation weights used in the weighted operation are determined from semantic information, the influence of antonyms and polysemy on matching results can be significantly reduced, which solves the problem that word vectors built by conventional methods are easily mismatched when polysemy and antonyms are involved.
Description of the drawings
To explain the technical solution of this application more clearly, the drawings used in the embodiments are briefly introduced below. Those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flow diagram of a word vector generation method supporting polarity distinction and polysemy;
Fig. 2 is a flow diagram of determining the operation weights in embodiment one of the application;
Fig. 3 is a flow diagram of determining the operation weights in embodiment two of the application;
Fig. 4 is a structural schematic diagram of a word vector generating device in an embodiment of the application;
Fig. 5 is a flow diagram of another word vector generation method supporting polarity distinction and polysemy;
Fig. 6 is a flow diagram of determining the most associated sense in embodiment three of the application;
Fig. 7 is a flow diagram of determining the operation weights in embodiment three of the application;
Fig. 8 is a structural schematic diagram of another word vector generating device in an embodiment of the application.
Detailed description
Embodiments are described in detail below, and examples of them are shown in the drawings. In the following description, when the drawings are referred to, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following embodiments do not represent all implementations consistent with this application. They are merely examples of systems and methods consistent with some aspects of the application as detailed in the claims.
In the technical solution provided by this application, the current business scenario refers to the business domain to which a specific business activity belongs, such as finance, wealth management, insurance, technology, or networking. The text information commonly used differs greatly between business scenarios, and the senses of the words used in that text are correspondingly diverse. Therefore, for a computer to understand the text information used in business activities under different business scenarios, and thus to automate processes such as intelligent question answering and text classification, each word in the text information needs to be digitized, that is, represented as a word vector.
In this application, the word corresponding to a word vector is called the target word; that is, the target word is a word present in the text information of the current business scenario, such as "apple" in "I want to buy an Apple computer". The word vectors used in intelligent question answering or text classification are obtained through machine learning, by training on a large amount of text information collected in the business scenario as the training corpus, so that each vector reflects the sense of the target word and the relationships between the target word and other words. Word vectors have generally undergone dimensionality reduction, so their dimension is not too high; it depends on the specific business scenario and the training corpus collected, with 50 or 100 dimensions being common.
It should be noted that a sense, as used in this application, refers to one of the multiple actual meanings that a word has. For example, "apple" is generally considered to have two senses, one being "plant" and the other "Apple Inc.". Each sense is expressed by multiple words, and each word used to express a sense is a sememe; for example, the sense "plant" can be expressed by "fruit", "tree", and so on. Accordingly, in the technical solution provided by this application, the resource file specifically refers to a collection of words. The resource file contains all the words in the business scenario, each arranged according to the structure "word - sense - sememe", and forms one large text file (txt) that typically has the following structure:
"[…] 2 2 function-word structural-auxiliary 4 orientation shoot apparatus component";
"[…] 2 2 function-word verbal-auxiliary 1 finish";
"be 4 1 be 1 exist 3 show agreement indicate-word 1 specific";
……
"apple 3 5 carry specific-brand computer able pattern-value 1 fruit 3 tree fruit reproduction"
……
Taking the first row as an example, the word at the start of the row is the target word in this application. The first number after the target word gives the number of senses the target word has, so this target word has 2 senses. Each following number gives the number of sememes contained in one sense: the first sense contains 2 sememes, "function word" and "structural auxiliary", and the second sense contains 4 sememes, "orientation", "shoot", "apparatus", and "component". With this structure, all senses of a target word can be described and the sememes of each sense can be determined.
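The row structure described above can be read with a short parser. The sketch below assumes whitespace-separated tokens in which a multi-word sememe is written as a single token; the function name `parse_resource_line` and the sample entry are hypothetical and are not taken from any actual resource file.

```python
def parse_resource_line(line):
    """Parse one 'word n_senses (n_sememes sememe ...)*' entry into (word, senses)."""
    tokens = line.split()
    word, n_senses = tokens[0], int(tokens[1])
    senses, pos = [], 2
    for _ in range(n_senses):
        n_sememes = int(tokens[pos])  # size of the next sense group
        sememes = tokens[pos + 1:pos + 1 + n_sememes]
        if len(sememes) != n_sememes:
            raise ValueError(f"truncated entry: {line!r}")
        senses.append(sememes)
        pos += 1 + n_sememes
    if pos != len(tokens):
        raise ValueError(f"unexpected trailing tokens: {line!r}")
    return word, senses


# Hypothetical entry: 3 senses containing 2, 1 and 3 sememes respectively.
word, senses = parse_resource_line(
    "apple 3 2 specific_brand computer 1 fruit 3 tree fruit reproduction"
)
```

Here `senses` holds one list of sememes per sense, which is the grouping the later weight calculations rely on.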
Based on the resource file and word vectors described above, the application provides a word vector generation method supporting polarity distinction and polysemy, to reduce the influence of polysemous target words and antonyms on word vector matching results. It includes the following embodiments.
Embodiment one
Refer to Fig. 1, a flow diagram of a word vector generation method supporting polarity distinction and polysemy. In this embodiment, the method of generating word vectors from the resource file includes the following steps:
S101: Obtain the word vector model and resource file for the current business scenario, where the resource file includes the sememes corresponding to multiple senses in the current business scenario.
In this embodiment, once the business domain of the text information to be processed has been determined, the word vector model for the current business scenario must first be obtained, generally by calling an already built word vector model from a server or database. Here the word vector model is a set made up of a large number of word vectors, that is, the set of word vectors obtained for each word from the training corpus and the relationships between the words that occur in the business documents of the current business scenario.
Because a resource file covers the senses of all words, its data volume is very large. To make the data easier to call, in the technical solution provided by this application the resource file is a text made up of the sememes for the current business scenario. That is, only the resource file corresponding to the current business scenario needs to be called; words unrelated to the current business scenario can be left out of the word vector generation process, which reduces the amount of data processed and makes dynamic generation of word vectors practical.
Furthermore, because resource files are built in different ways, their forms also differ greatly. For example, some resource files are text files made up of sememe words and numbers, while others are database files made up of fields and values. Therefore, when a resource file is called, its contents can be screened according to the current business scenario to determine the senses and sememes of the words that belong to the current business scenario.
For example, for a resource file in text form, the contents of the resource file can be searched before calling, using the words of the current business scenario, to extract the resource file entries for the words that may be used in the current business scenario. The extracted entries are then collected into a new resource file that serves the current business scenario but is smaller overall than the full resource file. The new resource file is the one called when word vectors are subsequently generated.
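The screening of a text-form resource file described above can be sketched as a simple filter. This is an illustration only: it assumes each entry starts with its target word, and the names `filter_resource_lines`, `full_resource` and `scenario_words`, as well as the sample entries, are hypothetical.

```python
def filter_resource_lines(lines, scenario_words):
    """Keep only the entries whose leading word is used in the business scenario."""
    kept = []
    for line in lines:
        fields = line.split(maxsplit=1)
        if fields and fields[0] in scenario_words:
            kept.append(line)
    return kept


# Hypothetical entries from a full resource file.
full_resource = [
    "apple 2 2 specific_brand computer 2 tree fruit",
    "limit 1 2 amount boundary",
    "orchard 1 2 place tree",
]
scenario_words = {"limit", "apple"}  # vocabulary of the current business scenario
scenario_resource = filter_resource_lines(full_resource, scenario_words)
```

The result keeps the original entry format, so it can be written out as the smaller, scenario-specific resource file and parsed in the same way as the full one.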
S102: Determine the original word vector corresponding to the target word according to the word vector model, and extract the semantic information corresponding to the target word from the resource file, where the semantic information includes the sememes under multiple senses and the number of times each sememe occurs.
In this embodiment, after the word vector model is obtained, the word vector corresponding to the target word is determined from it. Because this word vector has already been built, it is called the original word vector in this application. While determining the original word vector, this embodiment also extracts the semantic information corresponding to the target word from the resource file. To make the subsequent weight determination easier, the extracted semantic information includes the sememes and the number of times each sememe occurs.
For example, for the target word "arriving", the corresponding resource file entry is:
"arriving 6 1 function-word 2 function-word amplitude 2 function-word reaching 1 going-to 1 arrival 1 careful";
According to the above resource file entry, the sememes and their occurrence counts extracted as semantic information are: "function word" 3 times, "amplitude" 1 time, "reaching" 1 time, "going to" 1 time, "arrival" 1 time, and "careful" 1 time.
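The counting step can be sketched with a frequency counter over the sense groups. The grouping of sememes into six senses below is an assumption reconstructed from the counts in the example, and the function name `count_sememes` is hypothetical.

```python
from collections import Counter


def count_sememes(senses):
    """Count how often each sememe occurs across all senses of a target word."""
    return Counter(sememe for sense in senses for sememe in sense)


# Assumed sense groups for the example target word (six senses, eight sememe slots).
senses = [
    ["function_word"],
    ["function_word", "amplitude"],
    ["function_word", "reaching"],
    ["going_to"],
    ["arrival"],
    ["careful"],
]
counts = count_sememes(senses)
total_occurrences = sum(counts.values())
```

A sememe that appears under several senses, such as "function word" here, receives a proportionally larger count and therefore a larger weight later on.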
After the word vector model has been determined and the semantic information extracted, the technical solution provided by this application determines the operation weights used to generate the word vector from the word vector model and the semantic information, that is:
S103: Determine operation weights according to the semantic information and the preset target word calculation value.
Further, referring to Fig. 2, in this embodiment the operation weights can be determined from the semantic information and the preset target word calculation value as follows:
S1031: According to the semantic information, count the sememes under all senses corresponding to the current target word and the number of times each sememe occurs;
S1032: Determine the total value for weight calculation as the sum of the total number of occurrences of all the sememes and the target word calculation value;
S1033: Calculate the ratio of the number of occurrences of each sememe in the semantic information to the total value, to determine the operation weight of each sememe and the operation weight of the target word.
In this embodiment, determining the operation weights requires counting, within the determined semantic information, the sememes under all senses and the number of times each occurs, and determining the total number of occurrences of all sememes. The total value for weight calculation is then obtained by adding the target word calculation value, and the operation weight of each sememe is the ratio of that sememe's occurrence count to the total value.
In the technical solution provided by this application, the target word calculation value depends on how strongly the target word needs to be distinguished, and is equal either to 1 or to the sum of the occurrence counts of all sememes in the semantic information. That is, if during actual matching the constructed word vectors match many semantically similar words, the generated word vector needs to be distinctive for the target word, and in that case the target word calculation value can be set to the sum of the occurrence counts of all sememes in the semantic information. If the distinctiveness of the target word is not a priority during actual matching, the target word calculation value is taken as 1.
For example, the target word is "arriving". After the resource file and the corresponding semantic information are obtained, the sememes under all senses of the target word "arriving" and the number of times each sememe occurs are counted from the semantic information, namely:
"function word": 3 times; "amplitude": 1 time; "reaching": 1 time; "going to": 1 time; "arrival": 1 time; "carefulness": 1 time;
Next, the total number of occurrences of all sememes under the target word is calculated, namely:
The total number of sememe occurrences is 3+1+1+1+1+1=8. If the target word calculated value is taken as the sum of the occurrence counts of all sememes, i.e. 8, the weight total is 8+8=16.
The weights are then calculated from the weight total and the occurrence count of each sememe, namely:
The weights of the sememes are: "function word" weight Wa=3/16, "amplitude" weight Wb=1/16, "reaching" weight Wc=1/16, ……;
The weight of the target word is calculated at the same time: W=8/16.
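The weight calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the applicant's implementation: the function name `operation_weights` and the dictionary layout are assumptions, and the counts are those of the six distinct sememes in the example (eight occurrences in total). Exact fractions are used so that the results can be compared with the figures in the text.

```python
from fractions import Fraction

def operation_weights(sememe_counts, target_value):
    """Return (weight of each sememe, weight of the target word)."""
    # Weight total = all sememe occurrences + target word calculated value.
    total = sum(sememe_counts.values()) + target_value
    sememe_weights = {s: Fraction(c, total) for s, c in sememe_counts.items()}
    return sememe_weights, Fraction(target_value, total)

# Occurrence counts of the sememes of "arriving" (8 occurrences in total).
counts = {"function word": 3, "amplitude": 1, "reaching": 1,
          "going to": 1, "arrival": 1, "carefulness": 1}

# Distinctive setting: calculated value = sum of all occurrences = 8.
weights, target_weight = operation_weights(counts, sum(counts.values()))
print(weights["function word"], target_weight)  # 3/16 1/2

# Non-distinctive setting: calculated value = 1, so the weight total is 9.
weights_1, target_weight_1 = operation_weights(counts, 1)
print(weights_1["function word"], target_weight_1)  # 1/3 1/9
```

In both settings the sememe weights and the target word weight sum to 1, so the new term vector is a convex combination of the original vectors.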
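S104: According to the operation weights, perform a weighted summation on the value of each dimension of the original term vectors to generate the new term vector corresponding to the target word.
In this embodiment, once the operation weights are obtained, the new term vector of the target word can be determined from all the weights and the term vector of each word, namely:
Value of the 1st dimension of the new term vector: $X_{0,1} = X_{a,1} \times W_a + X_{b,1} \times W_b + X_{c,1} \times W_c + \dots + X_{1} \times W$;
Value of the 2nd dimension of the new term vector: $X_{0,2} = X_{a,2} \times W_a + X_{b,2} \times W_b + X_{c,2} \times W_c + \dots + X_{2} \times W$;
……
Value of the n-th dimension of the new term vector: $X_{0,n} = X_{a,n} \times W_a + X_{b,n} \times W_b + X_{c,n} \times W_c + \dots + X_{n} \times W$;
Here $X_{a,n}$ is the value of the n-th dimension of the term vector of sememe a and $W_a$ is the operation weight of sememe a (likewise for sememes b, c, ……); $X_{n}$ is the value of the n-th dimension of the original term vector of the target word and $W$ is the operation weight of the target word.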
By way of illustration, suppose that in the current business scenario the term vector of the target word "arriving" is [0.563, 0.727, -0.165, 0.328, 0.265, ……];
the term vector of the sememe "function word" is [0.423, 0.187, 0.598, 0.856, -0.796, ……];
the term vector of the sememe "amplitude" is [0.598, 0.326, -0.224, 0.852, 0.367, ……];
After the term vectors of all sememes have been determined in turn, the value of the new term vector in every dimension can be calculated, namely:
Value of the 1st dimension of the new term vector: $X_{0,1}$ = 0.423 × 3/16 + 0.598 × 1/16 + …… + 0.563 × 8/16;
Value of the 2nd dimension of the new term vector: $X_{0,2}$ = 0.187 × 3/16 + 0.326 × 1/16 + …… + 0.727 × 8/16;
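The dimension-wise weighted summation of step S104 can be sketched as follows. The function name `weighted_term_vector` is an assumption, and the two-dimensional vectors and weights are hypothetical values chosen only so that the arithmetic is easy to check; they are not the vectors of the example above, whose remaining components are elided.

```python
def weighted_term_vector(target_vec, target_weight, sememe_vecs, sememe_weights):
    """Combine the target word vector and the sememe vectors dimension by dimension."""
    # Start with the contribution of the target word: X_n * W.
    new_vec = [x * target_weight for x in target_vec]
    # Add the contribution of every sememe: X_{a,n} * W_a, X_{b,n} * W_b, ...
    for name, vec in sememe_vecs.items():
        w = sememe_weights[name]
        for n, x in enumerate(vec):
            new_vec[n] += x * w
    return new_vec

# Hypothetical inputs (not taken from the patent text).
target = [0.5, -0.25]
sememes = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
weights = {"a": 0.25, "b": 0.25}

new_vec = weighted_term_vector(target, 0.5, sememes, weights)
print(new_vec)  # [0.5, 0.125]
```

Because the weights sum to 1, the new vector lies between the original target word vector and its sememe vectors; a larger target word weight keeps it closer to the original.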
As can be seen from the above technical solution, the term vector generation method of this embodiment, when performing the weighted summation on the original term vector, uses the occurrence counts of all senses and sememes of the target word in the resource file to change the distance between the sememes and the target word in the original term vectors. This yields a term vector that better fits the current business scenario and makes it easier to match the correct semantics in subsequent applications.
Embodiment two
This embodiment differs from Embodiment one in that, as shown in Fig. 3, the step of determining the operation weights according to the semantic information and the set target word calculated value includes:
S201: Count, in the semantic information, all sememes of the sense that contains the most sememes, and the number of times each sememe occurs;
S202: Determine the weight total as the sum of the total number of occurrences of all sememes under the sense containing the most sememes and the target word calculated value;
S203: Calculate the ratio of each sememe's occurrence count to the weight total to determine the operation weight of each sememe and the operation weight of the target word.
In this embodiment, determining the operation weights requires first selecting, from the semantic information, the sense that contains the most sememes. That is, the number attached to each sense in the resource file is identified and extracted, and these numbers are compared across all senses one by one to select the sense containing the most sememes. The sememes of that sense and the number of times each occurs are then counted to determine the total number of sememe occurrences. The weight total used in the calculation is determined from the target word calculated value, and the operation weight of each sememe is the ratio of its occurrence count to the weight total.
It should be noted that, when the sense with the most sememes is obtained from the resource file, if all senses of the target word contain the same number of sememes, or several senses are tied for the most sememes, the first of these senses is selected or one of them is chosen at random. Further, the sense used for the weight calculation may be chosen according to how frequently its sememes occur in the current business.
For example, for the target word "arriving", the corresponding resource file structure is:
"arriving 6 1 function word 2 function word amplitude 2 function word reaching 1 going to 1 arrival 1 carefulness";
that is, the target word, the number of senses (6), and then, for each sense, the number of its sememes followed by the sememes themselves.
Here, the senses containing the most sememes are "2 function word amplitude" and "2 function word reaching". The first of these is selected for the calculation, so the occurrence counts of the sememes in that sense are:
"function word": 1 time; "amplitude": 1 time;
Accordingly, the total number of occurrences of all sememes under the sense containing the most sememes is 1+1=2. If the target word calculated value is taken as the sum of the occurrence counts of all sememes, i.e. 8, the weight total is 8+2=10.
The weights are then calculated from the weight total and the occurrence count of each sememe, namely:
The weights of the sememes are: "function word" weight Wa=1/10, "amplitude" weight Wb=1/10, and the weight of the target word "arriving" is W=8/10. The original term vectors are then weighted and summed according to the formula above to obtain the value of each dimension of the new term vector.
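Steps S201 to S203 can be sketched as below. The function name `richest_sense_weights` and the list-of-lists representation of the senses are assumptions made for illustration. Ties are resolved by taking the first sense, which is one of the two options the text allows; Python's `max` returns the first maximal element, so no extra tie-breaking code is needed.

```python
from collections import Counter
from fractions import Fraction

def richest_sense_weights(senses, target_value):
    """Weights computed only from the sense that contains the most sememes."""
    richest = max(senses, key=len)      # first sense among those tied for the maximum
    counts = Counter(richest)           # occurrences of each sememe in that sense
    total = sum(counts.values()) + target_value
    weights = {s: Fraction(c, total) for s, c in counts.items()}
    return richest, weights, Fraction(target_value, total)

# The six senses of "arriving" from the example, one list of sememes per sense.
senses = [["function word"],
          ["function word", "amplitude"],
          ["function word", "reaching"],
          ["going to"], ["arrival"], ["carefulness"]]

# Target word calculated value = sum of all sememe occurrences = 8.
richest, weights, target_weight = richest_sense_weights(senses, 8)
print(richest)                                  # ['function word', 'amplitude']
print(weights["function word"], target_weight)  # 1/10 4/5
```

Only two sememe vectors then enter the weighted summation, which is where the reduction in computation relative to Embodiment one comes from.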
As can be seen from the above technical solution, Embodiment two of this application determines, in the resource file, the sense containing the most sememes, and then determines the weights for the weighted summation from that sense alone to obtain the new term vector. Compared with Embodiment one, this embodiment still yields a new term vector while reducing the amount of data to be extracted and computed, which makes it possible to obtain new term vectors quickly and to generate term vectors dynamically.
Based on the two embodiments above, this application provides a term vector generating device supporting polarity differentiation and ambiguity. As shown in Fig. 4, the device includes:
An acquiring unit 1, configured to obtain the term vector model and the resource file for the current business scenario, the resource file including the sememes corresponding to multiple senses in the current business scenario;
An original term vector determination unit 2, configured to determine the original term vector corresponding to the target word according to the term vector model, and to extract the semantic information corresponding to the target word from the resource file, the semantic information including the sememes under each sense and the number of times each sememe occurs;
An operation weight determining unit 3, configured to determine the operation weights according to the semantic information and the set target word calculated value;
A new term vector generation unit 4, configured to perform, according to the operation weights, a weighted summation on the value of each dimension of the original term vectors to generate the new term vector corresponding to the target word.
The two embodiments above are suited to generating a new term vector from a single word, and can mitigate the influence of antonyms on subsequent matching results. To obtain a new term vector that reflects the actual semantics, the actual usage context of the target word can also be taken as a reference when generating the term vector in practical applications.
Embodiment three
Referring to Fig. 5, in this embodiment the method for generating a term vector includes the following steps:
S301: Obtain the term vector model and the resource file for the current business scenario, and obtain a statement text containing the target word, the resource file including the sememes corresponding to multiple senses in the current business scenario;
S302: Determine the original term vector corresponding to the target word according to the term vector model; extract the semantic information corresponding to the target word from the resource file, the semantic information including the sememes under multiple senses and the number of times each sememe occurs;
S303: Determine, in the statement text, the neighboring word set of the target word, the neighboring word set being a set of multiple words adjacent to the target word in the statement text;
S304: According to the neighboring word set and the semantic information, determine the most associated sense of the target word in the current business scenario and the occurrence count of each sememe under the most associated sense;
S305: Determine the operation weight of each sememe and the operation weight of the target word according to the occurrence count of each sememe under the most associated sense and the set target word calculated value;
S306: According to the operation weights, perform a weighted summation on the value of each dimension of the original term vectors to generate the new term vector corresponding to the target word.
As can be seen from the above steps, this embodiment differs from the embodiments above in that a statement text containing the target word is obtained together with the term vector model and the resource file for the current business scenario. The statement text can come from a business document used in the current business scenario: the document is split into sentences by punctuation or paragraph format to obtain multiple statement texts, from which the one containing the target word is extracted.
In this embodiment, after the statement text containing the target word is obtained, the neighboring word set of the target word can be determined in the statement text, so that the most associated sense of the target word in the current business scenario, and the occurrence count of each sememe under that sense, can be determined from the neighboring word set and the semantic information. The neighboring word set is the set of words within a set range before and after the target word in the statement text. That is, after the statement text containing the target word is obtained, it is split with a word segmentation tool into a segmentation result made up of multiple words, and the words that meet the requirement are selected from that result to form the neighboring word set.
Further, as shown in Fig. 6, determining the most associated sense of the target word in the current business scenario according to the neighboring word set and the semantic information includes:
S3031: Set a window value and extract the neighboring word set of the target word from the statement text according to the window value, the neighboring word set including the preceding context words before the target word and the following context words after the target word;
S3032: According to the original term vectors, calculate the word distance between each preceding context word in the neighboring word set and each sememe, and between each following context word and each sememe;
S3033: Determine the average distance under each sense from the word distances;
S3034: Compare the average distances under the senses and determine the sense with the smallest average distance to be the most associated sense of the target word.
In this embodiment, after the statement text containing the target word is segmented, a window value is set. The window value can be set manually, or set automatically according to the length of the statement text. In most cases the words adjacent to the target word are the ones most likely to be semantically associated with it, so in practice the window value is usually small, for example 1 or 2. When the window value is 1, the word immediately before the target word in the segmentation result is extracted as the preceding context word and the word immediately after it as the following context word; when the window value is 2, the two words before the target word are extracted as preceding context words and the two words after it as following context words. It should be noted that when the target word is at the beginning or at the end of the statement text, only the following context words or only the preceding context words, respectively, are extracted as the basis for the judgment.
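The window-based extraction can be sketched with list slicing. The function name `neighboring_words` and the token list are illustrative assumptions; the tokens stand in for the output of whatever word segmentation tool is used, and the sketch takes the first occurrence of the target word.

```python
def neighboring_words(tokens, target, window):
    """Return (preceding context words, following context words) of the target word."""
    i = tokens.index(target)                   # first occurrence of the target word
    preceding = tokens[max(0, i - window):i]   # fewer words near the sentence start
    following = tokens[i + 1:i + 1 + window]   # fewer words near the sentence end
    return preceding, following

# Illustrative segmentation result of one statement text.
tokens = ["I", "want", "buy", "one", "apple", "computer"]

print(neighboring_words(tokens, "apple", 1))  # (['one'], ['computer'])
print(neighboring_words(tokens, "I", 2))      # ([], ['want', 'buy'])
```

At a sentence boundary one of the two lists is simply empty, which corresponds to the case in which only the following or only the preceding context words are available.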
After the preceding and following context words are determined, the technical solution provided by this application can determine the association between each sememe and the preceding and following context words from the term vectors of the sememes and of the context words. The basis for this determination is the distance between term vectors, which is called the word distance in this embodiment for ease of distinction. The smaller the distance between two term vectors, the stronger the association between the two words. In practice, the distance can be calculated as a Euclidean distance or a cosine distance.
In this embodiment, to obtain the association between the context words and each sense of the target word, the distance between the term vector of each sememe and that of the preceding context word is first calculated, and the distances between the preceding context word and the sememes of each sense are averaged. The average distance between the following context word and the sememes of each sense is then calculated in the same way. Finally, the two averages are averaged again. Proceeding in this way for every sense yields the average distance under each sense.
For example, the target word is "apple" and the statement text is "I want to buy an Apple computer";
The statement text is segmented into "I / want / buy / one / apple / computer". If the window value is set to 1, the preceding context word is "one" and the following context word is "computer";
In the resource file, the resource file structure of the target word "apple" is determined to be:
"apple 3 5 carrying pattern value specific brand computer able 1 fruit 3 tree fruit reproduce";
It can be seen that the target word "apple" has 3 senses, i.e. "5 carrying pattern value specific brand computer able", "1 fruit" and "3 tree fruit reproduce".
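Reading such a resource file line can be sketched as follows. The layout assumed here (the word, the number of senses, then for each sense a sememe count followed by that many sememes) is inferred from the example; the whitespace-separated encoding, the underscores joining multi-word sememe names, and the function name `parse_resource_line` are all assumptions made for illustration.

```python
def parse_resource_line(line):
    """Split one resource file line into the word and its list of senses."""
    tokens = line.split()
    word, n_senses = tokens[0], int(tokens[1])
    senses, pos = [], 2
    for _ in range(n_senses):
        k = int(tokens[pos])                        # number of sememes in this sense
        senses.append(tokens[pos + 1:pos + 1 + k])  # the sememes themselves
        pos += 1 + k
    return word, senses

line = ("apple 3 5 carrying pattern_value specific_brand computer able "
        "1 fruit 3 tree fruit reproduce")
word, senses = parse_resource_line(line)
print(word, [len(s) for s in senses])  # apple [5, 1, 3]
```

The per-sense lists returned here are the form in which the senses are needed for counting sememe occurrences and for the distance comparison between senses.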
The distances between the sememes of a sense and the preceding context word are calculated in turn: the distance between the term vector of "carrying" and the term vector of "one" is calculated, then the distance between the term vector of "pattern value" and that of "one", ……; after the distance between "able" and "one" has been calculated, the calculated distances are averaged.
The distances between the sememes of the sense and the following context word are then calculated in turn: the distances between "carrying" and "computer", between "pattern value" and "computer", ……, are calculated and averaged to obtain the average distance for the following context word.
Finally, the two average distances are averaged to obtain the average distance under the sense corresponding to "5 carrying pattern value specific brand computer able". The average distances under all senses are then compared, and the sense with the smallest average distance is determined to be the most associated sense of the target word.
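The averaging procedure of steps S3032 to S3034 can be sketched as below. The function name `most_associated_sense` is an assumption, the Euclidean distance is taken from the standard library (`math.dist`, Python 3.8 or later), and the two-dimensional vectors are hypothetical values chosen so that the result can be checked by hand. The sketch assumes that at least one of the two context lists is non-empty.

```python
import math

def most_associated_sense(senses, vectors, preceding, following, dist=math.dist):
    """Return (index of the sense with the smallest average distance, all averages)."""
    def average_distance(context_words, sense):
        d = [dist(vectors[c], vectors[s]) for c in context_words for s in sense]
        return sum(d) / len(d)

    averages = []
    for sense in senses:
        # One average per non-empty context side, then the mean of those averages.
        sides = [average_distance(ctx, sense) for ctx in (preceding, following) if ctx]
        averages.append(sum(sides) / len(sides))
    best = min(range(len(senses)), key=lambda i: averages[i])
    return best, averages

# Hypothetical vectors: two context words and one sememe per sense.
vectors = {
    "one": [0.0, 0.0], "computer": [3.0, 4.0],
    "specific brand": [3.0, 4.0], "fruit": [0.0, 10.0],
}
senses = [["specific brand"], ["fruit"]]
best, averages = most_associated_sense(senses, vectors, ["one"], ["computer"])
print(best, averages[0])  # 0 2.5
```

Any other distance function with the same two-argument signature, for example a cosine distance, can be passed through the `dist` parameter.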
Further, after the most associated sense is determined, as shown in Fig. 7, determining the operation weights according to the occurrence count of each sememe under the most associated sense and the set target word calculated value includes:
S3051: According to the semantic information, count the sememes under the most associated sense and the number of times each sememe occurs;
S3052: Determine the weight total as the sum of the total number of occurrences of all sememes under the most associated sense and the target word calculated value;
S3053: Calculate the ratio of each sememe's occurrence count to the weight total to determine the operation weights of each sememe and of the target word.
As can be seen from the above steps, the way this embodiment determines the operation weights is essentially the same as in the embodiments above. The difference is that the occurrence counts of the sememes under the most associated sense are counted from the semantic information, and the weight total is determined from those counts and the set target word calculated value, so that the operation weight of each sememe under the most associated sense is obtained. Finally, this embodiment calculates the new term vector from the original term vector, the term vectors of the sememes and the calculated operation weights; this calculation is the same as in the embodiments above and is not repeated here.
Further, because this embodiment can determine the true sense of the target word in the current business scenario from the word distances when generating the new term vector, it can also be used to judge the sense of the target word directly, specifically:
The distance value of each sense is determined from the distances A_i(x) between the preceding context words in the neighboring word set and the sememes under each sense, and the distances B_i(y) between the following context words and the sememes under each sense. Among the distance values of the senses, the sense with the smallest distance is then determined to be the true sense of the target word in the current business scenario.
The distance value may be a cosine distance value cos θ or a Euclidean distance d. When the distance value calculated is a cosine distance value:
When the distance value calculated is a Euclidean distance:
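For reference, the textbook definitions of the two measures can be sketched as follows. These are the standard formulas for two vectors x and y, given here as an assumption about what is intended; how the individual distances A_i(x) and B_i(y) are combined per sense is not specified by this sketch. Note that cos θ itself is a similarity (larger means closer), so a cosine distance is taken here as 1 − cos θ, which is also an assumption.

```python
import math

def cosine_similarity(x, y):
    """cos(theta) = (x . y) / (|x| * |y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def cosine_distance(x, y):
    """Smaller means more similar; 0 for vectors pointing the same way."""
    return 1.0 - cosine_similarity(x, y)

def euclidean_distance(x, y):
    """d = sqrt(sum_k (x_k - y_k)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(cosine_distance([2.0, 0.0], [5.0, 0.0]))     # 0.0
```

With either function, the sense whose sememes are on average closest to the context words is the one selected.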
That is, in this embodiment the true meaning of the target word can be determined from the distances between the target word and the sememes of each sense, so that it can be used directly in the matching process.
For example, for the statement text "I want to buy an Apple computer", the distances between the target word "apple" and each sense are:
Sense 1 distance: 0.52552, corresponding sememes "carrying pattern value specific brand computer able";
Sense 2 distance: 0.6278, corresponding sememe "fruit";
Sense 3 distance: 0.64891, corresponding sememes "tree fruit reproduce";
It can be seen that the target word is closest to sense 1, so its true sense is determined to be the one corresponding to "carrying pattern value specific brand computer able".
Based on Embodiment three, this application also provides a term vector generating device supporting polarity differentiation and ambiguity. As shown in Fig. 8, the device includes:
An information acquisition unit 1, configured to obtain the term vector model and the resource file for the current business scenario, and to obtain a statement text containing the target word, the resource file including the sememes corresponding to multiple senses in the current business scenario;
An original term vector determination unit 2, configured to determine the original term vector corresponding to the target word according to the term vector model, and to extract the semantic information corresponding to the target word from the resource file, the semantic information including the sememes under each sense and the number of times each sememe occurs;
A neighboring word set determination unit 5, configured to determine, in the statement text, the neighboring word set of the target word, the neighboring word set being a set of multiple words adjacent to the target word in the statement text;
A most associated sense determination unit 6, configured to determine, according to the neighboring word set and the semantic information, the most associated sense of the target word in the current business scenario and the occurrence count of each sememe under the most associated sense;
An operation weight determining unit 3, configured to determine the operation weight of each sememe and the operation weight of the target word according to the occurrence count of each sememe under the most associated sense and the set target word calculated value;
A new term vector generation unit 4, configured to perform, according to the operation weights, a weighted summation on the value of each dimension of the original term vectors to generate the new term vector corresponding to the target word.
As can be seen from the above technical solutions, this application provides a term vector generation method and device supporting polarity differentiation and ambiguity. In practical applications, the method weights the value of each dimension of the target word's term vector according to the term vector model and the resource file established for the current business scenario, and generates a new term vector. The method determines the operation weights from, respectively, the counts of all sememes in the resource file, the counts of the sememes under the sense containing the most sememes, or the counts of the sememes under the sense most associated with the statement text containing the target word. The term vectors of the target word and of the sememes are then weighted and summed according to the operation weights to obtain the new term vector.
The term vector generation method provided by this application can generate new term vectors dynamically from the constructed term vector model and resource file. The new term vectors reflect the semantic features of the actual business scenario more accurately, and because the operation weights used in the weighted summation are determined from semantic information, the influence of antonyms and polysemy on matching results is markedly reduced. This addresses the problem that term vectors built by conventional methods are prone to matching errors in the presence of polysemy and antonyms.
Similar parts of the embodiments provided by this application may be cross-referenced. The specific embodiments above are only several examples under the general concept of this application and do not limit its scope of protection. For those skilled in the art, any other embodiment derived from the solution of this application without creative effort falls within the scope of protection of this application.

Claims (10)

1. A term vector generation method supporting polarity differentiation and ambiguity, characterized by comprising:
obtaining a term vector model and a resource file for the current business scenario, the resource file including the sememes corresponding to multiple senses in the current business scenario;
determining the original term vector corresponding to a target word according to the term vector model; extracting the semantic information corresponding to the target word from the resource file, the semantic information including the sememes under multiple senses and the number of times each sememe occurs;
determining operation weights according to the semantic information and a set target word calculated value;
performing, according to the operation weights, a weighted summation on the value of each dimension of the original term vectors to generate the new term vector corresponding to the target word.
2. The term vector generation method according to claim 1, characterized in that determining the operation weights according to the semantic information and the set target word calculated value comprises:
counting, according to the semantic information, the sememes under all senses corresponding to the current target word and the number of times each sememe occurs;
determining the total value for the weight calculation as the sum of the total number of occurrences of all the sememes and the target word calculated value;
calculating the ratio of the number of times each sememe occurs in the semantic information to the total value to determine the operation weight of each sememe and the operation weight of the target word.
3. The term vector generation method according to claim 1, characterized in that determining the operation weights according to the semantic information and the set target word calculated value comprises:
counting, in the semantic information, all sememes of the sense containing the most sememes and the number of times each sememe occurs;
determining the total value for the weight calculation as the sum of the total number of occurrences of all sememes under the sense containing the most sememes and the target word calculated value;
calculating the ratio of the number of times each sememe occurs to the total value to determine the operation weight of each sememe and the operation weight of the target word.
4. The term vector generation method according to any one of claims 1 to 3, characterized in that the target word calculated value is set according to the degree to which the target word must be distinguished, and equals 1 or the sum of the occurrence counts of all sememes in the semantic information.
5. The term vector generation method according to claim 1, characterized in that performing, according to the operation weights, a weighted summation on the value of each dimension of the original term vectors to generate the new term vector corresponding to the target word comprises: extracting the term vectors corresponding to the sememes from the term vector model, and performing the weighted summation on the value of each dimension of the original term vectors according to the following formula and the operation weights to generate the new term vector corresponding to the target word:
Value of the n-th dimension of the new term vector: $X_{0,n} = X_{a,n} \times W_a + X_{b,n} \times W_b + X_{c,n} \times W_c + \dots + X_{n} \times W$;
where $X_{a,n}$ is the value of the n-th dimension of the term vector corresponding to sememe a and $W_a$ is the operation weight of sememe a; $X_{b,n}$ is the value of the n-th dimension of the term vector corresponding to sememe b and $W_b$ is the operation weight of sememe b; $X_{n}$ is the value of the n-th dimension of the term vector corresponding to the target word and $W$ is the operation weight of the target word.
6. A term vector generation method supporting polarity differentiation and ambiguity, characterized by comprising:
obtaining a term vector model and a resource file for the current business scenario, and obtaining a statement text containing a target word, the resource file including the sememes corresponding to multiple senses in the current business scenario;
determining the original term vector corresponding to the target word according to the term vector model; extracting the semantic information corresponding to the target word from the resource file, the semantic information including the sememes under multiple senses and the number of times each sememe occurs;
determining, in the statement text, the neighboring word set of the target word, the neighboring word set being a set of multiple words adjacent to the target word in the statement text;
determining, according to the neighboring word set and the semantic information, the most associated sense of the target word in the current business scenario and the occurrence count of each sememe under the most associated sense;
determining the operation weight of each sememe and the operation weight of the target word according to the occurrence count of each sememe under the most associated sense and a set target word calculated value;
performing, according to the operation weights, a weighted summation on the value of each dimension of the original term vectors to generate the new term vector corresponding to the target word.
7. The term vector generation method according to claim 6, characterized in that determining, according to the neighboring word set and the semantic information, the most associated sense of the target word in the current business scenario comprises:
setting a window value and extracting the neighboring word set of the target word from the statement text according to the window value, the neighboring word set including the preceding context words before the target word and the following context words after the target word;
calculating, according to the original term vectors, the word distance between each preceding context word in the neighboring word set and each of the sememes, and between each following context word and each of the sememes;
determining the average distance under each sense from the word distances;
comparing the average distances under the senses and determining the sense with the smallest average distance to be the most associated sense of the target word.
8. The term vector generation method according to claim 6, characterized in that determining the operation weights according to the occurrence count of each sememe under the most associated sense and the set target word calculated value comprises:
counting, according to the semantic information, the sememes under the most associated sense and the number of times each sememe occurs;
determining the total value for the weight calculation as the sum of the total number of occurrences of all sememes under the most associated sense and the target word calculated value;
calculating the ratio of the number of times each sememe occurs to the total value to determine the operation weights of each sememe and of the target word.
9. A word vector generation device supporting polarity differentiation and polysemy, characterized by comprising:
an acquiring unit, for obtaining a word vector model and a resource file for the current business scenario, the resource file including the sememes corresponding to multiple senses in the current business scenario;
an original word vector determination unit, for determining the original word vector corresponding to a target word according to the word vector model, and for extracting the semantic information corresponding to the target word from the resource file, the semantic information including the sememes under each sense and the number of times each sememe occurs;
an operation weight determination unit, for determining operation weights according to the semantic information and a preset target word calculation value;
a new word vector generation unit, for performing, according to the operation weights, a weighted summation over the value of each dimension of the original word vectors, to generate the new word vector corresponding to the target word.
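The resource file and the semantic information handled by the first two units might be represented as in the sketch below. The claims do not specify a file format, so the JSON layout (word, then sense, then sememe with its occurrence count) and the function names are assumptions for illustration only.

```python
import json
from typing import Dict

SemanticInfo = Dict[str, Dict[str, int]]  # sense -> sememe -> occurrence count

# Hypothetical resource file content for one business scenario.
RESOURCE_JSON = """
{
  "apple": {
    "fruit": {"fruit": 3, "food": 1},
    "brand": {"phone": 2, "company": 2}
  }
}
"""


def load_resource(text: str) -> Dict[str, SemanticInfo]:
    """Parse the resource file into a nested dictionary."""
    return json.loads(text)


def extract_semantic_info(
    resource: Dict[str, SemanticInfo], target_word: str
) -> SemanticInfo:
    """Return the senses, sememes, and counts recorded for the target word.

    An empty dictionary is returned when the word is not in the resource file.
    """
    return resource.get(target_word, {})


resource = load_resource(RESOURCE_JSON)
info = extract_semantic_info(resource, "apple")
```

Keeping the occurrence counts next to each sememe means the operation weight determination unit can work from this structure directly, without a second lookup.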
10. A word vector generation device supporting polarity differentiation and polysemy, characterized by comprising:
an information acquisition unit, for obtaining a word vector model and a resource file for the current business scenario, and for obtaining a sentence text that includes a target word, the resource file including the sememes corresponding to multiple senses in the current business scenario;
an original word vector determination unit, for determining the original word vector corresponding to the target word according to the word vector model, and for extracting the semantic information corresponding to the target word from the resource file, the semantic information including the sememes under each sense and the number of times each sememe occurs;
a neighboring word set determination unit, for determining the neighboring word set of the target word in the sentence text, the neighboring word set being a set of multiple words adjacent to the target word in the sentence text;
a most associated sense determination unit, for determining, according to the neighboring word set and the semantic information, the sense most associated with the target word in the current business scenario and the number of occurrences of each sememe under the most associated sense;
an operation weight determination unit, for determining the operation weight of each sememe and the operation weight of the target word according to the number of occurrences of each sememe under the most associated sense and a preset target word calculation value;
a new word vector generation unit, for performing, according to the operation weights, a weighted summation over the value of each dimension of the original word vectors, to generate the new word vector corresponding to the target word.
CN201810557309.9A 2018-06-01 2018-06-01 Support the term vector generation method and device of polarity differentiation and ambiguity Pending CN108829669A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810557309.9A CN108829669A (en) 2018-06-01 2018-06-01 Support the term vector generation method and device of polarity differentiation and ambiguity
CN201811498188.1A CN109614617B (en) 2018-06-01 2018-12-07 Word vector generation method and device supporting polarity differentiation and polysemous

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810557309.9A CN108829669A (en) 2018-06-01 2018-06-01 Support the term vector generation method and device of polarity differentiation and ambiguity

Publications (1)

Publication Number Publication Date
CN108829669A true CN108829669A (en) 2018-11-16

Family

ID=64145816

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201810557309.9A Pending CN108829669A (en) 2018-06-01 2018-06-01 Support the term vector generation method and device of polarity differentiation and ambiguity
CN201811498188.1A Active CN109614617B (en) 2018-06-01 2018-12-07 Word vector generation method and device supporting polarity differentiation and polysemous

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201811498188.1A Active CN109614617B (en) 2018-06-01 2018-12-07 Word vector generation method and device supporting polarity differentiation and polysemous

Country Status (1)

Country Link
CN (2) CN108829669A (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104375989A (en) * 2014-12-01 2015-02-25 国家电网公司 Natural language text keyword association network construction system
CN104391963A (en) * 2014-12-01 2015-03-04 北京中科创益科技有限公司 Method for constructing correlation networks of keywords of natural language texts
CN106528588A (en) * 2016-09-14 2017-03-22 厦门幻世网络科技有限公司 Method and apparatus for matching resources for text information
CN107092596B (en) * 2017-04-24 2020-08-04 重庆邮电大学 Text emotion analysis method based on attention CNNs and CCR

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021111420A (en) * 2020-01-15 2021-08-02 ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド Method and apparatus for processing semantic description of text entity, and device
JP7113097B2 (en) 2020-01-15 2022-08-04 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Sense description processing method, device and equipment for text entities
US11669690B2 (en) 2020-01-15 2023-06-06 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for processing sematic description of text entity, and storage medium

Also Published As

Publication number Publication date
CN109614617A (en) 2019-04-12
CN109614617B (en) 2022-12-16

Similar Documents

Publication Publication Date Title
CN108647309B (en) Chat content auditing method and system based on sensitive words
CN110069784A (en) A kind of voice quality inspection methods of marking, device, terminal and can storage medium
CN105589844B (en) It is a kind of to be used to take turns the method for lacking semantic supplement in question answering system more
CN103729474B (en) Method and system for recognizing forum user vest account
CN104111933B (en) Obtain business object label, set up the method and device of training pattern
CN110362819B (en) Text emotion analysis method based on convolutional neural network
CN104216876B (en) Information text filter method and system
WO2021073116A1 (en) Method and apparatus for generating legal document, device and storage medium
CN106202032A (en) A kind of sentiment analysis method towards microblogging short text and system thereof
CN108345587A (en) A kind of the authenticity detection method and system of comment
CN110362678A (en) A kind of method and apparatus automatically extracting Chinese text keyword
CN109446333A (en) A kind of method that realizing Chinese Text Categorization and relevant device
CN111694937A (en) Interviewing method and device based on artificial intelligence, computer equipment and storage medium
CN108090099B (en) Text processing method and device
CN110232923A (en) A kind of phonetic control command generation method, device and electronic equipment
JP2006350656A (en) Time-series document grouping method, device, and program, and recording medium storing program
CN108269122A (en) The similarity treating method and apparatus of advertisement
CN110955750A (en) Combined identification method and device for comment area and emotion polarity, and electronic equipment
CN103678318B (en) Multi-word unit extraction method and equipment and artificial neural network training method and equipment
CN107818173B (en) Vector space model-based Chinese false comment filtering method
Mestry et al. Automation in social networking comments with the help of robust fasttext and cnn
CN110164417A (en) A kind of languages vector obtains, languages know method for distinguishing and relevant apparatus
CN107341142B (en) Enterprise relation calculation method and system based on keyword extraction and analysis
CN109446393A (en) A kind of Web Community's topic classification method and device
CN107885717A (en) A kind of keyword extracting method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20181116