CN109918651A - Method and device for acquiring synonymous part-of-speech templates - Google Patents



Publication number
CN109918651A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910114457.8A
Other languages
Chinese (zh)
Other versions
CN109918651B (en)
Inventor
潘晓彤
刘作鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Intelligent Technology Co Ltd
Original Assignee
Beijing Xiaomi Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Intelligent Technology Co Ltd
Priority to CN201910114457.8A
Publication of CN109918651A
Application granted
Publication of CN109918651B
Legal status: Active
Anticipated expiration


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Abstract

The disclosure relates to a method and a device for acquiring synonymous part-of-speech templates. The method comprises: obtaining multiple sample corpora; determining the part of speech of each word in the sample corpora and generating a sample part-of-speech template corresponding to each word; determining a target corpus among the sample corpora, determining a target part-of-speech template among the sample part-of-speech templates, and obtaining the context corpus of the target part-of-speech template; obtaining, according to the target part-of-speech template, its context corpus, and the sample part-of-speech templates, the similarity between each sample part-of-speech template and the target part-of-speech template; and determining one or more sample part-of-speech templates whose similarity meets a preset requirement as synonymous part-of-speech templates of the target part-of-speech template. This technical solution brings the synonymous part-of-speech templates closer to the target part-of-speech template, with higher contextual similarity, which reduces the probability of misunderstanding and improves the user experience.

Description

Method and device for acquiring synonymous part-of-speech templates
Technical field
The present disclosure relates to the technical field of data processing, and in particular to a method and a device for acquiring synonymous part-of-speech templates.
Background
A synonym is a word that differs in pronunciation from a given word but has the same or essentially the same meaning. Finding the synonym set for a given word is an important topic in natural language understanding, where it plays a critical role. For example, by replacing synonyms in a question posed by a user, the question can be associated with a preset question prepared in advance in a question-and-answer database; the corresponding answer is then obtained from that preset question, so that the user's question is answered automatically.
Summary of the invention
To overcome the problems in the related art, embodiments of the present disclosure provide a method and a device for acquiring synonymous part-of-speech templates. The technical solution is as follows:
According to a first aspect of the embodiments of the present disclosure, a method for acquiring synonymous part-of-speech templates is provided, comprising:
Obtaining multiple sample corpora, each sample corpus including multiple words;
Determining the part of speech of each word in the sample corpora, and generating a sample part-of-speech template corresponding to each word, where a sample part-of-speech template includes a center word, the part-of-speech parameter of the center word, and the part-of-speech parameter of an adjacent word; the center word is the word to which the sample part-of-speech template corresponds, and the adjacent word is either the word immediately preceding the center word or the word immediately following it;
Determining a target corpus among the sample corpora, determining a target part-of-speech template among the sample part-of-speech templates, and obtaining the context corpus of the target part-of-speech template, where the center word of the target part-of-speech template is located in the target corpus, and the context corpus includes the words in the target corpus that precede and follow that center word;
Obtaining, according to the target part-of-speech template, its context corpus, and the sample part-of-speech templates, the similarity between each sample part-of-speech template and the target part-of-speech template;
Determining one or more sample part-of-speech templates whose similarity meets a preset requirement as synonymous part-of-speech templates of the target part-of-speech template.
In the technical solution provided by the embodiments of the present disclosure, a sample part-of-speech template indicates the meaning of its center word, the part of speech of that center word, and the part of speech of the word immediately preceding or immediately following the center word. The target part-of-speech template indicates the same information for its own center word. Consequently, for a synonymous part-of-speech template whose similarity meets the preset requirement, its center word is close to the center word of the target part-of-speech template not only in meaning and part of speech, but also in the part of speech of the adjacent (preceding or following) word. The synonymous part-of-speech template is therefore closer to the target part-of-speech template and the contextual similarity of the two is higher, so that when the target part-of-speech template is replaced by a synonymous part-of-speech template, the probability of misunderstanding is reduced and the user experience is improved.
In one embodiment, the method further comprises:
Screening the sample part-of-speech templates by frequency of occurrence, and determining from the screening result the sample part-of-speech templates whose frequency of occurrence meets a preset frequency requirement;
In this case, obtaining the similarity between each sample part-of-speech template and the target part-of-speech template according to the target part-of-speech template, its context corpus, and the sample part-of-speech templates comprises:
Obtaining, according to the target part-of-speech template, its context corpus, and the sample part-of-speech templates whose frequency of occurrence meets the preset frequency requirement, the similarity between each of those sample part-of-speech templates and the target part-of-speech template.
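The frequency screening described above can be sketched in a few lines of Python. This is only an illustration: the function name `filter_by_frequency` and the choice of a minimum count as the "preset frequency requirement" are assumptions, since the disclosure does not state how the requirement is defined.

```python
from collections import Counter
from typing import Iterable, Set


def filter_by_frequency(templates: Iterable[str], min_count: int = 2) -> Set[str]:
    """Keep the templates that occur at least `min_count` times.

    `min_count` stands in for the preset frequency requirement; it is a
    hypothetical parameter, not a value taken from the disclosure.
    """
    counts = Counter(templates)
    return {template for template, count in counts.items() if count >= min_count}


if __name__ == "__main__":
    observed = ["support+v+n", "support+v+n", "phone+n+v"]
    print(filter_by_frequency(observed, min_count=2))
```

Screening before the similarity computation keeps rare, noisy templates out of the later vector training step.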
In one embodiment, the method further comprises:
Scoring the sample part-of-speech templates with a bigram score algorithm, and determining from the scoring result the sample part-of-speech templates whose score meets a preset scoring requirement;
In this case, obtaining the similarity between each sample part-of-speech template and the target part-of-speech template according to the target part-of-speech template, its context corpus, and the sample part-of-speech templates comprises:
Obtaining, according to the target part-of-speech template, its context corpus, and the sample part-of-speech templates whose score meets the preset scoring requirement, the similarity between each of those sample part-of-speech templates and the target part-of-speech template.
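The disclosure does not define the bigram score algorithm. As one possible reading, the sketch below scores each (center word, adjacent part of speech) pair with the discounted co-occurrence score commonly used for phrase detection, score(a, b) = (count(a, b) - delta) / (count(a) * count(b)). The formula, the function name `bigram_scores`, and the discount `delta` are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (center word, part of speech of the adjacent word)


def bigram_scores(pairs: List[Pair], delta: float = 1.0) -> Dict[Pair, float]:
    """Score each pair by (count(a, b) - delta) / (count(a) * count(b)).

    This is an assumed scoring rule, not the one used in the disclosure.
    """
    pair_counts = Counter(pairs)
    word_counts = Counter(word for word, _ in pairs)
    tag_counts = Counter(tag for _, tag in pairs)
    return {
        (word, tag): (count - delta) / (word_counts[word] * tag_counts[tag])
        for (word, tag), count in pair_counts.items()
    }


if __name__ == "__main__":
    data = [("support", "n")] * 3 + [("support", "v"), ("phone", "n")]
    for pair, score in bigram_scores(data).items():
        print(pair, score)
```

Templates whose score falls below a chosen cut-off would then be dropped before the similarity step.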
In one embodiment, obtaining the similarity between each sample part-of-speech template and the target part-of-speech template according to the target part-of-speech template, its context corpus, and the sample part-of-speech templates comprises:
Encoding the sample part-of-speech templates and the context corpus of the target part-of-speech template, to obtain a part-of-speech template ID for each sample part-of-speech template and a word ID for each word in the context corpus;
Using the part-of-speech template ID of each sample part-of-speech template and the word ID of each word in the context corpus as the input of a part-of-speech template vector training model, using the part-of-speech template vector of each sample part-of-speech template and the word vector of each word in the context corpus as the output of the model, and training the part-of-speech template vector corresponding to each sample part-of-speech template;
Obtaining the similarity between each trained part-of-speech template vector and the part-of-speech template vector corresponding to the target part-of-speech template.
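Two pieces of this embodiment are easy to illustrate: assigning integer IDs, and comparing trained vectors. The sketch below assumes that IDs are assigned in order of first appearance and that similarity is measured by cosine similarity; the disclosure names neither choice, and `encode` and `cosine_similarity` are hypothetical helper names.

```python
import math
from typing import Dict, Iterable, Sequence


def encode(items: Iterable[str]) -> Dict[str, int]:
    """Map each distinct template (or word) to an integer ID, in order of first appearance."""
    ids: Dict[str, int] = {}
    for item in items:
        ids.setdefault(item, len(ids))
    return ids


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(encode(["support+v+n", "phone+n+v", "support+v+n"]))
    print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```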
In one embodiment, determining one or more sample part-of-speech templates whose similarity meets the preset requirement as synonymous part-of-speech templates of the target part-of-speech template comprises:
Determining one or more sample part-of-speech templates whose similarity meets the preset requirement as candidate part-of-speech templates;
Concatenating the part-of-speech template vector of a candidate part-of-speech template with the part-of-speech template vector of the target part-of-speech template, to obtain a concatenated vector corresponding to the candidate part-of-speech template;
Inputting the concatenated vector into a binary classification model;
When the output of the binary classification model meets a preset binary classification output requirement, determining the candidate part-of-speech template corresponding to the concatenated vector as a synonymous part-of-speech template of the target part-of-speech template.
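The concatenation and classification steps can be sketched as follows. The disclosure does not specify the binary classification model, so a logistic-regression-style scorer with given weights stands in for it here; `weights`, `bias`, and the 0.5 threshold are hypothetical and would come from training in practice.

```python
import math
from typing import List, Sequence


def concatenate(candidate_vec: Sequence[float], target_vec: Sequence[float]) -> List[float]:
    """Join the candidate template vector and the target template vector end to end."""
    return list(candidate_vec) + list(target_vec)


def classify(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic score in (0, 1); an assumed stand-in for the binary classification model."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def is_synonymous(candidate_vec, target_vec, weights, bias, threshold: float = 0.5) -> bool:
    """Accept the candidate when the classifier output reaches the preset threshold."""
    return classify(concatenate(candidate_vec, target_vec), weights, bias) >= threshold


if __name__ == "__main__":
    print(is_synonymous([0.2, 0.8], [0.3, 0.7], weights=[1.0, 1.0, 1.0, 1.0], bias=-1.0))
```

Running a classifier on top of the similarity filter gives a second check that the candidate and the target really are interchangeable.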
In one embodiment, the output of the part-of-speech template vector training model includes the word vectors of the M words in the context corpus that precede the center word of the target part-of-speech template and the word vectors of the M words in the context corpus that follow that center word, where M is a positive integer greater than or equal to 1, and the part-of-speech template vector training model is a skip-gram model.
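To make the window of M words concrete, the sketch below builds skip-gram style training pairs in which a template ID is paired with the ID of each word up to M positions before or after the center word. Only pair generation is shown; the training itself is omitted, and the function name `skipgram_pairs` is an assumption.

```python
from typing import List, Tuple


def skipgram_pairs(
    word_ids: List[int], center_index: int, template_id: int, m: int = 2
) -> List[Tuple[int, int]]:
    """Return (template_id, context_word_id) pairs for up to m words on each side."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    start = max(0, center_index - m)
    end = min(len(word_ids), center_index + m + 1)
    return [
        (template_id, word_ids[i])
        for i in range(start, end)
        if i != center_index  # the center word itself is not a context word
    ]


if __name__ == "__main__":
    # Word IDs of a five-word target corpus; the center word is at index 2.
    print(skipgram_pairs([10, 11, 12, 13, 14], center_index=2, template_id=7, m=1))
```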
According to a second aspect of the embodiments of the present disclosure, a device for acquiring synonymous part-of-speech templates is provided, comprising:
A sample corpus acquisition module, configured to obtain multiple sample corpora, each sample corpus including multiple words;
A sample part-of-speech template generation module, configured to determine the part of speech of each word in the sample corpora and to generate a sample part-of-speech template corresponding to each word, where a sample part-of-speech template includes a center word, the part-of-speech parameter of the center word, and the part-of-speech parameter of an adjacent word; the center word is the word to which the sample part-of-speech template corresponds, and the adjacent word is either the word immediately preceding the center word or the word immediately following it;
A target part-of-speech template determination module, configured to determine a target corpus among the sample corpora, determine a target part-of-speech template among the sample part-of-speech templates, and obtain the context corpus of the target part-of-speech template, where the center word of the target part-of-speech template is located in the target corpus, and the context corpus includes the words in the target corpus that precede and follow that center word;
A similarity acquisition module, configured to obtain, according to the target part-of-speech template, its context corpus, and the sample part-of-speech templates, the similarity between each sample part-of-speech template and the target part-of-speech template;
A synonymous part-of-speech template determination module, configured to determine one or more sample part-of-speech templates whose similarity meets a preset requirement as synonymous part-of-speech templates of the target part-of-speech template.
In one embodiment, the device further comprises:
A sample part-of-speech template screening module, configured to screen the sample part-of-speech templates by frequency of occurrence and to determine from the screening result the sample part-of-speech templates whose frequency of occurrence meets a preset frequency requirement;
The similarity acquisition module comprises:
A first similarity acquisition submodule, configured to obtain, according to the target part-of-speech template, its context corpus, and the sample part-of-speech templates whose frequency of occurrence meets the preset frequency requirement, the similarity between each of those sample part-of-speech templates and the target part-of-speech template.
In one embodiment, the device further comprises:
A sample part-of-speech template scoring module, configured to score the sample part-of-speech templates with a bigram score algorithm and to determine from the scoring result the sample part-of-speech templates whose score meets a preset scoring requirement;
The similarity acquisition module comprises:
A second similarity acquisition submodule, configured to obtain, according to the target part-of-speech template, its context corpus, and the sample part-of-speech templates whose score meets the preset scoring requirement, the similarity between each of those sample part-of-speech templates and the target part-of-speech template.
In one embodiment, the similarity acquisition module comprises:
An encoding submodule, configured to encode the sample part-of-speech templates and the context corpus of the target part-of-speech template, to obtain a part-of-speech template ID for each sample part-of-speech template and a word ID for each word in the context corpus;
A part-of-speech template vector training submodule, configured to use the part-of-speech template ID of each sample part-of-speech template and the word ID of each word in the context corpus as the input of a part-of-speech template vector training model, to use the part-of-speech template vector of each sample part-of-speech template and the word vector of each word in the context corpus as the output of the model, and to train the part-of-speech template vector corresponding to each sample part-of-speech template;
A third similarity acquisition submodule, configured to obtain the similarity between each trained part-of-speech template vector and the part-of-speech template vector corresponding to the target part-of-speech template.
In one embodiment, the synonymous part-of-speech template determination module comprises:
A candidate part-of-speech template determination submodule, configured to determine one or more sample part-of-speech templates whose similarity meets the preset requirement as candidate part-of-speech templates;
A concatenated vector acquisition submodule, configured to concatenate the part-of-speech template vector of a candidate part-of-speech template with the part-of-speech template vector of the target part-of-speech template, to obtain a concatenated vector corresponding to the candidate part-of-speech template;
A concatenated vector input submodule, configured to input the concatenated vector into a binary classification model;
A synonymous part-of-speech template determination submodule, configured to determine, when the output of the binary classification model meets a preset binary classification output requirement, the candidate part-of-speech template corresponding to the concatenated vector as a synonymous part-of-speech template of the target part-of-speech template.
In one embodiment, the output of the part-of-speech template vector training model includes the word vectors of the M words in the context corpus that precede the center word of the target part-of-speech template and the word vectors of the M words in the context corpus that follow that center word, where M is a positive integer greater than or equal to 1, and the part-of-speech template vector training model is a skip-gram model.
According to a third aspect of the embodiments of the present disclosure, a device for acquiring synonymous part-of-speech templates is provided, comprising:
A processor;
A memory for storing instructions executable by the processor;
Wherein the processor is configured to:
Obtain multiple sample corpora, each sample corpus including multiple words;
Determine the part of speech of each word in the sample corpora, and generate a sample part-of-speech template corresponding to each word, where a sample part-of-speech template includes a center word, the part-of-speech parameter of the center word, and the part-of-speech parameter of an adjacent word; the center word is the word to which the sample part-of-speech template corresponds, and the adjacent word is either the word immediately preceding the center word or the word immediately following it;
Determine a target corpus among the sample corpora, determine a target part-of-speech template among the sample part-of-speech templates, and obtain the context corpus of the target part-of-speech template, where the center word of the target part-of-speech template is located in the target corpus, and the context corpus includes the words in the target corpus that precede and follow that center word;
Obtain, according to the target part-of-speech template, its context corpus, and the sample part-of-speech templates, the similarity between each sample part-of-speech template and the target part-of-speech template;
Determine one or more sample part-of-speech templates whose similarity meets a preset requirement as synonymous part-of-speech templates of the target part-of-speech template.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which computer instructions are stored; when executed by a processor, the instructions implement the steps of any method of the first aspect of the embodiments of the present disclosure.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with the disclosure and, together with the specification, serve to explain the principles of the disclosure.
Fig. 1a is a flowchart of a method for acquiring synonymous part-of-speech templates according to an exemplary embodiment;
Fig. 1b is a flowchart of a method for acquiring synonymous part-of-speech templates according to an exemplary embodiment;
Fig. 1c is a flowchart of a method for acquiring synonymous part-of-speech templates according to an exemplary embodiment;
Fig. 1d is a flowchart of a method for acquiring synonymous part-of-speech templates according to an exemplary embodiment;
Fig. 1e is a flowchart of a method for acquiring synonymous part-of-speech templates according to an exemplary embodiment;
Fig. 2a is a schematic structural diagram of a device for acquiring synonymous part-of-speech templates according to an exemplary embodiment;
Fig. 2b is a schematic structural diagram of a device for acquiring synonymous part-of-speech templates according to an exemplary embodiment;
Fig. 2c is a schematic structural diagram of a device for acquiring synonymous part-of-speech templates according to an exemplary embodiment;
Fig. 2d is a schematic structural diagram of a device for acquiring synonymous part-of-speech templates according to an exemplary embodiment;
Fig. 2e is a schematic structural diagram of a device for acquiring synonymous part-of-speech templates according to an exemplary embodiment;
Fig. 3 is a block diagram of a device according to an exemplary embodiment;
Fig. 4 is a block diagram of a device according to an exemplary embodiment.
Detailed description
Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the disclosure, as detailed in the appended claims.
With the rapid development of science and technology and the continuous improvement of living standards, NLU (Natural Language Understanding) technology has developed rapidly in recent years and has become one of the more active research fields in artificial intelligence. NLU technology, also referred to as human-computer dialogue, studies how a computer can simulate the human process of linguistic communication, so that the computer can understand and use the natural languages of human society, such as Chinese and English, and natural-language communication between humans and machines can be realized. It thereby replaces part of human mental labor, including querying data, answering questions, excerpting documents, compiling materials, and all other processing of natural-language information.
In NLU technology, finding the synonym set for a given word is an important topic, where a synonym is a word that differs in pronunciation from the given word but has the same or essentially the same meaning. Synonym replacement can play a critical role in human-computer dialogue. For example, by replacing synonyms in a question posed by a user, the question can be associated with a preset question prepared in advance in a question-and-answer database, and the corresponding answer is then obtained from that preset question.
In the related art, the synonyms of a target word can be determined by manual annotation. Alternatively, multiple sample corpora, a target word, and the target corpus containing the target word can be obtained; the similarity between each word in the sample corpora and the target word is then determined from them, and the words whose similarity meets a requirement are determined as synonyms of the target word.
Although the above schemes can determine synonyms of the target word, they consume considerable human resources at a high cost, and the accuracy of the resulting synonyms is poor, which harms the user experience.
To solve the above problems, in the technical solution provided by the embodiments of the present disclosure, multiple sample corpora are obtained; the part of speech of each word in the sample corpora is determined and a sample part-of-speech template corresponding to each word is generated; a target corpus is determined among the sample corpora, a target part-of-speech template is determined among the sample part-of-speech templates, and the context corpus of the target part-of-speech template is obtained; the similarity between each sample part-of-speech template and the target part-of-speech template is obtained according to the target part-of-speech template, its context corpus, and the sample part-of-speech templates; and one or more sample part-of-speech templates whose similarity meets a preset requirement are determined as synonymous part-of-speech templates of the target part-of-speech template. In this solution, a sample part-of-speech template indicates the meaning of its center word, the part of speech of that center word, and the part of speech of the word immediately preceding or immediately following the center word, and the target part-of-speech template indicates the same information for its own center word.
Therefore, for a synonymous part-of-speech template whose similarity meets the preset requirement, its center word is close to the center word of the target part-of-speech template not only in meaning and part of speech, but also in the part of speech of the adjacent (preceding or following) word. The synonymous part-of-speech template is thus closer to the target part-of-speech template and the contextual similarity of the two is higher, so that when the target part-of-speech template is replaced by a synonymous part-of-speech template, the probability of misunderstanding is reduced and the user experience is improved.
An embodiment of the present disclosure provides a method for acquiring synonymous part-of-speech templates. As shown in Fig. 1a, the method includes the following steps 101 to 105:
In step 101, multiple sample corpora are obtained.
Each sample corpus includes multiple words.
For example, a sample corpus may be a single sentence composed of multiple words, or a passage comprising several sentences.
In step 102, the part of speech of each word in the sample corpora is determined, and a sample part-of-speech template corresponding to each word is generated.
A sample part-of-speech template includes a center word, the part-of-speech parameter of the center word, and the part-of-speech parameter of an adjacent word. The center word is the word to which the sample part-of-speech template corresponds, and the adjacent word is either the word immediately preceding the center word or the word immediately following it.
For example, the format of a sample part-of-speech template may be w_a+p_a+p_b, where w_a denotes the center word of the template, p_a denotes the part of speech of the center word, p_b denotes the part of speech of the adjacent word, and "+" is a connector. A template whose adjacent word follows the center word is called a following-context template. For instance, when the sample corpus is "Xiaomi/mobile phone/support/infrared/", its part-of-speech tagging result is ['n', 'n', 'v', 'n', 'u']; if the center word of the sample part-of-speech template is "support", the sample part-of-speech template may be "support+v+n".
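The template format above can be sketched in Python. The sketch assumes the sentence is already segmented and tagged; the function name `build_pos_templates` and the `direction` argument are hypothetical, and the string "PARTICLE" is only a placeholder for the fifth token of the example, whose text is not given.

```python
from typing import Dict, List


def build_pos_templates(
    words: List[str], tags: List[str], direction: str = "following"
) -> Dict[int, str]:
    """Build one 'word+tag+neighbor_tag' template per word that has the requested neighbor.

    direction="following" uses the tag of the next word,
    direction="preceding" uses the tag of the previous word.
    The result maps the index of each center word to its template.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    if direction not in ("following", "preceding"):
        raise ValueError("direction must be 'following' or 'preceding'")
    offset = 1 if direction == "following" else -1
    templates: Dict[int, str] = {}
    for i, (word, tag) in enumerate(zip(words, tags)):
        j = i + offset
        if 0 <= j < len(words):  # skip words with no neighbor on that side
            templates[i] = f"{word}+{tag}+{tags[j]}"
    return templates


if __name__ == "__main__":
    words = ["Xiaomi", "mobile phone", "support", "infrared", "PARTICLE"]  # last token is a placeholder
    tags = ["n", "n", "v", "n", "u"]
    print(build_pos_templates(words, tags)[2])
```

For the center word "support", both the following-context and the preceding-context variants give "support+v+n" in this example, because the words on either side are nouns.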
In step 103, a target corpus is determined among the sample corpora, a target part-of-speech template is determined among the sample part-of-speech templates, and the context corpus of the target part-of-speech template is obtained.
The center word of the target part-of-speech template is located in the target corpus, and the context corpus of the target part-of-speech template includes the words in the target corpus that precede and follow that center word.
For example, to carry out this step, the sample corpora may be searched according to target word indication information that is preset or input by a user, and the corpus containing the target word indicated by that information, i.e. the target corpus, is determined from the search result. The target word may be taken as the center word of the target part-of-speech template; the target part-of-speech template is then determined among the sample part-of-speech templates according to that center word, and the context corpus of the target part-of-speech template is obtained. Alternatively, a corpus may be randomly selected from the sample corpora as the target corpus, and a word randomly selected from it as the center word of the target part-of-speech template; the target part-of-speech template is then determined among the sample part-of-speech templates according to that center word, and the context corpus of the target part-of-speech template is obtained.
In step 104, the similarity between each of the multiple sample part-of-speech templates and the target part-of-speech template is obtained according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates.
In step 105, one or more sample part-of-speech templates whose similarity meets a preset requirement are determined as synonymous part-of-speech templates of the target part-of-speech template.
For example, a similarity that meets the preset requirement may be understood as a similarity that falls within a preset similarity interval, or as a similarity that is greater than or equal to a preset similarity threshold.
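The two readings of the preset requirement (a threshold or an interval) can be sketched as follows. The function name select_synonyms, the template strings, the similarity values, and the threshold 0.7 are all hypothetical; the disclosure does not give concrete similarity values.

```python
from typing import Dict, List, Optional, Tuple


def select_synonyms(
    similarities: Dict[str, float],
    threshold: Optional[float] = None,
    interval: Optional[Tuple[float, float]] = None,
) -> List[str]:
    """Return templates whose similarity meets the preset requirement.

    The requirement is either sim >= threshold, or low <= sim <= high,
    or both when both arguments are given. Results are sorted by
    decreasing similarity.
    """
    selected = []
    for template, sim in similarities.items():
        if threshold is not None and sim < threshold:
            continue
        if interval is not None and not (interval[0] <= sim <= interval[1]):
            continue
        selected.append(template)
    return sorted(selected, key=lambda t: -similarities[t])


# Hypothetical similarities to a target template such as "support+v+n".
sims = {"allow+v+n": 0.82, "buy+v+n": 0.35, "enable+v+n": 0.74}
by_threshold = select_synonyms(sims, threshold=0.7)
by_interval = select_synonyms(sims, interval=(0.7, 0.8))
```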
In the technical solution provided by the embodiments of the present disclosure, a sample part-of-speech template can indicate the meaning of its center word, the part of speech of its center word, and the part of speech of the word immediately preceding or immediately following the center word. The target part-of-speech template can likewise indicate the meaning of its center word, the part of speech of its center word, and the part of speech of the word immediately preceding or immediately following its center word. Therefore, for a synonymous part-of-speech template whose similarity meets the preset requirement, its center word is close to the center word of the target part-of-speech template not only in meaning and part of speech, but also in the part of speech of the adjacent (preceding or following) word. The synonymous part-of-speech template is thus closer to the target part-of-speech template and the contexts of the two are more similar, which reduces the probability of misinterpretation when the target part-of-speech template is replaced with a synonymous part-of-speech template and improves the user experience.
In one embodiment, as shown in Fig. 1b, the method further includes step 106:
In step 106, the multiple sample part-of-speech templates are screened by frequency of occurrence, and the sample part-of-speech templates whose frequency of occurrence meets a preset frequency requirement are determined from the screening result.
In step 104, obtaining the similarity between each of the multiple sample part-of-speech templates and the target part-of-speech template according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates may be implemented by step 1041:
In step 1041, the similarity between the target part-of-speech template and each of the sample part-of-speech templates whose frequency of occurrence meets the preset frequency requirement is obtained according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the sample part-of-speech templates whose frequency of occurrence meets the preset frequency requirement.
For example, a sample part-of-speech template whose frequency of occurrence meets the preset frequency requirement may be understood as a template that occurs among the multiple sample part-of-speech templates a number of times greater than or equal to a preset number, where the preset number may be 5, 8, or 10.
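A minimal sketch of the frequency screening of step 106, assuming templates are represented as plain strings. The function name filter_by_frequency and the sample data are hypothetical; the default of 5 is one of the preset numbers mentioned in the text.

```python
from collections import Counter
from typing import Iterable, Set


def filter_by_frequency(templates: Iterable[str], min_count: int = 5) -> Set[str]:
    """Keep the distinct templates that occur at least min_count times."""
    counts = Counter(templates)
    return {template for template, count in counts.items() if count >= min_count}


# Hypothetical template occurrences collected from the sample corpora.
observed = ["support+v+n"] * 6 + ["buy+v+n"] * 2
frequent = filter_by_frequency(observed, min_count=5)
```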
By screening the multiple sample part-of-speech templates by frequency of occurrence and obtaining similarities only for the templates whose frequency of occurrence meets the preset frequency requirement, it is ensured that the sample part-of-speech templates used for obtaining similarities occur often among the multiple sample part-of-speech templates, that is, they are relatively common templates. When the synonymous part-of-speech templates of the target part-of-speech template are determined from these similarities in the subsequent steps, the amount of computation is reduced and processing is accelerated, as far as possible without affecting the accuracy of the synonymous part-of-speech templates, which improves the user experience.
In one embodiment, as shown in Fig. 1c, the method further includes step 107:
In step 107, the multiple sample part-of-speech templates are scored according to a bigram score algorithm, and the sample part-of-speech templates whose score meets a preset scoring requirement are determined from the scoring result.
In step 104, obtaining the similarity between each of the multiple sample part-of-speech templates and the target part-of-speech template according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates may be implemented by step 1042:
In step 1042, the similarity between the target part-of-speech template and each of the sample part-of-speech templates whose score meets the preset scoring requirement is obtained according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the sample part-of-speech templates whose score meets the preset scoring requirement.
For example, scoring the multiple sample part-of-speech templates according to the bigram score algorithm is illustrated below with the sample part-of-speech template "support+v+n":
When the sample part-of-speech template is "support+v+n", scoring it according to the bigram score algorithm may proceed as follows: count the number of co-occurrences of "support" and the part of speech "v" in the multiple sample part-of-speech templates, denoted #(support | v); count the number of co-occurrences of the part of speech "v" and the part of speech "n" in the multiple sample part-of-speech templates, denoted #(v | n); and obtain the scoring result S according to S = #(support | v) * #(v | n) / (#(support) * #(v) * #(n)).
A scoring result that meets the preset scoring requirement may be understood as one that falls within a preset scoring interval, or as one that is greater than or equal to a preset scoring threshold, where the scoring threshold may be 0.01, 0.05, or 0.1.
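The scoring formula above can be sketched as follows. The text does not define #(support), #(v), and #(n); this sketch assumes they are the number of templates with that center word, with that center part of speech, and with that adjacent part of speech, respectively. The function name bigram_score and the sample data are hypothetical.

```python
from typing import List, Tuple

Template = Tuple[str, str, str]  # (center word w_a, its POS p_a, adjacent POS p_b)


def bigram_score(template: Template, templates: List[Template]) -> float:
    """S = #(w|p_a) * #(p_a|p_b) / (#(w) * #(p_a) * #(p_b)), under the assumed counts."""
    word, pos_a, pos_b = template
    co_word_pos = sum(1 for t in templates if t[0] == word and t[1] == pos_a)
    co_pos_pos = sum(1 for t in templates if t[1] == pos_a and t[2] == pos_b)
    n_word = sum(1 for t in templates if t[0] == word)
    n_pos_a = sum(1 for t in templates if t[1] == pos_a)
    n_pos_b = sum(1 for t in templates if t[2] == pos_b)
    denominator = n_word * n_pos_a * n_pos_b
    if denominator == 0:  # template never observed; treat as score 0
        return 0.0
    return co_word_pos * co_pos_pos / denominator


# Hypothetical set of sample templates.
samples: List[Template] = (
    [("support", "v", "n")] * 3 + [("buy", "v", "n")] + [("mobile phone", "n", "v")] * 2
)
score = bigram_score(("support", "v", "n"), samples)  # 3 * 4 / (3 * 4 * 4)
keep = score >= 0.1  # 0.1 is one of the thresholds mentioned in the text
```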
By scoring the multiple sample part-of-speech templates according to the bigram score algorithm and obtaining similarities only for the templates whose score meets the preset scoring requirement, it is ensured that the sample part-of-speech templates used for obtaining similarities are typical and common among the multiple sample part-of-speech templates. When the synonymous part-of-speech templates of the target part-of-speech template are determined from these similarities in the subsequent steps, the amount of computation is reduced and processing is accelerated, as far as possible without affecting the accuracy of the synonymous part-of-speech templates, which improves the user experience.
In one embodiment, as shown in Fig. 1d, in step 104, obtaining the similarity between each of the multiple sample part-of-speech templates and the target part-of-speech template according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates may be implemented by steps 1043 to 1045:
In step 1043, the multiple sample part-of-speech templates and the context corpus of the target part-of-speech template are encoded to obtain a part-of-speech template ID for each of the multiple sample part-of-speech templates and a word ID for each word in the context corpus of the target part-of-speech template.
In step 1044, the part-of-speech template ID of each of the multiple sample part-of-speech templates and the word ID of each word in the context corpus of the target part-of-speech template are used as the input of a part-of-speech template vector training model, and the part-of-speech template vector of each of the multiple sample part-of-speech templates and the word vector of each word in the context corpus of the target part-of-speech template are used as the output of the part-of-speech template vector training model, so as to train the part-of-speech template vector corresponding to each of the multiple sample part-of-speech templates.
In step 1045, the similarity between each trained part-of-speech template vector and the part-of-speech template vector corresponding to the target part-of-speech template is obtained.
For example, encoding the multiple sample part-of-speech templates and the context corpus of the target part-of-speech template may consist of one-hot encoding the multiple sample part-of-speech templates and the context corpus of the target part-of-speech template.
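The encoding of step 1043 can be sketched as an ID assignment followed by one-hot vectors. The helper names build_ids and one_hot and the sample items are hypothetical; the disclosure only states that templates and context words receive IDs and that one-hot encoding may be used.

```python
from typing import Dict, Iterable, List


def build_ids(items: Iterable[str]) -> Dict[str, int]:
    """Assign consecutive integer IDs to distinct items in first-seen order."""
    return {item: index for index, item in enumerate(dict.fromkeys(items))}


def one_hot(index: int, size: int) -> List[int]:
    """Return a vector of length size with a single 1 at position index."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    vector = [0] * size
    vector[index] = 1
    return vector


# Hypothetical templates and context words.
template_ids = build_ids(["support+v+n", "buy+v+n", "support+v+n"])
word_ids = build_ids(["mobile phone", "whether", "again", "purchase"])
encoded = one_hot(template_ids["buy+v+n"], len(template_ids))
```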
The part-of-speech template vector training model may be a skip-gram model. The output of the part-of-speech template vector training model may include the word vectors corresponding to the M words that precede the center word of the target part-of-speech template in the context corpus of the target part-of-speech template, and the word vectors corresponding to the M words that follow the center word of the target part-of-speech template in that context corpus, where M is a positive integer greater than or equal to 1. For example, when the target corpus is "new//mobile phone/whether/need/again/purchase/Bluetooth/earphone", the center word of the target part-of-speech template is "need", and M = 2, the M words preceding the center word in the target corpus may be "mobile phone" and "whether", and the M words following the center word in the target corpus may be "again" and "purchase".
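The selection of the M preceding and M following words can be sketched as a simple slicing operation. The function name context_window is hypothetical, and the word list is a shortened version of the example sentence that starts at "mobile phone".

```python
from typing import List, Tuple


def context_window(words: List[str], center_index: int, m: int) -> Tuple[List[str], List[str]]:
    """Return the (up to) m words before and the (up to) m words after the center word."""
    if m < 1:
        raise ValueError("M must be a positive integer")
    if not 0 <= center_index < len(words):
        raise ValueError("center_index out of range")
    before = words[max(0, center_index - m):center_index]
    after = words[center_index + 1:center_index + 1 + m]
    return before, after


tokens = ["mobile phone", "whether", "need", "again", "purchase", "Bluetooth", "earphone"]
before, after = context_window(tokens, tokens.index("need"), m=2)
```

Near the start or end of a corpus fewer than M words are available on one side; this sketch simply returns the shorter list, which is a design choice not specified in the text.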
In this embodiment, the multiple sample part-of-speech templates and the context corpus of the target part-of-speech template are encoded to obtain a part-of-speech template ID for each sample part-of-speech template and a word ID for each word in the context corpus; the part-of-speech template IDs and word IDs are used as the input of the part-of-speech template vector training model, and the part-of-speech template vectors and word vectors are used as its output, so as to train the part-of-speech template vector corresponding to each sample part-of-speech template; and the similarity between each trained part-of-speech template vector and the part-of-speech template vector corresponding to the target part-of-speech template is obtained. In this way, each trained part-of-speech template vector can accurately reflect the features of its sample part-of-speech template, which ensures that the similarity between each trained part-of-speech template vector and the part-of-speech template vector corresponding to the target part-of-speech template reflects the degree of similarity between each sample part-of-speech template and the target part-of-speech template. The synonymous part-of-speech templates of the target part-of-speech template determined from these similarities are therefore more accurate, which improves the user experience.
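The disclosure does not name the similarity measure applied to the trained vectors in step 1045. As one common choice, the sketch below assumes cosine similarity; the function name cosine_similarity and the toy vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy stand-ins for a trained target template vector and a sample template vector.
target_vector = [1.0, 0.0]
sample_vector = [1.0, 1.0]
similarity = cosine_similarity(target_vector, sample_vector)
```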
In one embodiment, as shown in Fig. 1e, in step 105, determining the one or more sample part-of-speech templates whose similarity meets the preset requirement as synonymous part-of-speech templates of the target part-of-speech template may be implemented by steps 1051 to 1054:
In step 1051, the one or more sample part-of-speech templates whose similarity meets the preset requirement are determined as candidate part-of-speech templates.
In step 1052, the part-of-speech template vector corresponding to a candidate part-of-speech template is spliced with the part-of-speech template vector corresponding to the target part-of-speech template to obtain a spliced vector corresponding to the candidate part-of-speech template.
In step 1053, the spliced vector is input into a binary classification model.
In step 1054, when the output of the binary classification model meets a preset binary classification output requirement, the candidate part-of-speech template corresponding to the spliced vector is determined as a synonymous part-of-speech template of the target part-of-speech template.
For example, an output of the binary classification model that meets the preset binary classification output requirement may be understood as an output that falls within a preset output interval, or as an output that is greater than or equal to a preset output threshold, where the output threshold may be 0.8.
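Steps 1052 to 1054 can be sketched as vector concatenation followed by a thresholded classifier output. The disclosure does not specify the form of the binary classification model; this sketch assumes a logistic model with hand-picked toy weights, which in practice would be learned. All names (splice, classify, is_synonym) and values other than the 0.8 threshold are hypothetical.

```python
import math
from typing import List, Sequence


def splice(candidate: Sequence[float], target: Sequence[float]) -> List[float]:
    """Concatenate the candidate template vector with the target template vector."""
    return list(candidate) + list(target)


def classify(vector: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Assumed logistic binary classifier: sigmoid of a weighted sum."""
    if len(vector) != len(weights):
        raise ValueError("vector and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def is_synonym(candidate, target, weights, bias, threshold: float = 0.8) -> bool:
    """Accept the candidate when the classifier output reaches the output threshold."""
    return classify(splice(candidate, target), weights, bias) >= threshold


# Toy weights and vectors chosen only to exercise both outcomes.
toy_weights = [2.0, 0.0, 2.0, 0.0]
toy_bias = -1.0
accepted = is_synonym([1.0, 0.0], [1.0, 0.0], toy_weights, toy_bias)  # sigmoid(3)
rejected = is_synonym([0.0, 1.0], [1.0, 0.0], toy_weights, toy_bias)  # sigmoid(1)
```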
In this embodiment, the one or more sample part-of-speech templates whose similarity meets the preset requirement are determined as candidate part-of-speech templates; the part-of-speech template vector corresponding to each candidate part-of-speech template is spliced with the part-of-speech template vector corresponding to the target part-of-speech template to obtain a spliced vector; the spliced vector is input into the binary classification model; and when the output of the binary classification model meets the preset binary classification output requirement, the candidate part-of-speech template corresponding to the spliced vector is determined as a synonymous part-of-speech template of the target part-of-speech template. In this way, before a sample part-of-speech template whose similarity meets the preset requirement is determined as a synonymous part-of-speech template, it is checked by the binary classification model. This avoids the situation in which a sample part-of-speech template whose similarity meets the preset requirement nevertheless differs greatly from the target part-of-speech template, which improves the accuracy of the determined synonymous part-of-speech templates and improves the user experience.
The following are device embodiments of the present disclosure, which can be used to carry out the method embodiments of the present disclosure.
Fig. 2a is a block diagram of a synonymous part-of-speech template acquisition device 20 according to an exemplary embodiment. The synonymous part-of-speech template acquisition device 20 may be a server or a part of a server, and may be implemented as part or all of an electronic device through software, hardware, or a combination of both. As shown in Fig. 2a, the synonymous part-of-speech template acquisition device 20 includes:
A sample corpus acquisition module 201, configured to obtain multiple sample corpora, each sample corpus including multiple words.
A sample part-of-speech template generation module 202, configured to determine the part of speech of each word in the multiple sample corpora and to generate a sample part-of-speech template corresponding to each word in the multiple sample corpora. A sample part-of-speech template includes a center word, the part-of-speech parameter of the center word, and the part-of-speech parameter of an adjacent word; the center word is the word to which the sample part-of-speech template corresponds, and the adjacent word is either the word immediately preceding the center word or the word immediately following the center word.
A target part-of-speech template determination module 203, configured to determine a target corpus among the multiple sample corpora, determine a target part-of-speech template among the multiple sample part-of-speech templates, and obtain the context corpus of the target part-of-speech template. The center word of the target part-of-speech template is located in the target corpus, and the context corpus of the target part-of-speech template includes the words in the target corpus that precede the center word of the target part-of-speech template and the words that follow it.
A similarity acquisition module 204, configured to obtain the similarity between each of the multiple sample part-of-speech templates and the target part-of-speech template according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates.
A synonymous part-of-speech template determination module 205, configured to determine one or more sample part-of-speech templates whose similarity meets a preset requirement as synonymous part-of-speech templates of the target part-of-speech template.
In one embodiment, as shown in Fig. 2b, the synonymous part-of-speech template acquisition device 20 further includes:
A sample part-of-speech template screening module 206, configured to screen the multiple sample part-of-speech templates by frequency of occurrence and to determine, from the screening result, the sample part-of-speech templates whose frequency of occurrence meets a preset frequency requirement.
The similarity acquisition module 204 includes:
A first similarity acquisition submodule 2041, configured to obtain the similarity between the target part-of-speech template and each of the sample part-of-speech templates whose frequency of occurrence meets the preset frequency requirement, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the sample part-of-speech templates whose frequency of occurrence meets the preset frequency requirement.
In one embodiment, as shown in Fig. 2c, the synonymous part-of-speech template acquisition device 20 further includes:
A sample part-of-speech template scoring module 207, configured to score the multiple sample part-of-speech templates according to a bigram score algorithm and to determine, from the scoring result, the sample part-of-speech templates whose score meets a preset scoring requirement.
The similarity acquisition module 204 includes:
A second similarity acquisition submodule 2042, configured to obtain the similarity between the target part-of-speech template and each of the sample part-of-speech templates whose score meets the preset scoring requirement, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the sample part-of-speech templates whose score meets the preset scoring requirement.
In one embodiment, as shown in Fig. 2d, the similarity acquisition module 204 includes:
An encoding submodule 2043, configured to encode the multiple sample part-of-speech templates and the context corpus of the target part-of-speech template to obtain a part-of-speech template ID for each of the multiple sample part-of-speech templates and a word ID for each word in the context corpus of the target part-of-speech template.
A part-of-speech template vector training submodule 2044, configured to use the part-of-speech template ID of each of the multiple sample part-of-speech templates and the word ID of each word in the context corpus of the target part-of-speech template as the input of a part-of-speech template vector training model, and to use the part-of-speech template vector of each of the multiple sample part-of-speech templates and the word vector of each word in the context corpus of the target part-of-speech template as the output of the part-of-speech template vector training model, so as to train the part-of-speech template vector corresponding to each of the multiple sample part-of-speech templates.
A third similarity acquisition submodule 2045, configured to obtain the similarity between each trained part-of-speech template vector and the part-of-speech template vector corresponding to the target part-of-speech template.
In one embodiment, as shown in Fig. 2e, the synonymous part-of-speech template determination module 205 includes:
A candidate part-of-speech template determination submodule 2051, configured to determine one or more sample part-of-speech templates whose similarity meets the preset requirement as candidate part-of-speech templates.
A spliced vector acquisition submodule 2052, configured to splice the part-of-speech template vector corresponding to a candidate part-of-speech template with the part-of-speech template vector corresponding to the target part-of-speech template to obtain a spliced vector corresponding to the candidate part-of-speech template.
A spliced vector input submodule 2053, configured to input the spliced vector into a binary classification model.
A synonymous part-of-speech template determination submodule 2054, configured to determine the candidate part-of-speech template corresponding to the spliced vector as a synonymous part-of-speech template of the target part-of-speech template when the output of the binary classification model meets a preset binary classification output requirement.
In one embodiment, the output of the part-of-speech template vector training model includes the word vectors corresponding to the M words that precede the center word of the target part-of-speech template in the context corpus of the target part-of-speech template, and the word vectors corresponding to the M words that follow the center word of the target part-of-speech template in that context corpus, where M ≥ 1, and the part-of-speech template vector training model is a skip-gram model.
An embodiment of the present disclosure provides a synonymous part-of-speech template acquisition device. The device can obtain multiple sample corpora; determine the part of speech of each word in the multiple sample corpora and generate a sample part-of-speech template corresponding to each word; determine a target corpus among the multiple sample corpora, determine a target part-of-speech template among the multiple sample part-of-speech templates, and obtain the context corpus of the target part-of-speech template; obtain the similarity between each of the multiple sample part-of-speech templates and the target part-of-speech template according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates; and determine one or more sample part-of-speech templates whose similarity meets a preset requirement as synonymous part-of-speech templates of the target part-of-speech template. In this technical solution, a sample part-of-speech template can indicate the meaning of its center word, the part of speech of its center word, and the part of speech of the word immediately preceding or immediately following the center word, and the target part-of-speech template can likewise indicate the meaning of its center word, the part of speech of its center word, and the part of speech of the word immediately preceding or immediately following its center word. Therefore, for a synonymous part-of-speech template whose similarity meets the preset requirement, its center word is close to the center word of the target part-of-speech template not only in meaning and part of speech, but also in the part of speech of the adjacent (preceding or following) word. The synonymous part-of-speech template is thus closer to the target part-of-speech template and the contexts of the two are more similar, which reduces the probability of misinterpretation when the target part-of-speech template is replaced with a synonymous part-of-speech template and improves the user experience.
Fig. 3 is a block diagram of a synonymous part-of-speech template acquisition device 30 according to an exemplary embodiment. The synonymous part-of-speech template acquisition device 30 may be a server or a part of a server, and includes:
a processor 301;
a memory 302 for storing instructions executable by the processor 301;
wherein the processor 301 is configured to:
carry out the synonymous part-of-speech template acquisition method provided according to the first aspect of the embodiments of the present disclosure, including:
obtaining multiple sample corpora, each sample corpus including multiple words;
determining the part of speech of each word in the multiple sample corpora, and generating a sample part-of-speech template corresponding to each word in the multiple sample corpora, where a sample part-of-speech template includes a center word, the part-of-speech parameter of the center word, and the part-of-speech parameter of an adjacent word, the center word is the word to which the sample part-of-speech template corresponds, and the adjacent word is either the word immediately preceding the center word or the word immediately following the center word;
determining a target corpus among the multiple sample corpora, determining a target part-of-speech template among the multiple sample part-of-speech templates, and obtaining the context corpus of the target part-of-speech template, where the center word of the target part-of-speech template is located in the target corpus, and the context corpus of the target part-of-speech template includes the words in the target corpus that precede the center word of the target part-of-speech template and the words that follow it;
obtaining the similarity between each of the multiple sample part-of-speech templates and the target part-of-speech template according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates;
determining one or more sample part-of-speech templates whose similarity meets a preset requirement as synonymous part-of-speech templates of the target part-of-speech template.
In one embodiment, above-mentioned processor 301 can be additionally configured to:
Multiple sample part of speech templates are screened according to the frequency of occurrences, and determine that the frequency of occurrences meets according to the selection result The sample part of speech template that the default frequency of occurrences requires;
Multiple samples are obtained according to target part of speech template, target part of speech template context corpus and multiple sample part of speech templates Similarity in this part of speech template between each sample part of speech template and target part of speech template, comprising:
Meet the default frequency of occurrences according to target part of speech template, target part of speech template context corpus and the frequency of occurrences to want The sample part of speech template asked obtains the frequency of occurrences and meets each sample part of speech in the sample part of speech template that the default frequency of occurrences requires Similarity between template and target part of speech template.
In one embodiment, above-mentioned processor 301 can be additionally configured to:
It is scored according to bigram score algorithm multiple sample part of speech templates, and is determined and met according to appraisal result The sample part of speech template that default scoring requires;
Multiple samples are obtained according to target part of speech template, target part of speech template context corpus and multiple sample part of speech templates Similarity in this part of speech template between each sample part of speech template and target part of speech template, comprising:
Meet the sample that default scoring requires according to target part of speech template, target part of speech template context corpus and scoring Part of speech template obtains scoring and meets each sample part of speech template and target part of speech mould in the sample part of speech template that default scoring requires Similarity between plate.
In one embodiment, the above processor 301 may be further configured to:
Obtaining, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates, the similarity between each sample part-of-speech template and the target part-of-speech template comprises:
Encoding the multiple sample part-of-speech templates and the context corpus of the target part-of-speech template, to obtain a part-of-speech template ID for each sample part-of-speech template and a word ID for each word in the context corpus of the target part-of-speech template;
Taking the part-of-speech template ID of each sample part-of-speech template and the word ID of each word in the context corpus of the target part-of-speech template as the input of a part-of-speech template vector training model, taking the part-of-speech template vector of each sample part-of-speech template and the word vector of each word in the context corpus of the target part-of-speech template as the output of the part-of-speech template vector training model, and training the part-of-speech template vector corresponding to each sample part-of-speech template;
Obtaining the similarity between each trained part-of-speech template vector and the part-of-speech template vector corresponding to the target part-of-speech template.
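The last step compares trained vectors. The text here does not name the similarity measure, so the sketch below assumes cosine similarity, a common choice for embedding vectors. The function names and the toy vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

def rank_by_similarity(target_vec, template_vecs):
    """Return (template_id, similarity) pairs, most similar first."""
    scores = {
        tid: cosine_similarity(target_vec, vec)
        for tid, vec in template_vecs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical trained vectors keyed by part-of-speech template ID.
ranked = rank_by_similarity([1.0, 0.0], {0: [0.0, 1.0], 1: [2.0, 0.0]})
```

A "preset requirement" on similarity could then be a minimum score or a top-k cut over `ranked`; which of the two applies is not stated in this passage.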
In one embodiment, the above processor 301 may be further configured to:
Determining one or more sample part-of-speech templates whose similarity meets a preset requirement as synonymous part-of-speech templates of the target part-of-speech template comprises:
Determining one or more sample part-of-speech templates whose similarity meets the preset requirement as candidate part-of-speech templates;
Concatenating the part-of-speech template vector of a candidate part-of-speech template with the part-of-speech template vector of the target part-of-speech template, to obtain a concatenated vector corresponding to the candidate part-of-speech template;
Inputting the concatenated vector into a binary classification model;
When the output of the binary classification model meets a preset binary classification output requirement, determining the candidate part-of-speech template corresponding to the concatenated vector as a synonymous part-of-speech template of the target part-of-speech template.
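The concatenate-then-classify step can be illustrated with a small sketch. The passage does not specify the binary classification model, so a logistic (sigmoid) scorer with hand-picked weights is assumed purely for illustration; `weights`, `bias`, and `threshold` are hypothetical parameters that a real system would learn or tune.

```python
import math

def concatenate(candidate_vec, target_vec):
    """Join the candidate and target template vectors end to end."""
    return list(candidate_vec) + list(target_vec)

def binary_classifier(features, weights, bias):
    """Assumed stand-in model: logistic regression returning P(synonymous)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def is_synonymous(candidate_vec, target_vec, weights, bias, threshold=0.5):
    """Apply the assumed output requirement: probability >= threshold."""
    spliced = concatenate(candidate_vec, target_vec)
    return binary_classifier(spliced, weights, bias) >= threshold
```

The classifier acts as a second filter: a candidate that passed the similarity threshold is accepted only if the model also judges the pair synonymous.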
In one embodiment, the above processor 301 may be further configured such that:
The output of the part-of-speech template vector training model includes the word vectors of the M words preceding the center word of the target part-of-speech template in the context corpus of the target part-of-speech template, and the word vectors of the M words following the center word of the target part-of-speech template in that context corpus, where M >= 1; the part-of-speech template vector training model is a skip-gram model.
An embodiment of the present disclosure provides a synonymous part-of-speech template acquisition device. The device obtains multiple sample corpora, determines the part of speech of each word in the sample corpora, and generates a sample part-of-speech template corresponding to each word. It determines a target corpus among the sample corpora, determines a target part-of-speech template among the sample part-of-speech templates, and obtains the context corpus of the target part-of-speech template. According to the target part-of-speech template, its context corpus, and the sample part-of-speech templates, the device obtains the similarity between each sample part-of-speech template and the target part-of-speech template, and determines one or more sample part-of-speech templates whose similarity meets a preset requirement as synonymous part-of-speech templates of the target part-of-speech template.
In this technical solution, a sample part-of-speech template indicates the meaning of its center word, the part of speech of its center word, and the part of speech of the word adjacent to the center word in either the preceding or the following text. The target part-of-speech template likewise indicates the meaning of its center word, the part of speech of its center word, and the part of speech of the adjacent preceding or following word. Therefore, for a synonymous part-of-speech template whose similarity meets the preset requirement, its center word is close to the center word of the target part-of-speech template not only in meaning and part of speech, but also in the part of speech of the adjacent (preceding or following) word. The synonymous part-of-speech template is thus closer to the target part-of-speech template and the contexts of the two are more similar, so that when the target part-of-speech template is replaced with a synonymous part-of-speech template, the probability of misinterpretation is reduced and the user experience is improved.
Fig. 4 is a block diagram of a device 400 for obtaining synonymous part-of-speech templates according to an exemplary embodiment. For example, the device 400 may be provided as a server. The device 400 includes a processing component 422, which further includes one or more processors, and memory resources represented by a memory 432 for storing instructions executable by the processing component 422, such as application programs. The application programs stored in the memory 432 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 422 is configured to execute the instructions so as to perform the above method.
The device 400 may also include a power supply component 426 configured to perform power management of the device 400, a wired or wireless network interface 450 configured to connect the device 400 to a network, and an input/output (I/O) interface 458. The device 400 may operate based on an operating system stored in the memory 432, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
A non-transitory computer-readable storage medium is also provided. When the instructions in the storage medium are executed by the processor of the device 400, the device 400 is enabled to perform a synonymous part-of-speech template acquisition method, the method comprising:
Obtaining multiple sample corpora, each sample corpus including multiple words;
Determining the part of speech of each word in the multiple sample corpora, and generating a sample part-of-speech template corresponding to each word in the multiple sample corpora, where the sample part-of-speech template includes a center word, a part-of-speech parameter of the center word, and a part-of-speech parameter of an adjacent word; the center word is the word corresponding to the sample part-of-speech template, and the adjacent word is either the word immediately preceding the center word or the word immediately following the center word;
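Template generation from a tagged sentence can be sketched as below. The sketch assumes the corpus has already been tagged by some part-of-speech tagger; the tag names, the `PosTemplate` type, the `side` parameter, and the choice to skip words that have no neighbor on the chosen side are assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple

class PosTemplate(NamedTuple):
    center_word: str   # the word the template corresponds to
    center_pos: str    # part-of-speech parameter of the center word
    adjacent_pos: str  # part-of-speech parameter of the adjacent word

def generate_templates(tagged: List[Tuple[str, str]],
                       side: str = "next") -> List[PosTemplate]:
    """Build one template per word that has a neighbor on the given side.

    side="prev" uses the immediately preceding word, side="next" the
    immediately following word.
    """
    offset = -1 if side == "prev" else 1
    templates = []
    for i, (word, pos) in enumerate(tagged):
        j = i + offset
        if 0 <= j < len(tagged):
            templates.append(PosTemplate(word, pos, tagged[j][1]))
    return templates

# Hypothetical tagged sample corpus: (word, part of speech)
tagged = [("I", "PRON"), ("buy", "VERB"), ("books", "NOUN")]
next_templates = generate_templates(tagged, side="next")
prev_templates = generate_templates(tagged, side="prev")
```

Keeping the adjacent word's part of speech, rather than the adjacent word itself, is what lets two templates match when their center words appear in grammatically similar positions.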
Determining a target corpus among the multiple sample corpora, determining a target part-of-speech template among the multiple sample part-of-speech templates, and obtaining the context corpus of the target part-of-speech template, where the center word of the target part-of-speech template is located in the target corpus, and the context corpus of the target part-of-speech template includes the words in the target corpus that precede the center word of the target part-of-speech template and the words that follow it;
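Extracting the context corpus around a center word might look like the following. The function name `context_corpus` and the optional `window` limit are hypothetical; with `window=None` the sketch returns every preceding and following word of the target corpus.

```python
def context_corpus(words, center_index, window=None):
    """Return (preceding_words, following_words) around the center word.

    window, if given, caps how many words are taken on each side.
    """
    if not 0 <= center_index < len(words):
        raise IndexError("center_index is outside the target corpus")
    if window is None:
        start, end = 0, len(words)
    else:
        start = max(0, center_index - window)
        end = min(len(words), center_index + window + 1)
    return words[start:center_index], words[center_index + 1:end]

# Hypothetical target corpus with "buy" as the center word.
target = ["I", "want", "to", "buy", "a", "new", "phone"]
before, after = context_corpus(target, center_index=3, window=2)
```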
Obtaining, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates, the similarity between each sample part-of-speech template and the target part-of-speech template;
Determining one or more sample part-of-speech templates whose similarity meets a preset requirement as synonymous part-of-speech templates of the target part-of-speech template.
In the technical solution provided by this embodiment of the present disclosure, a sample part-of-speech template indicates the meaning of its center word, the part of speech of its center word, and the part of speech of the word adjacent to the center word in either the preceding or the following text. The target part-of-speech template likewise indicates the meaning of its center word, the part of speech of its center word, and the part of speech of the adjacent preceding or following word. Therefore, for a synonymous part-of-speech template whose similarity meets the preset requirement, its center word is close to the center word of the target part-of-speech template not only in meaning and part of speech, but also in the part of speech of the adjacent (preceding or following) word. The synonymous part-of-speech template is thus closer to the target part-of-speech template and the contexts of the two are more similar, so that when the target part-of-speech template is replaced with a synonymous part-of-speech template, the probability of misinterpretation is reduced and the user experience is improved.
In one embodiment, the method further includes:
Screening the multiple sample part-of-speech templates by frequency of occurrence and, based on the screening result, determining the sample part-of-speech templates whose frequency of occurrence meets a preset frequency requirement;
Obtaining, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates, the similarity between each sample part-of-speech template and the target part-of-speech template comprises:
Obtaining, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the sample part-of-speech templates whose frequency of occurrence meets the preset frequency requirement, the similarity between each of those sample part-of-speech templates and the target part-of-speech template.
In one embodiment, the method further includes:
Scoring the multiple sample part-of-speech templates with a bigram score algorithm and, based on the scoring result, determining the sample part-of-speech templates whose score meets a preset scoring requirement;
Obtaining, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates, the similarity between each sample part-of-speech template and the target part-of-speech template comprises:
Obtaining, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the sample part-of-speech templates whose score meets the preset scoring requirement, the similarity between each of those sample part-of-speech templates and the target part-of-speech template.
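This passage names a bigram score algorithm without giving its formula. As an illustration only, the sketch below assumes a widely used count-based bigram score, (count(a, b) - delta) / (count(a) * count(b)), applied to pairs made of a center token and the part of speech of its adjacent word. The formula choice, the pair layout, `delta`, and `min_score` are all assumptions and may differ from the scoring actually intended.

```python
from collections import Counter

def bigram_scores(pairs, delta=1.0):
    """Score each distinct (left, right) pair by an assumed bigram formula."""
    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)
    right_counts = Counter(b for _, b in pairs)
    return {
        (a, b): (n - delta) / (left_counts[a] * right_counts[b])
        for (a, b), n in pair_counts.items()
    }

def keep_by_score(pairs, min_score, delta=1.0):
    """Keep the pairs whose score meets the assumed scoring requirement."""
    scores = bigram_scores(pairs, delta)
    return [p for p, s in scores.items() if s >= min_score]

# Hypothetical pairs: (center word with its POS, POS of the adjacent word)
pairs = [("buy/VERB", "NOUN")] * 3 + [("buy/VERB", "DET"), ("run/VERB", "NOUN")]
kept = keep_by_score(pairs, min_score=0.1)
```

With these counts, the pair seen three times scores (3 - 1) / (4 * 4) = 0.125 and the two pairs seen once score 0, so only the first passes the 0.1 threshold.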
In one embodiment, obtaining, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates, the similarity between each sample part-of-speech template and the target part-of-speech template comprises:
Encoding the multiple sample part-of-speech templates and the context corpus of the target part-of-speech template, to obtain a part-of-speech template ID for each sample part-of-speech template and a word ID for each word in the context corpus of the target part-of-speech template;
Taking the part-of-speech template ID of each sample part-of-speech template and the word ID of each word in the context corpus of the target part-of-speech template as the input of a part-of-speech template vector training model, taking the part-of-speech template vector of each sample part-of-speech template and the word vector of each word in the context corpus of the target part-of-speech template as the output of the part-of-speech template vector training model, and training the part-of-speech template vector corresponding to each sample part-of-speech template;
Obtaining the similarity between each trained part-of-speech template vector and the part-of-speech template vector corresponding to the target part-of-speech template.
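The encoding step in this embodiment amounts to assigning integer IDs. A minimal sketch, assuming IDs are assigned in first-seen order and that templates and words use separate ID spaces (neither detail is stated in the text); `build_ids` is a hypothetical helper.

```python
def build_ids(items):
    """Map each distinct item to an integer ID in first-seen order."""
    ids = {}
    for item in items:
        if item not in ids:
            ids[item] = len(ids)
    return ids

# Hypothetical sample templates and context words.
sample_templates = [
    ("buy", "VERB", "NOUN"),
    ("purchase", "VERB", "NOUN"),
    ("buy", "VERB", "NOUN"),
]
context_words = ["I", "want", "to", "a", "new", "phone"]

template_ids = build_ids(sample_templates)  # part-of-speech template IDs
word_ids = build_ids(context_words)         # word IDs
```

These IDs are what an embedding lookup table would be indexed by during vector training.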
In one embodiment, determining one or more sample part-of-speech templates whose similarity meets a preset requirement as synonymous part-of-speech templates of the target part-of-speech template comprises:
Determining one or more sample part-of-speech templates whose similarity meets the preset requirement as candidate part-of-speech templates;
Concatenating the part-of-speech template vector of a candidate part-of-speech template with the part-of-speech template vector of the target part-of-speech template, to obtain a concatenated vector corresponding to the candidate part-of-speech template;
Inputting the concatenated vector into a binary classification model;
When the output of the binary classification model meets a preset binary classification output requirement, determining the candidate part-of-speech template corresponding to the concatenated vector as a synonymous part-of-speech template of the target part-of-speech template.
In one embodiment, the output of the part-of-speech template vector training model includes the word vectors of the M words preceding the center word of the target part-of-speech template in the context corpus of the target part-of-speech template, and the word vectors of the M words following the center word of the target part-of-speech template in that context corpus, where M >= 1; the part-of-speech template vector training model is a skip-gram model.
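In a skip-gram setup, each training example pairs an input ID with one context ID to be predicted. The sketch below shows only how such (template ID, context word ID) pairs could be formed from the M words on each side of the center word; it does not train a model. The function name and the representation of the corpus as a list of word IDs are assumptions.

```python
def skipgram_pairs(template_id, word_ids, center_index, m=2):
    """Pair the template ID with the IDs of up to m words on each side.

    word_ids is the target corpus as a sequence of word IDs, and
    center_index is the position of the template's center word in it.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    start = max(0, center_index - m)
    end = min(len(word_ids), center_index + m + 1)
    return [
        (template_id, word_ids[i])
        for i in range(start, end)
        if i != center_index
    ]

# Hypothetical IDs: template 7, corpus word IDs 10..14, center at index 2.
pairs_m1 = skipgram_pairs(7, [10, 11, 12, 13, 14], center_index=2, m=1)
pairs_m2 = skipgram_pairs(7, [10, 11, 12, 13, 14], center_index=2, m=2)
```

Because the template ID takes the place of the usual center-word ID, templates that occur in similar contexts would be pushed toward similar vectors during training.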
After considering the specification and practicing the disclosure herein, those skilled in the art will readily conceive of other embodiments of the present disclosure. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

1. A synonymous part-of-speech template acquisition method, characterized by comprising:
Obtaining multiple sample corpora, each sample corpus including multiple words;
Determining the part of speech of each word in the multiple sample corpora, and generating a sample part-of-speech template corresponding to each word in the multiple sample corpora, the sample part-of-speech template including a center word, a part-of-speech parameter of the center word, and a part-of-speech parameter of an adjacent word, the center word being the word corresponding to the sample part-of-speech template, and the adjacent word being either the word immediately preceding the center word or the word immediately following the center word;
Determining a target corpus among the multiple sample corpora, determining a target part-of-speech template among the multiple sample part-of-speech templates, and obtaining a context corpus of the target part-of-speech template, the center word of the target part-of-speech template being located in the target corpus, and the context corpus of the target part-of-speech template including the words in the target corpus that precede the center word of the target part-of-speech template and the words that follow it;
Obtaining, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates, the similarity between each sample part-of-speech template and the target part-of-speech template;
Determining one or more sample part-of-speech templates whose similarity meets a preset requirement as synonymous part-of-speech templates of the target part-of-speech template.
2. The synonymous part-of-speech template acquisition method according to claim 1, characterized in that the method further comprises:
Screening the multiple sample part-of-speech templates by frequency of occurrence and, based on the screening result, determining the sample part-of-speech templates whose frequency of occurrence meets a preset frequency requirement;
The obtaining, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates, the similarity between each sample part-of-speech template and the target part-of-speech template comprises:
Obtaining, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the sample part-of-speech templates whose frequency of occurrence meets the preset frequency requirement, the similarity between each of those sample part-of-speech templates and the target part-of-speech template.
3. The synonymous part-of-speech template acquisition method according to claim 1, characterized in that the method further comprises:
Scoring the multiple sample part-of-speech templates with a bigram score algorithm and, based on the scoring result, determining the sample part-of-speech templates whose score meets a preset scoring requirement;
The obtaining, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates, the similarity between each sample part-of-speech template and the target part-of-speech template comprises:
Obtaining, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the sample part-of-speech templates whose score meets the preset scoring requirement, the similarity between each of those sample part-of-speech templates and the target part-of-speech template.
4. The synonymous part-of-speech template acquisition method according to claim 1, characterized in that the obtaining, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates, the similarity between each sample part-of-speech template and the target part-of-speech template comprises:
Encoding the multiple sample part-of-speech templates and the context corpus of the target part-of-speech template, to obtain a part-of-speech template ID for each sample part-of-speech template and a word ID for each word in the context corpus of the target part-of-speech template;
Taking the part-of-speech template ID of each sample part-of-speech template and the word ID of each word in the context corpus of the target part-of-speech template as the input of a part-of-speech template vector training model, taking the part-of-speech template vector of each sample part-of-speech template and the word vector of each word in the context corpus of the target part-of-speech template as the output of the part-of-speech template vector training model, and training the part-of-speech template vector corresponding to each sample part-of-speech template;
Obtaining the similarity between each trained part-of-speech template vector and the part-of-speech template vector corresponding to the target part-of-speech template.
5. The synonymous part-of-speech template acquisition method according to claim 4, characterized in that the determining one or more sample part-of-speech templates whose similarity meets a preset requirement as synonymous part-of-speech templates of the target part-of-speech template comprises:
Determining one or more sample part-of-speech templates whose similarity meets the preset requirement as candidate part-of-speech templates;
Concatenating the part-of-speech template vector of a candidate part-of-speech template with the part-of-speech template vector of the target part-of-speech template, to obtain a concatenated vector corresponding to the candidate part-of-speech template;
Inputting the concatenated vector into a binary classification model;
When the output of the binary classification model meets a preset binary classification output requirement, determining the candidate part-of-speech template corresponding to the concatenated vector as a synonymous part-of-speech template of the target part-of-speech template.
6. The synonymous part-of-speech template acquisition method according to claim 4, characterized in that the output of the part-of-speech template vector training model includes the word vectors of the M words preceding the center word of the target part-of-speech template in the context corpus of the target part-of-speech template, and the word vectors of the M words following the center word of the target part-of-speech template in that context corpus, M being a positive integer greater than or equal to 1, and the part-of-speech template vector training model being a skip-gram model.
7. A synonymous part-of-speech template acquisition device, characterized by comprising:
A sample corpus acquisition module, configured to obtain multiple sample corpora, each sample corpus including multiple words;
A sample part-of-speech template generation module, configured to determine the part of speech of each word in the multiple sample corpora and generate a sample part-of-speech template corresponding to each word in the multiple sample corpora, the sample part-of-speech template including a center word, a part-of-speech parameter of the center word, and a part-of-speech parameter of an adjacent word, the center word being the word corresponding to the sample part-of-speech template, and the adjacent word being either the word immediately preceding the center word or the word immediately following the center word;
A target part-of-speech template determining module, configured to determine a target corpus among the multiple sample corpora, determine a target part-of-speech template among the multiple sample part-of-speech templates, and obtain a context corpus of the target part-of-speech template, the center word of the target part-of-speech template being located in the target corpus, and the context corpus of the target part-of-speech template including the words in the target corpus that precede the center word of the target part-of-speech template and the words that follow it;
A similarity obtaining module, configured to obtain, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the multiple sample part-of-speech templates, the similarity between each sample part-of-speech template and the target part-of-speech template;
A synonymous part-of-speech template determining module, configured to determine one or more sample part-of-speech templates whose similarity meets a preset requirement as synonymous part-of-speech templates of the target part-of-speech template.
8. The synonymous part-of-speech template acquisition device according to claim 7, characterized in that the device further comprises:
A sample part-of-speech template screening module, configured to screen the multiple sample part-of-speech templates by frequency of occurrence and, based on the screening result, determine the sample part-of-speech templates whose frequency of occurrence meets a preset frequency requirement;
The similarity obtaining module comprises:
A first similarity obtaining submodule, configured to obtain, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the sample part-of-speech templates whose frequency of occurrence meets the preset frequency requirement, the similarity between each of those sample part-of-speech templates and the target part-of-speech template.
9. The synonymous part-of-speech template acquisition device according to claim 7, characterized in that the device further comprises:
A sample part-of-speech template scoring module, configured to score the multiple sample part-of-speech templates with a bigram score algorithm and, based on the scoring result, determine the sample part-of-speech templates whose score meets a preset scoring requirement;
The similarity obtaining module comprises:
A second similarity obtaining submodule, configured to obtain, according to the target part-of-speech template, the context corpus of the target part-of-speech template, and the sample part-of-speech templates whose score meets the preset scoring requirement, the similarity between each of those sample part-of-speech templates and the target part-of-speech template.
10. The synonymous part-of-speech template acquisition device according to claim 7, characterized in that the similarity obtaining module comprises:
An encoding submodule, configured to encode the multiple sample part-of-speech templates and the context corpus of the target part-of-speech template, to obtain a part-of-speech template ID for each sample part-of-speech template and a word ID for each word in the context corpus of the target part-of-speech template;
A part-of-speech template vector training submodule, configured to take the part-of-speech template ID of each sample part-of-speech template and the word ID of each word in the context corpus of the target part-of-speech template as the input of a part-of-speech template vector training model, take the part-of-speech template vector of each sample part-of-speech template and the word vector of each word in the context corpus of the target part-of-speech template as the output of the part-of-speech template vector training model, and train the part-of-speech template vector corresponding to each sample part-of-speech template;
A third similarity obtaining submodule, configured to obtain the similarity between each trained part-of-speech template vector and the part-of-speech template vector corresponding to the target part-of-speech template.
11. The synonymous part-of-speech template acquisition device according to claim 10, characterized in that the synonymous part-of-speech template determining module comprises:
A candidate part-of-speech template determining submodule, configured to determine one or more sample part-of-speech templates whose similarity meets the preset requirement as candidate part-of-speech templates;
A concatenated vector obtaining submodule, configured to concatenate the part-of-speech template vector of a candidate part-of-speech template with the part-of-speech template vector of the target part-of-speech template, to obtain a concatenated vector corresponding to the candidate part-of-speech template;
A concatenated vector input submodule, configured to input the concatenated vector into a binary classification model;
A synonymous part-of-speech template determining submodule, configured to determine, when the output of the binary classification model meets a preset binary classification output requirement, the candidate part-of-speech template corresponding to the concatenated vector as a synonymous part-of-speech template of the target part-of-speech template.
12. The synonymous part-of-speech template acquisition device according to claim 10, characterized in that the output of the part-of-speech template vector training model includes the word vectors of the M words preceding the center word of the target part-of-speech template in the context corpus of the target part-of-speech template, and the word vectors of the M words following the center word of the target part-of-speech template in that context corpus, M being a positive integer greater than or equal to 1, and the part-of-speech template vector training model being a skip-gram model.
13. A synonymous phrase acquisition device, characterized by comprising:
A processor;
A memory for storing processor-executable instructions;
Wherein the processor is configured to:
Obtain a plurality of sample corpora, each sample corpus comprising a plurality of words;
Determine the part of speech of each word in the plurality of sample corpora, and generate a sample part-of-speech template corresponding to each word in the plurality of sample corpora, the sample part-of-speech template comprising the part-of-speech parameter of a center word, and the part-of-speech parameters of the center word and of an adjacent word, wherein the center word is the word corresponding to the sample part-of-speech template, and the adjacent word is a word located before the center word and adjacent to the center word, or a word located after the center word and adjacent to the center word;
Determine a target corpus among the plurality of sample corpora, determine a target part-of-speech template among the plurality of sample part-of-speech templates, and obtain a target part-of-speech template context corpus, wherein the center word of the target part-of-speech template is located in the target corpus, and the target part-of-speech template context corpus comprises the words in the target corpus that precede the center word of the target part-of-speech template and the words that follow the center word of the target part-of-speech template;
Obtain, according to the target part-of-speech template, the target part-of-speech template context corpus and the plurality of sample part-of-speech templates, the similarity between each sample part-of-speech template of the plurality of sample part-of-speech templates and the target part-of-speech template;
Determine one or more sample part-of-speech templates whose similarity meets a preset requirement as synonymous part-of-speech templates of the target part-of-speech template.
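As a non-limiting illustration, the template generation and similarity steps of claim 13 might be sketched as follows. Several points are assumptions not stated in the claim: a template is represented as a tuple of the preceding, center and following part-of-speech tags (the claim requires only one adjacent word), the boundary markers `<s>` and `</s>` are invented padding, similarity is taken to be cosine similarity, and the "preset requirement" is a minimum similarity threshold.

```python
import math
from typing import Dict, List, Sequence, Tuple

Tagged = Tuple[str, str]  # (word, part-of-speech tag)


def build_templates(tagged: List[Tagged]) -> List[Tuple[str, str, str]]:
    """Generate one template per word: (tag before, center tag, tag after)."""
    tags = [pos for _, pos in tagged]
    padded = ["<s>"] + tags + ["</s>"]  # assumed boundary markers
    return [(padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_templates(vectors: Dict[str, Sequence[float]], target: str, min_similarity: float) -> List[str]:
    """Return the templates whose similarity to the target meets the threshold, best first."""
    target_vec = vectors[target]
    scored = [(cosine(vec, target_vec), name) for name, vec in vectors.items() if name != target]
    return [name for score, name in sorted(scored, reverse=True) if score >= min_similarity]


# Usage with a hand-tagged sentence and made-up template vectors.
templates = build_templates([("I", "PRON"), ("eat", "VERB"), ("rice", "NOUN")])
matches = similar_templates({"t": [1.0, 0.0], "a": [1.0, 0.1], "b": [0.0, 1.0]}, "t", 0.9)
```

In a full pipeline the template vectors would come from the trained part-of-speech template vector model rather than being supplied by hand.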
14. A computer-readable storage medium having computer instructions stored thereon, characterized in that the instructions, when executed by a processor, implement the steps of the method of any one of claims 1-6.
CN201910114457.8A 2019-02-14 2019-02-14 Synonym part-of-speech template acquisition method and device Active CN109918651B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910114457.8A CN109918651B (en) 2019-02-14 2019-02-14 Synonym part-of-speech template acquisition method and device

Publications (2)

Publication Number Publication Date
CN109918651A true CN109918651A (en) 2019-06-21
CN109918651B CN109918651B (en) 2023-05-02

Family

ID=66961530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910114457.8A Active CN109918651B (en) 2019-02-14 2019-02-14 Synonym part-of-speech template acquisition method and device

Country Status (1)

Country Link
CN (1) CN109918651B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112184032A (en) * 2020-09-30 2021-01-05 广州思酷信息科技有限公司 Method and system for intelligently scoring subjective questions

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060200336A1 (en) * 2005-03-04 2006-09-07 Microsoft Corporation Creating a lexicon using automatic template matching
US20090119095A1 (en) * 2007-11-05 2009-05-07 Enhanced Medical Decisions. Inc. Machine Learning Systems and Methods for Improved Natural Language Processing
CN103390004A (en) * 2012-05-11 2013-11-13 北京百度网讯科技有限公司 Determination method and determination device for semantic redundancy and corresponding search method and device
CN107291693A (en) * 2017-06-15 2017-10-24 广州赫炎大数据科技有限公司 A kind of semantic computation method for improving term vector model
CN109062892A (en) * 2018-07-10 2018-12-21 东北大学 A kind of Chinese sentence similarity calculating method based on Word2Vec

Also Published As

Publication number Publication date
CN109918651B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN109766540B (en) General text information extraction method and device, computer equipment and storage medium
Williams et al. A broad-coverage challenge corpus for sentence understanding through inference
WO2020206957A1 (en) Intention recognition method and device for intelligent customer service robot
CN106649742A (en) Database maintenance method and device
CN112487139B (en) Text-based automatic question setting method and device and computer equipment
CN108491486B (en) Method, device, terminal equipment and storage medium for simulating patient inquiry dialogue
JP2020071869A (en) Video-based job provider and job seeker matching server and method
CN111933127A (en) Intention recognition method and intention recognition system with self-learning capability
Hung et al. Towards a method for evaluating naturalness in conversational dialog systems
CN111694937A (en) Interviewing method and device based on artificial intelligence, computer equipment and storage medium
CN108345612A (en) A kind of question processing method and device, a kind of device for issue handling
CN108038208A (en) Training method, device and the storage medium of contextual information identification model
CN110321409A (en) Secondary surface method for testing, device, equipment and storage medium based on artificial intelligence
CN110717021A (en) Input text and related device for obtaining artificial intelligence interview
CN103885924A (en) Field-adaptive automatic open class subtitle generating system and field-adaptive automatic open class subtitle generating method
CN106502988A (en) The method and apparatus that a kind of objective attribute target attribute is extracted
CN113220854A (en) Intelligent dialogue method and device for machine reading understanding
CN109918651A (en) Synonymous part of speech template acquisition methods and device
Gehrmann et al. Improving human text comprehension through semi-Markov CRF-based neural section title generation
CN110348010A (en) Synonymous phrase acquisition methods and device
CN109934347A (en) Extend the device of question and answer knowledge base
KR102395702B1 (en) Method for providing english education service using step-by-step expanding sentence structure unit
CN110263346B (en) Semantic analysis method based on small sample learning, electronic equipment and storage medium
CN114417827A (en) Text context processing method and device, electronic equipment and storage medium
CN112818090B (en) Method and system for generating answer questions and questions based on harmonic words

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant